Proctored exams have fallen to generative AI

A Turkish university candidate was recently arrested after being caught using an AI-powered system to obtain answers to the entrance exam in real-time.

Source: Student Caught Using Artificial Intelligence to Cheat on University Entrance Test

A couple of years ago (and a few times since) I observed that proctored exams offer no meaningful defence against generative AI, so I am a little surprised it has taken so long for someone to be caught doing this. I guess that others have been more careful.

The candidate used a simple and rather obvious set-up: a camera disguised as a shirt button that was used to read the questions, and a router hidden in a hollowed-out shoe linking to a concealed mobile device, which queried a generative AI (likely ChatGPT-powered) and fed the answers back verbally through an in-ear Bluetooth earpiece. Constructing such a thing would take a little ingenuity but it’s not rocket science. It’s not even computer science. Anyone could do this. It would take some skill to make it work well, though, and that may be the reason this attempt went wrong. The candidate was caught as a result of their suspicious behaviour, not because anyone directly noticed the tech. I’m trying to imagine the interface: how the machine would know which question to answer (did the candidate have to point their button in the right direction?), how they dealt with dictating the answers at a usable speed (what if they needed it to be repeated? Did they have to tap a microphone a number of times?), how they managed sequence and pacing (sub-vocalization? moving in a particular way?). These are soluble problems but they are not trivial, and skill would be needed to make the whole thing seem natural.

It may take a little while for this to become a widespread commodity item (and a bit longer for exam-takers to learn to use it unobtrusively), but I’m prepared to bet that someone is working on it, if it is not already available. And, yes, exam-setters will come up with a counter-technology to address this particular threat (scanners? signal blockers? Forcing students to strip naked?) but the cheats will be more ingenious, the tech will improve, and so it will go on, in an endless and unwinnable arms race.

Very few people cheat as a matter of course. This candidate was arrested – exam cheating is against the law in Turkey – for attempting to solve the problem they were required to solve, which was to pass the test, not to demonstrate their competence. The level of desperation that led to them adopting such a risky solution to the problem is hard to imagine, but it’s easy to understand how high the stakes must have seemed and how strong the incentive to succeed must have been. The fact that, in most societies, we habitually inflict such tests on both children and adults, on an unimaginably vast scale, will hopefully one day be seen as barbaric, on a par with beating children to make them behave. They are inauthentic, inaccurate, inequitable and, most absurdly of all, a primary cause of the problem they are designed to solve. We really do need to find a better solution.

Note on the post title: the student was caught so, as some have pointed out, it would be an exaggeration to say that this one case is proof that proctored exams have fallen to generative AI, but I think it is a very safe assumption that this is not a lone example. This is a landmark case because it provides the first direct evidence that this is happening in the wild, not because it is the first time it has ever happened.

Our educational assessment systems are designed to create losers

The always wonderful Alfie Kohn describes an airline survey that sought to find out how it compared with others, which he chose not to answer because the airline was thus signalling no interest in providing the best quality experience possible, just aiming to do enough to beat the competition. The thrust of his article is that much the same is true of standardized tests in schools. As Kohn rightly observes, the central purpose of testing as it tends to be used in schools and beyond is not to evaluate successful learning but to compare students (or teachers, or institutions, or regions) with one another in order to identify winners and losers.

‘When you think about it, all standardized tests — not just those that are norm-referenced — are based on this compulsion to compare. If we were interested in educational excellence, we could use authentic forms of assessment that are based on students’ performance at a variety of classroom projects over time. The only reason to standardize the process, to give all kids the same questions under the same conditions on a contrived, one-shot, high-stakes test, is if what we wanted to know wasn’t “How well are they learning?” but “Who’s beating whom?”’

It’s a good point, but I think it is not just an issue with standardized tests. The problem occurs with all the summative assessments (the judgments) we use. Our educational assessment systems are designed to create losers as much as they are made to find winners. Whether they follow the heinous practice of norm-referencing or not, they are sorting machines, built to discover competent people, and to discard the incompetent. In fact, as Kohn notes, when there are too many winners we are accused of grade inflation or a dropping of standards.

This makes no sense if you believe, as I do, that the purpose of education is to educate. In a system that demands grading, unless 100% of students that want to succeed get the best possible grades, then we have failed to meet the grade ourselves. The problem, though, is not so much the judgments themselves as it is the intimate, inextricable binding of judgmental with learning processes. Given enough time, effort, and effective teaching, almost anyone can achieve pretty much any skill or competence, as long as they stick at it. We have very deliberately built a system that does not aim for that at all. Instead, it aims to sort wheat from chaff. That’s not why I do the job I do, and I hope it is not why you do it either, but that’s exactly what the system is made to do. And yet we (at least I) think of ourselves as educators, not judges. These two roles are utterly separate and irreconcilably inconsistent.

Who needs 100%?

It might be argued that some students don’t actually want to get the best possible grades. True. And sure, we don’t always want or need to learn everything we could learn. If I am learning how to use a new device or musical instrument I sometimes read/watch enough to get me started and do not go any further, or skim through to get the general gist. Going for a less-than-perfect understanding is absolutely fine if that’s all you need right now. But that’s not quite how it works in formal education, in part because we punish those that make such choices (by giving lower grades) and in part because we systematically force students to learn stuff they neither want nor need to learn, at a time that we choose, using the lure of the big prizes at the end to coax them. Even those that actually do want or need to learn a topic must stick with it to the bitter end regardless of whether it is useful to do the whole thing, regardless of whether they need more or less of it, regardless of whether it is the right time to learn it, regardless of whether it is the right way for them to learn it. They must do all that we say they must do, or we won’t give them the gold star. That’s not even a good way to train a dog.

It gets worse. At least dogs normally get a second chance. Having set the bar, we normally give just a single chance at winning or, at best, an option to be re-tested (often at a price and usually only once), rather than doing the human thing of allowing people to take the time they need and learn from their mistakes until they get as good as they want or need to get. We could learn a thing or two from computer games – the ability to repeat over and over, achieving small wins all along the way without huge penalties for losing, is a powerful way to gain competence and sustain motivation. It is better if students have some control over the pacing but, even at Athabasca, an aggressively open university that does its best to give everyone all the opportunity they need to succeed, where self-paced learners can choose the point at which they are ready to take the assessments, we still have strict cut-offs for contract periods and, like all the rest, we still tend to allow just a single stab at each assessment. In most of my own self-paced courses (and in some others) we try to soften that by allowing students to iterate without penalty until the end but, when that end comes, that’s still it. This is not for the benefit of the students: this is for our convenience. Yes, there is a cost to giving greater freedom – it takes time, effort, and compassion – but that’s a business problem to solve, not an insuperable barrier. WGU’s subscription model, for instance, in which students pay for an all-you-can-eat smorgasbord, appears to work pretty well.

Meta lessons

It might be argued that there are other important lessons that we teach when we competitively grade. Some might suggest that competition is a good thing to learn in and of itself, because it is one of the things that drives society and everyone has to do it at least sometimes. Sure, but cooperation and mutual support are usually better, or at least an essential counterpart, so embedding competition as the one and only modality seems a bit limiting. And, if we are serious about teaching people how to compete, then that is what we should do, and not actively put them in jeopardy to achieve that: as Jerome Bruner succinctly put it, ‘Learning something with the aid of an instructor should, if instruction is effective, be less dangerous or risky or painful than learning on one’s own’ (Bruner 1966, p.44).

Others might claim that sticking with something you don’t like doing is a necessary lesson if people are to play a suitably humble/productive role in society. Such lessons have a place, I kind-of agree. Just not a central place, just not a pervasive place that underpins or, worse, displaces everything else. Yes, grit can be really useful, if you are pursuing your goals or helping others to reach theirs. By all means, let’s teach that, let’s nurture that, and by all means let’s do what we can to help students see how learning something we are teaching can help them to reach their goals, even though it might be difficult or unpleasant right now. But there’s a big difference between doing something for self or others, and subservient compliance with someone else’s demands. ‘Grit’ does not have to be synonymous with ‘taking orders’. Doing something distasteful because we feel we must, because it aligns with our sense of self-worth, because it will help those we care about, because it will lead us where we want to be, is all good. Doing something because someone else is making us do it (with the threat/reward of grades) might turn us into good soldiers, might generate a subservient workforce in a factory or at a coal face, might keep an unruly subjugated populace in check, but it’s not the kind of attitude that is going to be helpful if we want to nurture creative, caring, useful members of 21st Century society.

Societal roles

It might be argued that accreditation serves a powerful societal function, ranking and categorizing people in ways that (at least for the winners and for consumers of graduates) have some value. It’s a broken and heartless system, but our societies do tend to be organized around it and it would be quite disruptive if we got rid of it without finding some replacement. Without it, employers might actually need to look at evidence of what people have done, for instance, rather than speedily weeding out those with insufficient grades. Moreover, circularly enough, most of our students currently want and expect it because it’s how things are done in our culture. Even I, a critic of the system, proudly wear the label ‘Doctor’, because it confers status and signals particular kinds of achievement, and there is no doubt that it and other qualifications have been really quite useful in my career. If that were all accreditation did then I could quite happily live with it, even though the fact that I spent a few years researching something interesting about 15 years ago probably has relatively little bearing on what I do or can do now. The problem is not accreditation in itself, but that it is inextricably bound to the learning process. Under such conditions, educational assessment systems are positively harmful to learning. They are anti-educative. Of necessity, because they tend to determine precisely what students should do and how they should do it, they sap intrinsic motivation and undermine love of learning. Even the staunchest of defenders of tightly integrated learning and judgment would presumably accept that learning is at least as important as grading so, if grading undermines learning (and it quite unequivocally does), something is badly broken.

A simple solution?

It does not have to be this way. I’ve said it before but it bears repeating: at least a large part of the solution is to decouple learning and accreditation altogether. There is a need for some means to indicate prowess, sure. But the crude certificates we currently use may not be the best way to do that in all cases, and it doesn’t have to dominate the learning process to the point of killing love of learning. If we could drop the accreditation role during the teaching process we could focus much more on providing useful feedback, on valorizing failures as useful steps towards success, on making interesting diversions, on tailoring the learning experience to the learner’s interests and capabilities rather than to credential requirements, on providing learning experiences that are long enough and detailed enough for the students’ needs, rather than a uniform set of fixed lengths to suit our bureaucracies.

Equally, we could improve our ability to provide credentials. For those that need it, we could still offer plenty of accreditation opportunities, for example through a portfolio-based approach and/or collecting records of learning or badges along the way. We could even allow for some kind of testing like oral, written, or practical exams for those that must, where it is appropriate to the competence (not, as now, as a matter of course) and we could actually do it right, rather than in ways that positively enable and reward cheating. None of this has to be bound to specific courses. This decoupling would also give students the freedom to choose other ways of learning apart from our own courses, which would be quite a strong incentive for us to concentrate on teaching well. It might challenge us to come up with authentic forms of assessment that allow students to demonstrate competence through practice, or to use evidence from multiple sources, or to show their particular and unique skillset. It would almost certainly let us do both accreditation and teaching better. And it’s not as though we have no models to work from: from driving tests to diving tests to uses of portfolios in job interviews, there are plenty of examples of ways this can work already.

Apart from some increased complexities of managing such a system (which is where online tools can come in handy and where opportunities exist for online institutions that conventional face-to-face institutions cannot compete with) this is not a million miles removed from what we do now: it doesn’t require a revolution, just a simple shift in emphasis, and a separation of two unnecessarily intertwined and mutually inconsistent roles. Especially when processes and tools already exist for that, as they do at Athabasca University, it would not even be particularly costly. Inertia would be a bigger problem than anything else, but even big ships can eventually be steered in other directions. We just have to choose to make it so.

Reference

Bruner, J. S. (1966). Toward a Theory of Instruction. Cambridge MA: The Belknap Press of Harvard University Press.