Proctored exams have fallen to generative AI

Posted June 18, 2024, updated onJune 20, 2024 by Jon Dron

A Turkish university candidate was recently arrested after being caught using an AI-powered system to obtain answers to the entrance exam in real-time.

Source: Student Caught Using Artificial Intelligence to Cheat on University Entrance Test

A couple of years ago (and a few times since) I observed that proctored exams offer no meaningful defence against generative AI so I am a little surprised it has taken so long for someone to be caught doing this. I guess that others have been more careful.

The candidate used a simple and rather obvious set-up: a camera disguised as a shirt button that was used to read the questions, a router hidden in a hollowed-out shoe linking to a stealthily concealed mobile device that queried a generative AI (likely ChatGPT-powered) that fed the answers back verbally to an in-ear bluetooth earpiece. Constructing such a thing would take a little ingenuity but it’s not rocket science. It’s not even computer science. Anyone could do this. It would take some skill to make it work well, though, and that may be the reason this attempt went wrong. The candidate was caught as a result of their suspicious behaviour, not because anyone directly noticed the tech. I’m trying to imagine the interface, how the machine would know which question to answer (did the candidate have to point their button in the right direction?), how they dealt with dictating the answers at a usable speed (what if they needed it to be repeated? Did they have to tap a microphone a number of times?), how they managed sequence and pacing (sub-vocalization? moving in a particular way?). These are soluble problems but they are not trivial, and skill would be needed to make the whole thing seem natural.

It may take a little while for this to become a widespread commodity item (and a bit longer for exam-takers to learn to use it unobtrusively), but I’m prepared to bet that someone is working on it, if it is not already available. And, yes, exam-setters will come up with a counter-technology to address this particular threat (scanners? signal blockers? Forcing students to strip naked?) but the cheats will be more ingenious, the tech will improve, and so it will go on, in an endless and unwinnable arms race.

Very few people cheat as a matter of course. This candidate was arrested – exam cheating is against the law in Turkey – for attempting to solve the problem they were required to solve, which was to pass the test, not to demonstrate their competence. The level of desperation that led to them adopting such a risky solution to the problem is hard to imagine, but it’s easy to understand how high the stakes must have seemed and how strong the incentive to succeed must have been. The fact that, in most societies, we habitually inflict such tests on both children and adults, on an unimaginably vast scale, will hopefully one day be seen as barbaric, on a par with beating children to make them behave. They are inauthentic, inaccurate, inequitable and, most absurdly of all, a primary cause of the problem they are designed to solve. We really do need to find a better solution.

Note on the post title: the student was caught so, as some have pointed out, it would be an exaggeration to say that this one case is proof that proctored exams have fallen to generative AI, but I think it is a very safe assumption that this is not a lone example. This is a landmark case because it provides the first direct evidence that this is happening in the wild, not because it is the first time it has ever happened.

Ernst & Young fined $100 million after employees cheated in exams

Posted June 28, 2022, updated onJuly 2, 2022 by Jon Dron

Not just any exams: ethics exams.

These are the very accountants who are supposed to catch cheats. I guess at least they’ll understand their clientele pretty well.

But how did this happen? There are clues in the article:

“Many of the employees interviewed during the federal investigation said they knew cheating was a violation of the company’s code of conduct but did it anyway because of work commitments or the fact that they couldn’t pass training exams after multiple tries.” (my emphasis).

I think there might have been a clue about their understanding of ethical behaviour in that fact alone, don’t you? But I don’t think it’s really their fault: at least, it’s completely predictable to anyone with even the slightest knowledge of how motivation works.

If passing the exam is, by design, much more important than actually being able to do what is being examined, then of course people will cheat. For those with too much else to do or too little interest to succeed, when the pressure is high and the stakes are higher, it’s a perfectly logical course of action. But, even for all the rest who don’t cheat, the main focus for them will be on passing the exam, not on gaining any genuine competence or interest in the subject. It’s not their fault: that’s how it is designed. In fact, the strong extrinsic motivation it embodies is pretty much guaranteed to (at best) persistently numb their intrinsic interest in ethics, if it doesn’t extinguish it altogether. Most will do enough to pass and no more, taking shortcuts wherever possible, and there’s a good chance they will forget most of it as soon as they have done so.

Just to put the cherry on the pie, and not unexpectedly, EY refer to the process by which their accountants are expected to learn about ethics as ‘training’ and it is mandatory. So you have a bunch of unwilling people who are already working like demons to meet company demands, to whom you are doing something normally reserved for dogs or AI models, and then you are forcing them to take high-stakes exams about it, on which their futures depend. It’s a perfect shit storm. I’d not trust a single one of their graduates, exam cheats or not, and the tragedy is that the people who were trying to force them to behave ethically were the ones directly responsible for their unethical behaviour.

There may be a lesson or two to be learned from this for academics, who tend to be the biggest exam fetishists around, and who seem to love to control what their students do.

Originally posted at: https://landing.athabascau.ca/bookmarks/view/14163409/ernst-young-fined-100-million-after-employees-cheated-in-exams

A modest proposal for improving exam invigilation

Posted July 27, 2021, updated onJuly 27, 2021 by Jon Dron

There has been a lot of negative reaction of late to virtual proctors of online exams. Perhaps students miss the cheery camaraderie of traditional proctored exams, sitting silently in a sweaty room with pen and paper, doing one of the highest stakes, highest stress tasks of their lives, with someone scrutinizing their every nervous tic whose adverse judgment may destroy their hopes and careers, for the benefit of an invisible examiner whose motives and wishes are unclear but whose approval they dearly seek. Lovely. Traditional. Reassuring. A ritual for us all to cherish. It’s enough to bring a tear to the eye.

But exams cost a huge amount of money to host and to invigilate. It is even worse when one of the outcomes might, for the student or the invigilator, be death or disability due to an inconvenient virus.

I have a better solution.

photo of a toy robot Instead of costly invigilators and invigilation centres, all we need to do is to send out small (returnable, postage-paid) robots to students’ homes. A little robot sitting on the student’s desk or kitchen table as they sit their written exam (on paper, of course – tradition matters), recording every blink, watching their fingers writing on the paper, with 360 degree panoramic camera and the ability to zoom in on anything suspicious or interesting. Perhaps it could include microphones, infrared and microwave sensors, and maybe sensors to monitor skin resistance, pulse, etc, in order to look for nefarious activities or to call the ambulance if the student seems to be having a heart attack or stroke due to the stress. It could be made to talk, too. Perhaps it could offer spoken advice on the process, and alerts about the time left at carefully selected intervals. Students could choose the voice. It would also allow students to sit exams wherever and whenever they please: we are all in favour of student choice. With a bit of ingenuity it could scan what the students have written or drawn, and send it back to an examiner. Or, with a bit more ingenuity and careful use of AI, it could mark the paper on the spot, saving yet more money. Everyone wins.

It would be important to be student-centric in its design. It could, for instance, be made to look like a cute little furry animal with googly eyes to put students more at ease. Maybe it could make soothing cooing noises like a tribble, or like a cat purring. Conversely, it could be made to scuttle ominously around the desk and to appear like a spider with venomous-looking fangs, making gentle hissing noises, to remind students of the much lamented presence of in-person invigilators. Indeed, maybe it could be made to look like a caricature of a professor. More advanced models could emit bad smells to replicate invigilator farts or secret smoking habits. It could be made small and mobile, so that students could take it with them if they needed a bathroom break, during which it might play soothing muzak to put the student at ease, while recording everything they do. It would have to be tough, waterproof, and sterilizable, in order to cope with the odd frustrated student throwing or dunking it.

Perhaps it could offer stern spoken warnings if anomalies or abuses are found, and maybe connect itself to a human invigilator (I hear that they are cheaper in developing nations) who could control it and watch more closely. Perhaps it could be equipped with non-lethal weaponry to punish inappropriate behaviour if the warnings fail, and/or register students on an offenders database. It could be built to self-destruct if tampered with.

Though this is clearly something every university, school, and college would want, and the long-term savings would be immense, such technologies don’t come cheap. Quite apart from the hardware and software development costs, there would be a need for oodles of bandwidth and storage of the masses of data the robot would generate.

I have a solution to that, too: commercial sponsorship.

We could partner with, say, Amazon, who would be keen to mine useful information about the students’ surroundings and needs identified using the robot’s many sensors. A worn curtain? Stubborn stains? A shirt revealing personal interests? Send them to Amazon! Maybe Alexa could provide the voice for interactions and offer shopping advice when students stop to sharpen their pencils (need a better pencil? We have that in stock and can deliver it today!). And, of course, AWS would provide much of the infrastructure needed to support it, at fair educational prices. I expect early adopters would be described as ‘partners’ and offered slightly better (though still profitable) deals.

And there might be other things that could be done with the content. Perhaps the written answers could be analyzed to identify potential Amazon staffers. Maybe students expressing extremist views could be reported to the appropriate government agency, or at least added to a watch-list for the institution’s own use.

Naysayers might worry about hackers breaking into it or subverting its transmissions, or the data being sent to a country with laughable privacy laws, or the robot breaking down at a critical moment, or errors in handwriting recognition, but I’m sure that could be dealt with, the same as we deal with every other privacy, security, and reliability issue in IT in education. No problem. No sir. We have lawyers.

The details still need to be ironed out here and there, but the opportunities are endless. What could possibly go wrong? I think we should take this seriously. Seriously.

Evaluating assessment

Posted August 24, 2020, updated onAugust 27, 2020 by Jon Dron

A group of us at AU have begun discussions about how we might transform our assessment practices, in the light of the far-reaching AU Imagine plan and principles. This is a rare and exciting opportunity to bring about radical and positive change in how learning happens at the institution. Hard technologies influence soft more than vice versa, and assessments (particularly when tied to credentials) tend to be among the hardest of all technologies in any pedagogical intervention. They are therefore a powerful lever for change. Equally, and for the same reasons, they are too often the large, slow, structural elements that infest systems to stunt progress and innovation.

Almost all learning must involve assessment, whether it be of one’s own learning, or provided by other people or machines. Even babies constantly assess their own learning. Reflection is assessment. It is completely natural and it only gets weird when we treat it as a summative judgment, especially when we add grades or credentials to the process, thus normally changing the purpose of learning from achieving competence to achieving a reward. At best it distorts learning, making it seem like a chore rather than a delight, at worst it destroys it, even (and perhaps especially) when learners successfully comply with the demands of assessors and get a good grade. Unfortunately, that’s how most educational systems are structured, so the big challenge to all teachers must be to eliminate or at least to massively reduce this deeply pernicious effect. A large number of the pedagogies that we most value are designed to solve problems that are directly caused by credentials. These pedagogies include assessment practices themselves.

With that in mind, before the group’s first meeting I compiled a list of some of the main principles that I adhere to when designing assessments, most of which are designed to reduce or eliminate the structural failings of educational systems. The meeting caused me to reflect a bit more. This is the result:

Principles applying to all assessments

The primary purpose of assessment is to help the learner to improve their learning. All assessment should be formative.
Assessment without feedback (teacher, peer, machine, self) is ~~judgement, not assessment~~, pointless.
Ideally, feedback should be direct and immediate or, at least, as prompt as possible.
Feedback should only ever relate to what has been done, never the doer.
No criticism should ever be made without also at least outlining steps that might be taken to improve on it.
Grades (with some very rare minor exceptions where the grade is intrinsic to the activity, such as some gaming scenarios or, arguably, objective single-answer quizzes with T/F answers) are not feedback.
Assessment should never ever be used to reward or punish particular prior learning behaviours (e.g. use of exams to encourage revision, grades as goals, marks for participation, etc) .
Students should be able to choose how, when and on what they are assessed.
Where possible, students should participate in the assessment of themselves and others.
Assessment should help the teacher to understand the needs, interests, skills, and gaps in knowledge of their students, and should be used to help to improve teaching.
Assessment is a way to show learners that we care about their learning.

Specific principles for summative assessments

A secondary (and always secondary) purpose of assessment is to provide evidence for credentials. This is normally described as summative assessment, implying that it assesses a state of accomplishment when learning has ended. That is a completely ridiculous idea. Learning doesn’t end. Human learning is not in any meaningful way like programming a computer or storing stuff in a database. Knowledge and skills are active, ever-transforming, forever actively renewed, reframed, modified, and extended. They are things we do, not things we have.

With that in mind, here are my principles for assessment for credentials (none of which supersede or override any of the above core principles for assessment, which always apply):

There should be no assessment task that is not in itself a positive learning activity. Anything else is at best inefficient, at worst punitive/extrinsically rewarding.
Assessment for credentials must be fairly applied to all students.
Credentials should never be based on comparisons between students (norm-referenced assessment is always, unequivocally, and unredeemably wrong).
The criteria for achieving a credential should be clear to the learner and other interested parties (such as employers or other institutions), ideally before it happens, though this should not forestall the achievement and consideration of other valuable outcomes.
There is no such thing as failure, only unfinished learning. Credentials should only celebrate success, not punish current inability to succeed.
Students should be able to choose when they are ready to be assessed, and should be able to keep trying until they succeed.
Credentials should be based on evidence of competence and nothing else.
It should be impossible to compromise an assessment by revealing either the assessment or solutions to it.
There should be at least two ways to demonstrate competence, ideally more. Students should only have to prove it once (though may do so in many ways and many times, if they wish).
More than one person should be involved in judging competence (at least as an option, and/or on a regularly taken sample).
Students should have at least some say in how, when, and where they are assessed.
Where possible (accepting potential issues with professional accreditation, credit transfer, etc) they should have some say over the competencies that are assessed, in weighting and/or outcome.
Grades and marks should be avoided except where mandated elsewhere. Even then, all passes should be treated as an ‘A’ because students should be able to keep trying until they excel.
Great success may sometimes be worthy of an award – e.g. a distinction – but such an award should never be treated as a reward.
Assessment for credentials should demonstrate the ability to apply learning in an authentic context. There may be many such contexts.
Ideally, assessment for credentials should be decoupled from the main teaching process, because of risks of bias, the potential issues of teaching to the test (regardless of individual needs, interests and capabilities) and the dangers to motivation of the assessment crowding out the learning. However, these risks are much lower if all the above principles are taken on board.

I have most likely missed a few important issues, and there is a bit of redundancy in all this, but this is a work in progress. I think it covers the main points.

Further random reflections

There are some overriding principles and implied specifics in all of this. For instance, respect for diversity, accessibility, respect for individuals, and recognition of student control all fall out of or underpin these principles. It implies that we should recognize success, even when it is not the success we expected, so outcome harvesting makes far more sense than measurement of planned outcomes. It implies that failure should only ever be seen as unfinished learning, not as a summative judgment of terminal competence, so appreciative inquiry is far better than negative critique. It implies flexibility in all aspects of the activity. It implies, above and beyond any other purpose, that the focus should always be on learning. If assessment for credentials adversely affects learning then it should be changed at once.

In terms of implementation, while objective quizzes and their cousins can play a useful formative role in helping students to self-assess and to build confidence, machines (whether implemented by computers or rule-following humans) should normally be kept out of credentialling. There’s a place for AI but only when it augments and informs human intelligence, never when it behaves autonomously. Written exams and their ilk should be avoided, unless they conform to or do not conflict with all the above principles: I have found very few examples like this in the real world, though some practical demonstrations of competence in an authentic setting (e.g. lab work and reporting) and some reflective exercises on prior work can be effective.

A portfolio of evidence, including a reflective commentary, is usually going to be the backbone of any fair, humane, effective assessment: something that lets students highlight successes (whether planned or not), that helps them to consolidate what they have learned, and that is flexible enough to demonstrate competence shown in any number of ways. Outputs or observations of authentic activities are going to be important contributors to that. My personal preference in summative assessments is to only use the intended (including student-generated) and/or harvested outcomes for judging success, not for mandated assignments. This gives flexibility, it works for every subject, and it provides unquivocal and precise evidence of success. It’s also often good to talk with students, perhaps formally (e.g. a presentation or oral exam), in order to tease out what they really know and to give instant feedback. It is worth noting that, unlike written exams and their ilk, such methods are actually fun for all concerned, albeit that the pleasure comes from solving problems and overcoming challenges, so it is seldom easy.

Interestingly, there are occasions in traditional academia where these principles are, for the most part, already widely applied. A typical doctoral thesis/dissertation, for example, is often quite close to it (especially in more modern professional forms that put more emphasis on recording the process), as are some student projects. We know that such things are a really good idea, and lead to far richer, more persistent, more fulfilling learning for everyone. We do not do them ubiquitously for reasons of cost and time. It does take a long time to assess something like this well, and it can take more time during the rest of the teaching process thanks to the personalization (real personalization, not the teacher-imposed form popularized by learning analytics aficionados) and extra care that it implies. It is an efficient use of our time, though, because of its active contribution to learning, unlike a great many traditional assessment methods like teacher-set assignments (minimal contribution) and exams (negative contribution). A lot of the reason for our reticence, though, is the typical university’s schedule and class timetabling, which makes everything pile on at once in an intolerable avalanche of submissions. If we really take autonomy and flexibility on board, it doesn’t have to be that way. If students submit work when it is ready to be submitted, if they are not all working in lock-step, and if it is a work of love rather than compliance, then assessment is often a positively pleasurable task and is naturally staggered. Yes, it probably costs a bit more time in the end (though there are plenty of ways to mitigate that, from peer groups to pedagogical design) but every part of it is dedicated to learning, and the results are much better for everyone.

Some useful further reading

This is a fairly random selection of sources that relate to the principles above in one way or another. I have definitely missed a lot. Sorry for any missing URLs or paywalled articles: you may be able to find downloadable online versions somewhere.

Boud, D., & Falchikov, N. (2006). Aligning assessment with long-term learning. Assessment & Evaluation in Higher Education, 31(4), 399-413. Retrieved from https://www.jhsph.edu/departments/population-family-and-reproductive-health/_docs/teaching-resources/cla-01-aligning-assessment-with-long-term-learning.pdf

Boud, D. (2007). Reframing assessment as if learning were important. Retrieved from https://www.researchgate.net/publication/305060897_Reframing_assessment_as_if_learning_were_important

Cooperrider, D. L., & Srivastva, S. (1987). Appreciative inquiry in organizational life. Research in organizational change and development, 1, 129-169.

Deci, E. L., Vallerand, R. J., Pelletier, L. G., & Ryan, R. M. (1991). Motivation and education: The self-determination perspective. Educational Psychologist, 26(3/4), 325-346.

Hussey, T., & Smith, P. (2002). The trouble with learning outcomes. Active Learning in Higher Education, 3(3), 220-233.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise, and other bribes (Kindle ed.). Mariner Books. (this one is worth forking out money for).

Kohn, A. (2011). The case against grades. Educational Leadership, 69(3), 28-33.

Kohn, A. (2015). Four Reasons to Worry About “Personalized Learning”. Retrieved from http://www.alfiekohn.org/blogs/personalized/ (check out Alfie Kohn’s whole site for plentiful other papers and articles – consistently excellent).

Reeve, J. (2002). Self-determination theory applied to educational settings. In E. L. Deci & R. M. Ryan (Eds.), Handbook of Self-Determination research (pp. 183-203). Rochester, NY: The University of Rochester Press.

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications. (may be worth paying for if such things interest you).

Wilson-Grau, R., & Britt, H. (2012). Outcome harvesting. Cairo: Ford Foundation. http://www.managingforimpact.org/sites/default/files/resource/outome_harvesting_brief_final_2012-05-2-1.pdf.

Our educational assessment systems are designed to create losers

Posted July 4, 2017, updated onJuly 4, 2017 by Jon Dron

The always wonderful Alfie Kohn describes an airline survey that sought to find out how it compared with others, which he chose not to answer because the airline was thus signalling no interest in providing the best quality experience possible, just aiming to do enough to beat the competition. The thrust of his article is that much the same is true of standardized tests in schools. As Kohn rightly observes, the central purpose of testing as it tends to be used in schools and beyond is not to evaluate successful learning but to compare students (or teachers, or institutions, or regions) with one another in order to identify winners and losers.

‘When you think about it, all standardized tests — not just those that are norm-referenced — are based on this compulsion to compare. If we were interested in educational excellence, we could use authentic forms of assessment that are based on students’ performance at a variety of classroom projects over time. The only reason to standardize the process, to give all kids the same questions under the same conditions on a contrived, one-shot, high-stakes test, is if what we wanted to know wasn’t “How well are they learning?” but “Who’s beating whom?”‘

It’s a good point, but I think it is not just an issue with standardized tests. The problem occurs with all the summative assessments (the judgments) we use. Our educational assessment systems are designed to create losers as much as they a made to find winners. Whether they follow the heinous practice of norm-referencing or not, they are sorting machines, built to discover competent people, and to discard the incompetent. In fact, as Kohn notes, when there are too many winners we are accused of grade inflation or a dropping of standards.

This makes no sense if you believe, as I do, that the purpose of education is to educate. In a system that demands grading, unless 100% of students that want to succeed get the best possible grades, then we have failed to meet the grade ourselves. The problem, though, is not so much the judgments themselves as it is the intimate, inextricable binding of judgmental with learning processes. Given enough time, effort, and effective teaching, almost anyone can achieve pretty much any skill or competence, as long as they stick at it. We have very deliberately built a system that does not aim for that at all. Instead, it aims to sort wheat from chaff. That’s not why I do the job I do, and I hope it is not why you do it either, but that’s exactly what the system is made to do. And yet we (at least I) think of ourselves as educators, not judges. These two roles are utterly separate and inconsolably inconsistent.

Who needs 100%?

It might be argued that some students don’t actually want to get the best possible grades. True. And sure, we don’t always want or need to learn everything we could learn. If I am learning how to use a new device or musical instrument I sometimes read/watch enough to get me started and do not go any further, or skim through to get the general gist. Going for a less-than-perfect understanding is absolutely fine if that’s all you need right now. But that’s not quite how it works in formal education, in part because we punish those that make such choices (by giving lower grades) and in part because we systematically force students to learn stuff they neither want nor need to learn, at a time that we choose, using the lure of the big prizes at the end to coax them. Even those that actually do want or need to learn a topic must stick with it to the bitter end regardless of whether it is useful to do the whole thing, regardless of whether they need more or less of it, regardless of whether it is the right time to learn it, regardless of whether it is the right way for them to learn it. They must do all that we say they must do, or we won’t give them the gold star. That’s not even a good way to train a dog.

It gets worse. At least dogs normally get a second chance. Having set the bar, we normally give just a single chance at winning or, at best, an option to be re-tested (often at a price and usually only once), rather than doing the human thing of allowing people to take the time they need and learn from their mistakes until they get as good as they want or need to get. We could learn a thing or two from computer games – the ability to repeat over and over, achieving small wins all along the way without huge penalties for losing, is a powerful way to gain competence and sustain motivation. It is better if students have some control over the pacing but, even at Athabasca, an aggressively open university that does its best to give everyone all the opportunity they need to succeed, where self-paced learners can choose the point at which they are ready to take the assessments, we still have strict cut-offs for contract periods and, like all the rest, we still tend to allow just a single stab at each assessment. In most of my own self-paced courses (and in some others) we try to soften that by allowing students to iterate without penalty until the end but, when that end comes, that’s still it. This is not for the benefit of the students: this is for our convenience. Yes, there is a cost to giving greater freedom – it takes time, effort, and compassion – but that’s a business problem to solve, not an insuperable barrier. WGU’s subscription model, for instance, in which students pay for an all-you-can-eat smorgasbord, appears to work pretty well.

Meta lessons

It might be argued that there are other important lessons that we teach when we competitively grade. Some might suggest that competition is a good thing to learn in and of itself, because it is one of the things that drives society and everyone has to do it at least sometimes. Sure, but cooperation and mutual support is usually better, or at least an essential counterpart, so embedding competition as the one and only modality seems a bit limiting. And, if we are serious about teaching people about how to compete, then that is what we should do, and not actively put them in jeopardy to achieve that: as Jerome Bruner succinctly put it, ‘Learning something with the aid of an instructor should, if instruction is effective, be less dangerous or risky or painful than learning on one’s own’ (Bruner 1966, p.44).

Others might claim that sticking with something you don’t like doing is a necessary lesson if people are to play a suitably humble/productive role in society. Such lessons have a place, I kind-of agree. Just not a central place, just not a pervasive place that underpins or, worse, displaces everything else. Yes, grit can be really useful, if you are pursuing your goals or helping others to reach theirs. By all means, let’s teach that, let’s nurture that, and by all means let’s do what we can to help students see how learning something we are teaching can help them to reach their goals, even though it might be difficult or unpleasant right now. But there’s a big difference between doing something for self or others, and subservient compliance with someone else’s demands. ‘Grit’ does not have to be synonymous with ‘taking orders’. Doing something distasteful because we feel we must, because it aligns with our sense of self-worth, because it will help those we care about, because it will lead us where we want to be, is all good. Doing something because someone else is making us do it (with the threat/reward of grades) might turn us into good soldiers, might generate a subservient workforce in a factory or coal face, might keep an unruly subjugated populace in check, but it’s not the kind of attitude that is going to be helpful if we want to nurture creative, caring, useful members of 21st Century society.

Societal roles

It might be argued that accreditation serves a powerful societal function, ranking and categorizing people in ways that (at least for the winners and for consumers of graduates) have some value. It’s a broken and heartless system, but our societies do tend to be organized around it and it would be quite disruptive if we got rid of it without finding some replacement. Without it, employers might actually need to look at evidence of what people have done, for instance, rather than speedily weeding out those with insufficient grades. Moreover, circularly enough, most of our students currently want and expect it because it’s how things are done in our culture. Even I, a critic of the system, proudly wear the label ‘Doctor’, because it confers status and signals particular kinds of achievement, and there is no doubt that it and other qualifications have been really quite useful in my career. If that were all accreditation did then I could quite happily live with it, even though the fact that I spent a few years researching something interesting about 15 years ago probably has relatively little bearing on what I do or can do now. The problem is not accreditation in itself, but that it is inextricably bound to the learning process. Under such conditions, educational assessment systems are positively harmful to learning. They are anti-educative. Of necessity, due to the fact that they tend to determine precisely what students should do and how they should do it, they sap intrinsic motivation and undermine love of learning. Even the staunchest of defenders of tightly integrated learning and judgment would presumably accept that learning is at least as important as grading so, if grading undermines learning (and it quite unequivocally does), something is badly broken.

A simple solution?

It does not have to be this way. I’ve said it before but it bears repeating: at least a large part of the solution is to decouple learning and accreditation altogether. There is a need for some means to indicate prowess, sure. But the crude certificates we currently use may not be the best way to do that in all cases, and it doesn’t have to dominate the learning process to the point of killing love of learning. If we could drop the accreditation role during the teaching process we could focus much more on providing useful feedback, on valorizing failures as useful steps towards success, on making interesting diversions, on tailoring the learning experience to the learner’s interests and capabilities rather than to credential requirements, on providing learning experiences that are long enough and detailed enough for the students’ needs, rather than a uniform set of fixed lengths to suit our bureaucracies.

Equally, we could improve our ability to provide credentials. For those that need it, we could still offer plenty of accreditation opportunities, for example through a portfolio-based approach and/or collecting records of learning or badges along the way. We could even allow for some kind of testing like oral, written, or practical exams for those that must, where it is appropriate to the competence (not, as now, as a matter of course) and we could actually do it right, rather than in ways that positively enable and reward cheating. None of this has to bound to specific courses. This decoupling would also give students the freedom to choose other ways of learning apart from our own courses, which would be quite a strong incentive for us to concentrate on teaching well. It might challenge us to come up with authentic forms of assessment that allow students to demonstrate competence through practice, or to use evidence from multiple sources, or to show their particular and unique skillset. It would almost certainly let us do both accreditation and teaching better. And it’s not as though we have no models to work from: from driving tests to diving tests to uses of portfolios in job interviews, there are plenty of examples of ways this can work already.

Apart from some increased complexities of managing such a system (which is where online tools can come in handy and where opportunities exist for online institutions that conventional face-to-face institutions cannot compete with) this is not a million miles removed from what we do now: it doesn’t require a revolution, just a simple shift in emphasis, and a separation of two unnecessarily and mutually inconsistent intertwined roles. Especially when processes and tools already exist for that, as they do at Athabasca University, it would not even be particularly costly. Inertia would be a bigger problem than anything else, but even big ships can eventually be steered in other directions. We just have to choose to make it so.

Reference

Bruner, J. S. (1966). Toward a Theory of Instruction. Cambridge MA: The Belknap Press of Harvard University Press.

Understanding the response to financial and non-financial incentives in education: Field experimental evidence using high-stakes assessments

Posted November 6, 2016, updated onNovember 24, 2022 by Jon Dron

What they did

This is a report by, Simon Burgess, Robert Metcalfe, and Sally Sadoff on a large scale study conducted in the UK on the effects of financial and non-financial incentives on GCSE scores (GCSEs are UK qualifications usually taken around age 16 and usually involving exams), involving over 10,000 students in 63 schools being given cash or ‘non-financial incentives’. ‘Non-financial incentives’ did not stretch as far as a pat on the back or encouragement given by caring teachers – this was about giving tickets for appealing events. The rewards were given not for getting good results but for particular behaviours the researchers felt should be useful proxies for effective study: specifically, attendance, conduct, homework, and classwork. None of the incentives were huge rewards to those already possessing plenty of creature comforts but, for poorer students, they might have seemed substantial. Effectiveness of the intervention was measured in terminal grades. The researchers were very thorough and were very careful to observe limitations and concerns. It is as close to an experimental design as you can get in a messy real-world educational intervention, with numbers that are sufficient and diverse enough to make justifiable empirical claims about the generalizability of the results.

What they found

Rewards had little effect on average marks overall, and it made little difference whether rewards were financial or not. However, in high risk groups (poor, immigrants, etc) there was a substantial improvement in GCSE results for those given rewards, compared with the control groups.

My thoughts

The only thing that does surprise me a little is that so little effect was seen overall, but I hypothesize that the reward/punishment conditions are so extreme already among GCSE students that it made little difference to add any more to the mix. The only ones that might be affected would be those for whom the extrinsic motivation is not already strong enough. There is also a possibility that the demotivating effects for some were balanced out by the compliance effects for others: averages are incredibly dangerous things, and this study is big on averages.

What makes me sad is that there appears to be no sense of surprise or moral outrage about this basic premise in this report.

dogs being whipped, from Jack London's 'Call of the Wild' It appears reasonable at first glance: who would not want kids to be more successful in their exams? When my own kids had to do this sort of thing I would have been very keen on something that would improve their chances of success, and would be especially keen on something that appears to help to reduce systemic inequalities. But this is not about helping students to learn or improving education: this is completely and utterly about enforcing compliance and improving exam results. The fact that there might be a perceived benefit to the victims is a red herring: it’s like saying that hitting dogs harder is good for the dogs because it makes them behave better than hitting them gently. The point is that we should not be hitting them at all. It’s not just morally wrong, it doesn’t even work very well, and only continues to work at all if you keep hitting them. It teaches students that the end matters more than the process, that learning is inherently undesirable and should only done when there is a promise of a reward or threat of punishment, and that they are not in charge of it.

The inevitable result of increasing rewards (or punishments – they are functionally equivalent) is to further quench any love of learning that might be left at this point in their school careers, to reinforce harmful beliefs about how to learn, and to further put students off the subjects they might have loved under other circumstances for life. In years to come people will look back on barbaric practices like this much as we now look back at the slave trade or pre-emancipation rights for women.

Studies like this make me feel a bit sick.

Address of the bookmark: http://www.efm.bris.ac.uk/economics/working_papers/pdffiles/dp16678.pdf