More than a game: some thoughts on David Wiley’s “Random Audits as a Scalable Deterrent to Cheating”

Source: Random Audits as a Scalable Deterrent to Cheating: Using Game Theory to Design Fair and Effective Academic Integrity Systems for the AI Era by David Wiley

Though not widespread, the general principle of assessing only a sample of work with oral exams (viva voces) is well established, and is standard practice in a number of institutions (e.g. UC Berkeley or UC London). What’s smart and novel about David Wiley’s new variation on the theme is the rigour with which he approaches the problem. The headliner is his use of game theory to identify the optimum sample range (no point in auditing mediocre results or fails), sample rate (to make the risk of detection significant enough to deter wrongdoers), penalty for failure (neither so small that the risk is acceptable nor so large that people are deterred from applying it), and appropriate audit bonus (so that honest students gain some benefit, but not too much, from being audited, to make up for the discomfort, inconvenience, and pain). It’s a nicely balanced process, playing with the incentives so as to take some of the sting out of being selected to be assessed by offering opportunities to increase grades. There’s also a lot of careful thought given to the administrative and pedagogical details of how to make it all work, so that students are forced to think clearly about the pros and cons of cheating, and it is all done fairly and efficiently. It’s a very well considered set of techniques for reducing the faculty workload and reducing the chances of cheating.
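To make the trade-off between sample rate and penalty concrete, here is a minimal sketch of the generic deterrence arithmetic. It is my own simplification and not David's actual model: I assume a cheater gains a fixed number of grade points if not audited and loses a fixed penalty if audited, and the names `expected_gain_from_cheating` and `break_even_audit_rate`, along with all of the numbers, are hypothetical.

```python
def expected_gain_from_cheating(audit_rate: float, gain: float, penalty: float) -> float:
    """Expected change in grade points for a cheater (hypothetical model).

    With probability (1 - audit_rate) the cheater keeps `gain`;
    with probability audit_rate they are audited and lose `penalty`.
    """
    return (1 - audit_rate) * gain - audit_rate * penalty


def break_even_audit_rate(gain: float, penalty: float) -> float:
    """Audit rate above which cheating has a negative expected payoff."""
    return gain / (gain + penalty)


if __name__ == "__main__":
    # Illustrative numbers only: a 10-point gain against a 30-point penalty.
    gain, penalty = 10.0, 30.0
    threshold = break_even_audit_rate(gain, penalty)
    print(f"break-even audit rate: {threshold:.2f}")
    for rate in (0.10, 0.25, 0.40):
        payoff = expected_gain_from_cheating(rate, gain, penalty)
        print(f"audit rate {rate:.2f}: expected gain {payoff:+.1f}")
```

Under these assumptions, a larger penalty lowers the audit rate needed to deter, which is why the two have to be tuned together. It also assumes a perfectly rational, risk-neutral student, which is exactly the assumption I question below.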

For all that is good about it, I think it’s almost exactly the wrong idea, though I have an idea to save it.

Problems with oral exams

For the majority of students in search of credentials, oral exams are at the better end of the summative assessment spectrum, because they are:
  • efficient (on average, it takes no longer to ascertain someone knows what they are talking about than it does to properly mark an exam or assignment and, crucially, it demands less time from the student),
  • reliable (very hard, though not impossible, to fake or cheat),
  • personal (you can explore personal strengths and misconceptions),
  • responsive (feedback can be immediate),
  • social (caring can be demonstrated),
  • often authentic (depends on context), and, above all,
  • useful learning experiences in their own right, for all concerned, including examiners.
In universities, oral exams predate written exams by many, many centuries. Oral examination was by far the most common way to assess students for credentials right up to at least the 19th Century, and it generally worked well, notwithstanding the problems dealing with geometry and other visual disciplines that led to the Cambridge Tripos (the first modern written exams) in the late C18th. It’s still very popular in some regions, especially for higher degrees, though it has fallen out of favour across much of higher education because it is hard work and difficult to scale. While each exam is quite efficient in itself, when you have to schedule a few hundred of them it really eats into your time and energy. There are some major issues for students who have speech impediments, hearing problems, or who are simply using a foreign language, so alternatives or workarounds must be available, and extraordinary care must be taken to avoid personal biases because it is prohibitively expensive and impractical to anonymize oral exams. All in all, though, for most students it is one of the least bad of a bad bunch.

Unfortunately, oral exams have one fatal flaw: far more than written exams (which are unpleasant enough for most students), they can be incredibly intimidating. Few students actually like them but, for a significant number, they are beyond mortifying. I have known students to freeze, cry, walk out, and even fail an entire PhD (though that was later corrected) as a result of having to defend their work this way. The stress can be mitigated somewhat with counselling, therapy, practice, caring tuition, and sensitive questioning, but it is difficult if not impossible to completely eliminate this problem, and time spent developing counter-technologies to the technology of assessment is time better spent learning the subject in question.

I think that David’s rational game-theoretic approach fails to take this sufficiently into account. For students facing the prospect of extreme trauma, no matter how competent they might be in the subject, the most rational course of action in David’s system would often be to aim for a low mark that would not get audited rather than risk having to be examined. There are plenty of students who don’t need high GPAs, for whom a straight pass is a rational choice. However, in itself, this would be a risky strategy because it is really difficult to tread the fine line between a low pass and a fail or higher pass, either of which would be very bad news, all of which would add stress not just at exam time but throughout the course. Under such circumstances, a student who had taken the game theory to heart would probably realize that the most effective way to be likely to get a low pass would be to ask a generative AI to produce work at that level: in my own experiments I have found them to be remarkably good at targeting a particular grade, as long as you feed them half-decent rubrics.

It is also far from infallible, because few of us are rational game players. On the whole, cheating tends to occur when students are very stressed and they panic: it’s often barely a rational choice at all. Few actually want to cheat and all of them already know it is a risky option: it’s just the least bad of a limited number of very bad alternatives. Making the risks higher and quantifying them is not a solution to this. If anything, for at least a few of the most at-risk students, it will just make the problem worse because the pressure is greater. Also, for the truly disengaged students who are most likely to cheat, this might just be another thing they do not learn, so they would not even be playing the game, though they would certainly come to regret it if they were audited.

Sampling problems

Another problem with David’s approach is that it is a very much stronger signal of the authority and control that the teacher/institution has over the student than the conventional process, with no pretence that it serves any further purpose than to catch cheats. If it were to support learning then everyone should be doing it, and the fact that there is a reward for being audited just further emphasizes that it is an undesirable activity that students are being forced to do. At least as bad, it doesn’t just allow but actively recommends an instrumental approach to learning: it literally teaches students how to game the system. For anyone wanting to use this approach, I would therefore strongly recommend combining it with ways to attempt to restore lost autonomy, for example by encouraging students to design some of their own outcomes, or to have input into the means of assessment, or to have plenty of flexibility in the timing of submissions, or at the very least to be able to choose different ways of demonstrating their competence from a range of options. Among the benefits of doing this, the chances of them cheating in the first place would be significantly reduced.

There is also a time commitment to learning how to play that game rather than learning the stuff the course is actually about.  I don’t see an easy way of avoiding this altogether though, if it were applied across the board to a whole program, the proportion of time spent on it could be reduced for each course. It would be a brilliant idea to use it in a course on game theory, of course.

It bothers me that the method deliberately excludes students who don’t get great results. It seems to me that they are the ones who would most benefit from a chance to improve them, so it amplifies the divide between the haves and the have-nots. At the very least, it should be possible for such students to ask for an oral exam, under the same conditions as those who get selected for random testing. The selection process again sends a bad message: that high achievement makes you a suspect.

While the proposed sample rates make sense for a single course, if all courses worked this way then, by the end of the program, almost every student would have at some point been audited, most likely more than once. For someone with a strong phobia, this might actually be worse than having to do it for every course: knowing that, at any point, your worst nightmare is going to happen is probably not going to improve your chances of persisting to the end of a program. It’s a problem both in the stress-filled build-up and (if not selected) the massive surge of relief that follows. The pain/relief patterns are not dissimilar to those of, say, gambling or drug addiction.
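The arithmetic behind that claim can be sketched as follows. The audit rates and the number of courses here are hypothetical illustrations, not figures from David's paper, and I am assuming that each course audits independently at the same rate and that the student's marks always fall within the audited range.

```python
def audit_probabilities(rate: float, courses: int) -> tuple[float, float]:
    """Return (P(at least one audit), P(two or more audits)).

    Assumes independent audits at the same rate in every course,
    so the number of audits follows a binomial distribution.
    """
    p_none = (1 - rate) ** courses
    p_exactly_one = courses * rate * (1 - rate) ** (courses - 1)
    return 1 - p_none, 1 - p_none - p_exactly_one


if __name__ == "__main__":
    courses = 40  # hypothetical length of a degree program
    for rate in (0.05, 0.10, 0.20):  # hypothetical per-course audit rates
        at_least_one, two_or_more = audit_probabilities(rate, courses)
        print(
            f"rate {rate:.0%}: audited at least once {at_least_one:.1%}, "
            f"more than once {two_or_more:.1%}"
        )
```

Even a modest per-course rate compounds quickly over a whole program, which is why a rate that looks gentle for one course does not feel gentle to a student with a strong phobia.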

Motivation problems

David claims that it is not a technology problem but an incentive problem. I disagree. This very much is a technology problem, and David’s solution is totally a technological solution: it’s just not a digital technology problem. And, in the context of the technology in question – that of credentialing – it is not an incentive problem but a motivation problem. Treating it as an incentive problem limits it to the subset of motivation that is both extrinsic and externally regulated: the worst possible kind. Externally regulated extrinsic motivation reliably kills intrinsic motivation so this both takes away the love of simply doing the work and actively harms motivation to do so in future.

The trouble with David’s solution is that it doesn’t deal with or consider the reasons that students cheat in the first place: it’s just a response to the fact that some do. Vanishingly few students start out a course with the intention of cheating their way through it. Rather, the pressures they face (almost all extrinsic) make cheating a rational response and/or the result of panic. All that David’s solution does is to make it a bit less rational. Students will still do it for irrational, emotionally charged reasons, and it not only does nothing to eliminate the root causes but it actually amplifies them, piling on additional pressure.

Like all technologies, there are other ways to solve this problem and, like all technologies, it is a Faustian bargain that creates new problems of its own. David’s solution, with the aforementioned provisos, is a potentially effective and efficient solution to cheating but it is likely to have the opposite effect on learning, especially once the course is over. It’s just a counter-technology for dealing with flaws in the underlying credentialing approach, and it demands further counter-technologies of its own to deal with its big fatal flaw if it is going to work at all well. It’s not at all unusual in this.

A better solution?

I think this is fixable. I reckon David’s solution would work a lot better if, instead of auditing assignments or exams for a single course, it were applied to a basket of courses (say, 3-6 of them) and, in the oral exam, students were asked to synthesize, connect and utilize what they have learned in all of them. This is not unlike some fairly common approaches to PhDs or capstone projects, where students create something then talk about it in more or less formal ways (presentations, demos, crits, viva voces, etc). If done with commitment, it could largely decouple learning and assessment because instrumental revision would not be an option: the only way to revise effectively would be to engage in positive learning activities that involve exactly the kind of synthesis we would examine, which would make it personal, relevant, and interesting, especially if (to make it authentic) it were done with other people.

With a bit of ingenuity, it might be possible to remove all grades and credit for the courses themselves, so students could learn without the usual extrinsic pressures. Every student would automatically get a provisional generic pass on each of the basket of courses, no questions asked. If they were audited then they might improve that (or fail), as David suggests. For the sake of equity, every student would have the right to ask to be audited, so the high-flyers who cared about getting a high grade could have an opportunity to get one. The rest could learn with significantly reduced pressure.

An obvious objection is that it would raise the stakes when that assessment did actually happen. One way to reduce that problem would be to allow repeated attempts, with no additional penalty, or to make it a “best of three” or something along those lines. Though that would somewhat reduce the efficiency of the solution, as long as it were structured to make it relatively rare, it would be worth the extra bother. It would also be good to provide coaching, counselling, and plentiful opportunities to practice. For some subjects there might be less pressured approaches than oral exams that would achieve similar results, such as observation studies of students working on a problem, or group discussions, or structured peer interviews. Perhaps it could be a series of conversations throughout the program, none of which carries a definitive grade in itself but that, together, add up to an overall assessment. There’s scope for further innovation here.

It would be more important than ever to provide plentiful formative assessment during the courses themselves, and to provide ways of practising those skills in synthesis. The latter could be done within those courses or, perhaps better, a “synthesis” course could be provided for this purpose, operating in much the same way as Brunel’s assessment modules in their Integrated Programme Assessment approach. Among the advantages of this, it would allow students to do some work that might be used as part of an alternative assessment for those suffering from extreme fear of or difficulties participating in the oral exam.

It is not perfect, and it would be no use for situations such as those at Athabasca University, where many students are taking only one or two courses, often as visitors from other programs. However, for program students, even more than David’s approach, this would massively reduce the marking burden while making a positive contribution to learning and motivation to learn.    

Slides from my ICEEL ’24 Keynote: “No Teacher Left Behind: Surviving Transformation”

Here are the slides from my keynote at the 8th International Conference on Education and E-Learning in Tokyo yesterday. Sadly I was not actually in Tokyo for this but the online integration was well done and there was some good audience interaction. I am also the conference chair (an honorary title) so I may be a bit biased, but I think it’s a really good conference, with an increasingly rare blend of both the tech and the pedagogical aspects of the field, and some wonderfully diverse keynotes ranging in subject matter from the hardest computer science to reflections on literature and love (thanks to its collocation with ICLLL, a literature and linguistics conference). My keynote was somewhere in between, and deliberately targeted at the conference theme, “Transformative Learning in the Digital Era: Navigating Innovation and Inclusion.”

[Image: the technological connectome, represented in the style of 1950s children's books]

As my starting point for the talk I introduced the concept of the technological connectome, about which I have just written a paper (currently under revision, hopefully due for publication in a forthcoming issue of the new Journal of Open, Distance, and Digital Education), which is essentially a way of talking about extended cognition from a technological rather than a cognitive perspective. From there I moved on to the adjacent possible and the exponential growth in technology that has, over the past century or so, reached such a breakneck rate of change that innovations such as generative AI, the transformation I particularly focused on (because it is topical), can transform vast swathes of culture and practice in months if not in weeks. This is a bit of a problem for traditional educators, who are as unprepared as anyone else for it, but who find themselves in a system that could not be more vulnerable to the consequences. At the very least it disrupts the learning outcomes-driven teacher-centric model of teaching that still massively dominates institutional learning the world over, both in the mockery it makes of traditional assessment practices and in the fact that generative AIs make far better teachers if all you care about are the measurable outcomes.

The solutions I presented and that formed the bulk of the talk, largely informed by the model of education presented in How Education Works, were mostly pretty traditional, emphasizing the value of community, and of passion for learning, along with caring about, respecting, and supporting learners. There were also some slightly less conventional but widely held perspectives on assessment, plus a bit of complexivist thinking about celebrating the many teachers and acknowledging the technological connectome as the means, the object and the subject of learning, but nothing Earth-shatteringly novel. I think this is as it should be. We don’t need new values and attitudes; we just need to emphasize those that are learning-positive rather than the increasingly mainstream learning-negative, outcomes-driven, externally regulated approaches that the cult of measurement imposes on us.

Post-secondary institutions have had to grapple with their learning-antagonistic role of summative assessment since not long after their inception so this is not a new problem but, until recent decades, the two roles have largely maintained an uneasy truce. A great deal of the impetus for the shift has come from expanding access to PSE. This has resulted in students who are less able, less willing, and less well-supported than their forebears who were, on average, far more advantaged in ability, motivation, and unencumbered time simply because fewer were able to get in. In the past, teachers hardly needed to teach. The students were already very capable, and had few other demands on their time (like working to get through college), so they just needed to hang out with smart people, some of whom knew the subject and could guide them through it in order to know what to learn and whether they had been successful, along with the time and resources to support their learning. Teachers could be confident that, as long as students had the resources (libraries, lecture notes, study time, other students), they would be sufficiently driven by the need to pass the assessments and/or intrinsic interest that they could largely be left to their own devices (OK, a slight caricature, but not far off the reality).

Unfortunately, though this is no longer even close to the norm, it is still the model on which most universities are based. Most of the time professors are still hired because of their research skills, not teaching ability, and it is relatively rare that they are expected to receive more than the most perfunctory training, let alone education, in how to teach. Those with an interest usually have opportunities to develop their skills but, if they do not, there are few consequences. Thanks to the technological connectome, the rewards and punishments of credentials continue to do the job well enough, notwithstanding the vast amounts of cheating, satisficing, student suffering, and lost love of learning that ensue. There are still plenty of teachers: students have textbooks, YouTube tutorials, other students, help sites, and ChatGPT, to name but a few, of which there are more every day. This is probably all that is propping up a fundamentally dysfunctional system. Increasingly, the primary value of post-secondary education comes to lie in its credentialling function.

No one who wants to teach wants this, but virtually all of those who teach in universities are the ones who succeeded in retaining their love of learning for its own sake despite it, so they find it hard to understand students who don’t. Too many (though, I believe, a minority) are positively hostile to their students as a result, believing that most students are lazy, willing to cheat, or to otherwise game the system, and they set up elaborate means of control and gotchas to trap them. The majority who want the best for their students, however, are also to blame, seeing their purpose as to improve grades, using “learning science” (which is like using colour theory to paint – useful, not essential) to develop methods that will, on average, do so more effectively. In fairness, though grades are not the purpose, they are not wrong about the need to teach the measurable stuff well: it does matter to achieve the skills and knowledge that students set out to achieve. However, it is only part of the purpose. Mostly, education is a means to less measurable ends: forming identities, attitudes, values, ways of relating to others, ways of thinking, and ways of being. You don’t need the best teaching methods to achieve that: you just need to care, and to create environments and structures that support stuff like community, diversity, connection, sharing, openness, collaboration, play, and passion.

The keynote was recorded but I am not sure if or when it will be available. If it is released on a public site, I will share it here.

At the end of this post a successful reader will be able to make better use of learning outcomes

Jennie Young nails it in this delightful little bit of satire about the misuse of learning outcomes in education, Forget the Magic of Discovery, It’s Learning Outcomes That Help Children Identify, Comprehend, and Synthesize Their Dreams.

Learning outcomes do have their uses. They are very useful tools when designing learning activities, courses, and programs. Done well, they help guide and manage the process, and they are especially helpful in teams as a way to share intentions and establish boundaries, which can also be handy when thinking about how they fit into a broader program of study, or how they mesh with other learning activities elsewhere. They can perform a useful role in assessment. I find them especially valuable when I’m called upon to provide a credential because, rather than giving marks to assignments that I force students to do, I can give marks for learning outcomes, thereby allowing students to select their own evidence of having met them. It’s a great way to encourage participation in a learning community without the appallingly controlling, inauthentic, but widespread practice of giving marks for discussion contributions because such contributions can be very good evidence of learning, but there are other ways to provide it. It also makes it very easy to demonstrate to others that course outcomes have been met, it makes it easy for students to understand the marks they received,  it helps to avoid over-assessment and, especially if students are involved in creating or weighting the outcomes themselves, it empowers them to take control of the assessment process. Coming up with the evidence is also a great reflective exercise in itself, and a chance to spot any gaps before it makes a difference to the marks. Learning outcomes can also help teachers as part of how they evaluate the success of an educational intervention, though it is better to harvest outcomes than to just measure achievement of ones that are pre-specified because, if teaching is successful, students always learn more than what we require them to learn. However, they should never be used in a managerial process as objective, measurable ways of monitoring performance because that is simply not what they do.

They can have some limited value for students when initially choosing a learning activity, course, or program, or (with care and support) for evaluating their own success. However, they should seldom if ever be the first things students see because you could hardly be more boring or controlling than to start with “at the end of this course you will …”. And they should seldom if ever be used to  constrain or hobble teaching or learning because, as Young’s article makes beautifully clear, learning is an adventure into the unknown that should be full of surprises, for learners and for teachers. That said, there are a few kinds of learning outcome (that I have been thinking about including in my own courses for many years but have yet to work up the nerve to implement) that might be exceptions. For example…

At the end of this course a successful student will be able to:

  • feel a sense of wonder and excitement about [subject];
  • feel a passionate need to learn more about [subject];
  • teach their teacher about [subject];
  • enthusiastically take the course again and learn something completely different the second time around;
  • learn better;
  • do something in [subject] that no one has ever done before;
  • use what they have learned to make the world a better place;
  • explain [subject] to their teacher’s grandmother in a way that she would finally understand;
  • laugh uncontrollably at a joke that only experts in the field would get;
  • tell an original good joke that only experts in the field would get and that would make them laugh;
  • at a dinner party, even when slightly tipsy, convince an expert in the field that they are more of an expert;
  • design and deliver a better course than this on [subject].

I would totally enrol on this course.

 

Some meandering thoughts on ‘good’ and ‘bad’ learning

There has been an interesting brief discussion on Twitter recently that has hinged around whether and how people are ‘good’ at learning. As Kelly Matthews observes, though, Twitter is not the right place to go into any depth on this, so here is a (still quite brief) summary of my perspective on it, with a view to continuing the conversation.

Humans are nearly all pretty good at learning because that’s pretty much the defining characteristic of our species. We have an insatiable drive to learn from the moment of our birth (at least). Also, though I’m keeping an open mind about octopuses and crows, we seem to be better at it than at least most other animals. Our big advantage is that we have technologies, from language to the Internet, to share and extend our learning, so we can learn more, individually and collectively, than any other species. It is difficult or impossible to fully separate individual learning from collective learning because our cognition extends into and is intimately a part of the cognition of others, living and dead.

However, though we learn nearly all that we know, directly or indirectly, from and with other people, what we learn may not be helpful, may not be as effectively learned as it should, and may not much resemble what those whose job is to teach us intend. What we learn in schools and universities might include a dislike of a subject, how to conceal our chat from our teacher, how to meet the teacher’s goals without actually learning anything, how to cheat, and so on. Equally, we may learn falsehoods, half-truths, and unproductive ways of doing stuff from the vast collective teacher that surrounds us as well as from those designated as teachers.

For instance, among the many unintended lessons that schools and colleges too often teach is the worst one of all: that (despite our obvious innate love of it) learning is an unpleasant activity, so extrinsic motivation is needed for it to occur. This results from the inherent problem that, in traditional education, everyone is supposed to learn the same stuff in the same place at the same time. Students must therefore:

  1. submit to the authority of the teacher and the institutional rules, and
  2. be made to engage in some activities that are insufficiently challenging, and some that are too challenging.

This undermines two of the three essential requirements for intrinsic motivation, support for autonomy and competence (Ryan & Deci, 2017).  Pedagogical methods are solutions to problems, and the amotivation inherently caused by the system of teaching is (arguably) the biggest problem that they must solve. Thus, what passes as good teaching is largely to do with solving the problems caused by the system of teaching itself. Good teachers enthuse, are responsive, and use approaches such as active learning, problem or inquiry-based learning, ungrading, etc, largely to restore agency and flexibility in a dominative and inflexible system. Unfortunately, such methods rely on the technique and passion of talented, motivated teachers with enough time and attention to spend on supporting their students. Less good and/or time-poor teachers may not achieve great results this way. In fact, as we measure such things, on average, such pedagogies are less effective than harder, dominative approaches like direct instruction (Hattie, 2013) because, by definition, most teachers are average or below average. So, instead of helping students to find their own motivation, many teachers and/or their institutions typically apply extrinsic motivation, such as grades, mandatory attendance, classroom rules, etc to do the job of motivating their students for them. These do work, in the sense of achieving compliance and, on the whole, they do lead to students getting a normal bell-curve of grades that is somewhat better than those using more liberative approaches. However, the cost is huge. The biggest cost is that extrinsic motivation reliably undermines intrinsic motivation and, often, kills it for good (Kohn, 1999). 
Students are thus taught to dislike or, at best, feel indifferent to learning, and so they learn to be satisficing, ineffective learners, doing what they might otherwise do for the love of it for the credentials and, too often, forgetting what they learned the moment that goal is achieved. But that’s not the only problem.

When we learn from others – not just those labelled as teachers but the vast teaching gestalt of all the people around us and before us who create(d) stuff, communicate(d), share(d), and contribute(d) to what and how we learn – we typically learn, as Paul (2021) puts it, not just the grist (the stuff we remember) but the mill (the ways of thinking, being, and learning that underpin them). When the mill is inherently harmful to motivation, it will not serve us well in our future learning.

Furthermore, in good ways and bad, this is a ratchet at every scale. The more we learn, individually and collectively, the more new stuff we are able to learn. New learning creates new adjacent possible empty niches (Kauffman, 2019) for us to learn more, and to apply that learning to learn still more, to connect stuff (including other stuff we have learned) in new and often unique ways. This is, in principle, very good. However, if what and how we learn is unhelpful, incorrect, inefficient, or counter-productive, the ratchet takes us further away from stuff we have bypassed along the way. The adjacent possibles that might have been available with better guidance remain out of our reach and, sometimes, even harder to get to than if the ratchet hadn’t lifted us high enough in the first place. Not knowing enough is a problem but, if there are gaps, then they can be filled. If we have taken a wrong turn, then we often have to unlearn some or all of what we have learned before we can start filling those gaps. It’s difficult to unlearn a way of learning. Indeed, it is difficult to unlearn anything we have learned. Often, it is more difficult than learning it in the first place.

That said, it’s complex, and entangled. For instance, if you are learning the violin then there are essentially two main ways to angle the wrist of the hand that fingers the notes, and the easiest, most natural way (for beginners) is to bend your hand backwards from the wrist, especially if you don’t hold the violin with your chin, because it supports the neck more easily and, in first position, your fingers quickly learn to hit the right bit of the fingerboard, relative to your hand. Unfortunately, this is a very bad idea if you want a good vibrato, precision, delicacy, or the ability to move further up the fingerboard: the easiest way to do that kind of thing is to keep your wrist straight or slightly angled inwards, and to support the violin with your chin. It’s more difficult at first, but it takes you further. Once the ‘wrong’ way has been learned, it is usually much more difficult to unlearn than if you were starting from scratch the ‘right’ way. Habits harden. Complexity emerges, though, because many folk violin styles make a positive virtue of holding the violin the ‘wrong’ way, and it contributes materially to the rollicking rhythmic styles that tend to characterize folk fiddle playing around the world. In other words, ‘bad’ learning can lead to good – even sublime – results. There is similarly plenty of space for idiosyncratic technique in many of the most significant things we do, from writing to playing hockey to programming a computer and, of course, to learning itself. The differences in how we do such things are where creativity, originality, and personal style emerge, and you don’t necessarily need objectively great technique (hard technique) to do something amazing. It ain’t what you do, it’s the way that you do it, that’s what gets results. To be fair, it might be a different matter if you were a doctor who had learned the wrong names for the bones of the body or an accountant who didn’t know how to add up numbers.
Some hard skills have to be done right: they are foundations for softer skills. This is true of just about every skill, to a greater or lesser extent, from writing letters and spelling to building a nuclear reactor and, indeed, to teaching.

There’s much more to be said on this subject and my forthcoming book includes a lot more about it! I hope this is enough to start a conversation or two, though.

References

Hattie, J. (2013). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. Taylor & Francis.

Kauffman, S. A. (2019). A World Beyond Physics: The Emergence and Evolution of Life. Oxford University Press.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise, and other bribes (Kindle ed.). Mariner Books.

Paul, A. M. (2021). The Extended Mind: The Power of Thinking Outside the Brain. HarperCollins.

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications.

 

Brunel University’s Integrated Programme Assessment – a neat way to decouple learning and credentials

I have frequently written about the need to decouple learning and credentials, so I love this approach to doing so from Brunel University. It fully decouples learning and credentials by offering ungraded study blocks (in North America the equivalent of courses, in the UK the equivalent of modules) with no summative assessments, followed by integrative assessment blocks, that provide opportunities for students to pull together what they have learned across their various courses/modules in a variety of (mostly) useful integrative learning activities for which marks are awarded. It’s neat, simple, practical, and effective.

The summative assessment load (for students and their professors) is reduced by more than 60%, the quality of those assessments increases (in every way), students feel better prepared for employment (and employers agree), it improves retention figures, teachers can focus on teaching, assessments are more authentic, more engaging, and it massively reduces cheating. The only significant downside that I can see in this is that it is not quite as flexible as a completely modular program – there are a few dependencies and limits on when and how students learn, albeit that these are no worse than in most in-person universities.

I learned about this from Peter Hartley, who mentioned it in a quite inspiring IFNTF talk on assessment yesterday. Amongst other things, Peter highlighted a wide range of issues with modularization (i.e. the standard approach used in many parts of the world of splitting up a program into a set of self-contained courses) and assessment, including, from his slides:

  1. Not assessing programme outcomes.
  2. Atomisation of assessment.
  3. Students and staff failing to see the links/coherence of the programme.
  4. Modules too short for complex learning.
  5. Surface learning and ‘tick-box’ mentality.
  6. Inappropriate ‘one-size-fits-all’.
  7. Over-standardisation in regulations.
  8. Too much summative assessment and feedback – not enough formative.

While I couldn’t agree more, for the most part, I have mixed feelings about some of Peter’s list of issues. I agree that the traditional 3 or 4 year program(me), in which the course of study is designed to work as a whole, not as a collection of self-contained pieces, is far better for integrating knowledge across a discipline, though I don’t see why it should always take exactly that amount of time to achieve mastery, and I am not even sure whether we should be thinking in terms of disciplines at all. There’s some value in the notion, for sure, and there are some kinds of subject and learning for which it makes sense, but I think a lot of it is down to centuries-old tradition and post hoc justification rather than careful consideration of fitness for purpose. Also, it seems to me that summative assessment should always be formative, too, so the issue could be partly addressed by simply improving summative assessments, not by scrapping them altogether. However, I think Peter is fundamentally right that, due to modularization, most universities over-assess, that credentials become the reason for learning rather than the measurement of it (with all the very many evils that entails), that the big picture tends to be lost, that there is a ridiculously large administrative burden that results from it, and that learning – the point of the thing after all – consequently suffers. As we and much of the rest of the world start to move towards ever smaller chunks, with associated stackable microcredentials, badges, etc, this is going to be a bigger problem. Brunel’s solution is not the only way, but it is a radically disruptive intervention that many universities could implement without breaking everything else in the process.

Originally posted at: https://landing.athabascau.ca/bookmarks/view/16012554/brunel-universitys-integrated-programme-assessment-a-neat-way-to-decouple-learning-and-credentials

So, this is a thing…

Students are now using AIs to write essays and assignments for credit, and they are (probably) getting away with it. This particular instance may be fake, but the tools are widely available and it would be bizarre were no one to be using them for this purpose. There are already far too many sites providing stuff like product reviews and news stories (re)written by AIs, and AIs are already being used for academic paper writing. In fact, systems for doing so, like CopyMatic or ArticleGenerator, are now a commodity item. So the next step will be that we will develop AIs to identify the work of other AIs (in fact, that is already a thing, e.g. here and here), and so it will go on, and on, and on.

This kind of thing will usually evade plagiarism checkers with ease, and may frequently fool human markers. For those of us working in educational institutions, I predict that traditionalists will demand that we double down on proctored exams, in a vain attempt to defend a system that is already broken beyond repair. There are better ways to deal with this: getting to know students, making each learning journey (and outputs) unique and personal, offering support for motivated students rather than trying to ‘motivate’ them, and so on. But that is not enough.

I am rather dreading the time when an artificial student takes one of my courses. The systems are probably too slow, quirky, and expensive right now for real-time deep fakes driven by plausible GANs to fool me, at least for synchronous learning, but I think it could already convincingly be done for asynchronous learning, with relatively little supervision.  I think my solution might be to respond with an artificial teacher, into which there has been copious research for some decades, and of which there are many existing examples.

To a significant extent, we already have artificial students, and artificial teachers teaching them. How ridiculous is that? How broken is the system that not only allows it but actively promotes it?

These tools are out there, getting better by the day, and it makes sense for all of us to be using them. As they become more and more ubiquitous, just as we accommodated pocket calculators in the teaching of math, so we will need to accommodate these tools in all aspects of our education. If an AI can produce a plausible new painting in any artist’s style (or essay, or book, or piece of music, or video) then what do humans need to learn, apart from how to get the most out of the machines? If an AI can write a better essay than me, why should I bother? If a machine can teach as well as me, why teach?

This is a wake-up call. Soon, if not already, most of the training data for the AIs will be generated by AIs. Unchecked, the result is going to be a set of ever-worse copies of copies, that become what the next generation consumes and learns from, in a vicious spiral that leaves us at best stagnant, at worst something akin to the Eloi in H.G. Wells’s Time Machine.  If we don’t want this to happen then it is time for educators to reclaim, to celebrate, and (perhaps a little) to reinvent our humanity. We need, more and more, to think of education as a process of learning to be, not of learning to do, except insofar as the doing contributes to our being. It’s about people, learning to be people, in the presence of and through interaction with other people. It’s about creativity, compassion, and meaning, not the achievement of outcomes a machine could replicate with ease. I think it should always have been this way.

Originally posted at: https://landing.athabascau.ca/bookmarks/view/15164121/so-this-is-a-thing

Joyful assessment: beyond high-stakes testing

Here are my slides from my presentation at the Innovate Learning Summit yesterday. It’s not world-shattering stuff – just a brutal attack on proctored, unseen written exams (PUWEs, pronounced ‘pooies’), followed by a description of the rationale, process, benefits, and unwanted consequences behind the particular portfolio-based approach to assessment employed in most of my teaching. It includes a set of constraints that I think are important to consider in any assessment process, grouped into pedagogical, motivational, and housekeeping (mainly relating to credentials) clusters. I list 13 benefits of my approach relating to each of those clusters, which I think make a pretty resounding case for using it instead of traditional assignments and tests. However, I also discuss outstanding issues, most of which relate to the external context and expectations of students or the institution, but a couple of which are fairly fundamental flaws (notably the extreme importance of prompt, caring, helpful instructor/tutor engagement in making it all work, which can be highly problematic when it doesn’t happen) that I am still struggling with.

How Assessment is Changing in The Digital Age – Five Guiding Principles | teachonline.ca

This article from teachonline.ca draws from a report by JISC (the UK academic network organization) to provide 5 ‘principles’ for assessment. I put the scare quotes around ‘principles’ because they are mostly descriptive labels for trends and they are woefully non-inclusive. There is also a subtext here – that I do understand is incredibly hard to avoid because I failed to fully do so myself in my own post last week – that assessment is primarily concerned with proving competence for the sake of credentials (it isn’t). Given these caveats, most of what is written here, however, makes some sense.

Principle 1: authentic assessment. I completely agree that assessment should at least partly be of authentic activities. It is obvious how that plays out in applied disciplines with a clear workplace context. If you are learning how to program, for instance, then of course you should write programs that have some value in a realistic context and it goes without saying that you should assess the same. This includes aspects of the task that we might not traditionally assess in a typical programming course such as analysis, user experience testing, working with others, interacting with StackOverflow, sharing via GitHub, copying code from others, etc. It is less obvious in the case of something like, say, philosophy, or history, or Latin, though, or, indeed, in any subject that is primarily found in academia. Authentic assessment for such things would probably be an essay or conference presentation, or perhaps some kind of argument, most of the time, because that’s what real life is like for most people in such fields (whether that should be the case remains an open issue). We should be wary, though, of making this the be-all and end-all, because there’s a touch of behaviourism lurking behind the idea: can the student perform as expected? There are other things that matter. For instance, I think that it is incredibly important to reflect on any learning activity, even though that might not mirror what is typically done in an authentic context. It can significantly contribute to learning but it can also reveal things that may not be obvious when we judge what is done in an authentic context, such as why people did what they did or whether they would do it the same way again. There may also be stages along the way that are not particularly authentic, but that contribute to learning the hard skills needed in order to perform effectively in the authentic context: learning a vocabulary, for example, or doing something dangerous in a cut-down, safe environment. We should probably not summatively assess such things (they should rarely contribute to a credential because they do not demonstrate applied capability), but formative assessment – including of this kind of activity – is part of all learning.

Principle 2: accessible and inclusive assessment. Well, duh. Of course this should be how it is done. Not so much a principle as plain common decency. Was this not ever so? Yes it was. Only an issue when careless people forget that some media are less inclusive than others, or that not everyone knows or cares about golf. Nothing new here.

Principle 3: appropriately automated assessment. This is a reaction to bad assessment, not a principle for good assessment. There is a principle that really matters here but it is not appropriate automation: it is that assessment should enhance and improve the student experience. Automation can sometimes do that. It is appropriate for some kinds of formative feedback (see examples of non-authentic learning above)  but very little else which, in the context of this article (that implicitly focuses on the final judgment), means it is a bad idea to use it at all.

Principle 4: continuous assessment. I don’t mind this one at all. Again, the principle is not what the label claims, though. The principle here is that assessment should be designed to improve learning. For sure, if it is used as a filter to sort the great from the not great, then the filter should be authentic which, for the most part, means no high stakes, high stress, one-chance tests, and that overall behaviours and performance over time are what matters. However, there is a huge risk of therefore assessing learning in progress rather than capability once a course is done. If we are interested in assessing competence for credentials, then I’d rather do it at the end, once learning has been accomplished (ignoring the inconvenient detail that this is not a terminal state and that learning must always undergo ever-dynamic renewal and transformation until the day we die). Of course, the work done along the way will make up the bulk of the evidence for that final judgment but it allows for the fact that learning changes people, and that what we did early on in the journey seldom represents what we are able to do in the light of later learning.

Principle 5: secure assessment. Why is this mentioned in an article about assessment in the digital age? Is cheating a new invention? Was it (intentionally) insecure before? This is just a description of how some people have noticed that traditional forms of assessment are really dumb in a context that includes Wikipedia, Google, and communications devices the size of a peanut. Pointless, and certainly not a new principle for the Digital Age. In fairness, if the principles above are followed in spirit as well as in letter, it is not likely to be a huge issue but, then, why make it a principle? It’s more a report on what teachers are thinking and talking about.

The summary is motherhood and apple pie, albeit that it doesn’t entirely fall out from the principles (choice over when to be assessed, or peer assessment, for instance, are not really covered in the principles, though they are very good ideas).

I’m glad that people are sharing ideas about this but I think that there are more really important principles than these: that students should have control over their own assessment, that it should never reward or punish, that it should always support learning, and so on. I wrote a bit about this the other day, and, though that is a work in progress, I think it gets a little closer to what actually matters than this.

Originally posted at: https://landing.athabascau.ca/bookmarks/view/6531701/how-assessment-is-changing-in-the-digital-age-five-guiding-principles-teachonlineca

Evaluating assessment

A group of us at AU have begun discussions about how we might transform our assessment practices, in the light of the far-reaching AU Imagine plan and principles. This is a rare and exciting opportunity to bring about radical and positive change in how learning happens at the institution. Hard technologies influence soft more than vice versa, and assessments (particularly when tied to credentials) tend to be among the hardest of all technologies in any pedagogical intervention. They are therefore a powerful lever for change. Equally, and for the same reasons, they are too often the large, slow, structural elements that infest systems to stunt progress and innovation.

Almost all learning must involve assessment, whether it be of one’s own learning, or provided by other people or machines. Even babies constantly assess their own learning. Reflection is assessment. It is completely natural and it only gets weird when we treat it as a summative judgment, especially when we add grades or credentials to the process, thus normally changing the purpose of learning from achieving competence to achieving a reward. At best it distorts learning, making it seem like a chore rather than a delight, at worst it destroys it, even (and perhaps especially) when learners successfully comply with the demands of assessors and get a good grade. Unfortunately, that’s how most educational systems are structured, so the big challenge to all teachers must be to eliminate or at least to massively reduce this deeply pernicious effect. A large number of the pedagogies that we most value are designed to solve problems that are directly caused by credentials. These pedagogies include assessment practices themselves.

With that in mind, before the group’s first meeting I compiled a list of some of the main principles that I adhere to when designing assessments, most of which are designed to reduce or eliminate the structural failings of educational systems. The meeting caused me to reflect a bit more. This is the result:

Principles applying to all assessments

  • The primary purpose of assessment is to help the learner to improve their learning. All assessment should be formative.
  • Assessment without feedback (teacher, peer, machine, self) is judgement, not assessment, and is pointless.
  • Ideally, feedback should be direct and immediate or, at least, as prompt as possible.
  • Feedback should only ever relate to what has been done, never the doer.
  • No criticism should ever be made without also at least outlining steps that might be taken to improve on it.
  • Grades (with some very rare minor exceptions where the grade is intrinsic to the activity, such as some gaming scenarios or, arguably, objective single-answer quizzes with T/F answers) are not feedback.
  • Assessment should never ever be used to reward or punish particular prior learning behaviours (e.g. use of exams to encourage revision, grades as goals, marks for participation, etc).
  • Students should be able to choose how, when and on what they are assessed.
  • Where possible, students should participate in the assessment of themselves and others.
  • Assessment should help the teacher to understand the needs, interests, skills, and gaps in knowledge of their students, and should be used to help to improve teaching.
  • Assessment is a way to show learners that we care about their learning.

Specific principles for summative assessments

A secondary (and always secondary) purpose of assessment is to provide evidence for credentials. This is normally described as summative assessment, implying that it assesses a state of accomplishment when learning has ended. That is a completely ridiculous idea. Learning doesn’t end. Human learning is not in any meaningful way like programming a computer or storing stuff in a database. Knowledge and skills are active, ever-transforming, forever actively renewed, reframed, modified, and extended. They are things we do, not things we have.

With that in mind, here are my principles for assessment for credentials (none of which supersede or override any of the above core principles for assessment, which always apply):

  • There should be no assessment task that is not in itself a positive learning activity. Anything else is at best inefficient, at worst punitive/extrinsically rewarding.
  • Assessment for credentials must be fairly applied to all students.
  • Credentials should never be based on comparisons between students (norm-referenced assessment is always, unequivocally, and unredeemably wrong).
  • The criteria for achieving a credential should be clear to the learner and other interested parties (such as employers or other institutions), ideally before it happens, though this should not forestall the achievement and consideration of other valuable outcomes.
  • There is no such thing as failure, only unfinished learning. Credentials should only celebrate success, not punish current inability to succeed.
  • Students should be able to choose when they are ready to be assessed, and should be able to keep trying until they succeed.
  • Credentials should be based on evidence of competence and nothing else.
  • It should be impossible to compromise an assessment by revealing either the assessment or solutions to it.
  • There should be at least two ways to demonstrate competence, ideally more. Students should only have to prove it once (though may do so in many ways and many times, if they wish).
  • More than one person should be involved in judging competence (at least as an option, and/or on a regularly taken sample).
  • Students should have at least some say in how, when, and where they are assessed.
  • Where possible (accepting potential issues with professional accreditation, credit transfer, etc) they should have some say over the competencies that are assessed, in weighting and/or outcome.
  • Grades and marks should be avoided except where mandated elsewhere. Even then, all passes should be treated as an ‘A’ because students should be able to keep trying until they excel.
  • Great success may sometimes be worthy of an award – e.g. a distinction – but such an award should never be treated as a reward.
  • Assessment for credentials should demonstrate the ability to apply learning in an authentic context. There may be many such contexts.
  • Ideally, assessment for credentials should be decoupled from the main teaching process, because of risks of bias, the potential issues of teaching to the test (regardless of individual needs, interests and capabilities) and the dangers to motivation of the assessment crowding out the learning. However, these risks are much lower if all the above principles are taken on board.

I have most likely missed a few important issues, and there is a bit of redundancy in all this, but this is a work in progress. I think it covers the main points.

Further random reflections

There are some overriding principles and implied specifics in all of this. For instance, respect for diversity, accessibility, respect for individuals, and recognition of student control all fall out of or underpin these principles. It implies that we should recognize success, even when it is not the success we expected, so outcome harvesting makes far more sense than measurement of planned outcomes. It implies that failure should only ever be seen as unfinished learning, not as a summative judgment of terminal competence, so appreciative inquiry is far better than negative critique. It implies flexibility in all aspects of the activity. It implies, above and beyond any other purpose, that the focus should always be on learning. If assessment for credentials adversely affects learning then it should be changed at once.

In terms of implementation, while objective quizzes and their cousins can play a useful formative role in helping students to self-assess and to build confidence, machines (whether implemented by computers or rule-following humans) should normally be kept out of credentialling. There’s a place for AI but only when it augments and informs human intelligence, never when it behaves autonomously. Written exams and their ilk should be avoided, unless they conform to or do not conflict with all the above principles: I have found very few examples like this in the real world, though some practical demonstrations of competence in an authentic setting (e.g. lab work and reporting) and some reflective exercises on prior work can be effective.

A portfolio of evidence, including a reflective commentary, is usually going to be the backbone of any fair, humane, effective assessment: something that lets students highlight successes (whether planned or not), that helps them to consolidate what they have learned, and that is flexible enough to demonstrate competence shown in any number of ways. Outputs or observations of authentic activities are going to be important contributors to that. My personal preference in summative assessments is to only use the intended (including student-generated) and/or harvested outcomes for judging success, not for mandated assignments. This gives flexibility, it works for every subject, and it provides unequivocal and precise evidence of success. It’s also often good to talk with students, perhaps formally (e.g. a presentation or oral exam), in order to tease out what they really know and to give instant feedback. It is worth noting that, unlike written exams and their ilk, such methods are actually fun for all concerned, albeit that the pleasure comes from solving problems and overcoming challenges, so it is seldom easy.

Interestingly, there are occasions in traditional academia where these principles are, for the most part, already widely applied. A typical doctoral thesis/dissertation, for example, is often quite close to it (especially in more modern professional forms that put more emphasis on recording the process), as are some student projects. We know that such things are a really good idea, and lead to far richer, more persistent, more fulfilling learning for everyone. We do not do them ubiquitously for reasons of cost and time. It does take a long time to assess something like this well, and it can take more time during the rest of the teaching process thanks to the personalization (real personalization, not the teacher-imposed form popularized by learning analytics aficionados) and extra care that it implies. It is an efficient use of our time, though, because of its active contribution to learning, unlike a great many traditional assessment methods like teacher-set assignments (minimal contribution) and exams (negative contribution).  A lot of the reason for our reticence, though, is the typical university’s schedule and class timetabling, which makes everything pile on at once in an intolerable avalanche of submissions. If we really take autonomy and flexibility on board, it doesn’t have to be that way. If students submit work when it is ready to be submitted, if they are not all working in lock-step, and if it is a work of love rather than compliance, then assessment is often a positively pleasurable task and is naturally staggered. Yes, it probably costs a bit more time in the end (though there are plenty of ways to mitigate that, from peer groups to pedagogical design) but every part of it is dedicated to learning, and the results are much better for everyone.

Some useful further reading

This is a fairly random selection of sources that relate to the principles above in one way or another. I have definitely missed a lot. Sorry for any missing URLs or paywalled articles: you may be able to find downloadable online versions somewhere.

Boud, D., & Falchikov, N. (2006). Aligning assessment with long-term learning. Assessment & Evaluation in Higher Education, 31(4), 399-413. Retrieved from https://www.jhsph.edu/departments/population-family-and-reproductive-health/_docs/teaching-resources/cla-01-aligning-assessment-with-long-term-learning.pdf

Boud, D. (2007). Reframing assessment as if learning were important. Retrieved from https://www.researchgate.net/publication/305060897_Reframing_assessment_as_if_learning_were_important

Cooperrider, D. L., & Srivastva, S. (1987). Appreciative inquiry in organizational life. Research in organizational change and development, 1, 129-169.

Deci, E. L., Vallerand, R. J., Pelletier, L. G., & Ryan, R. M. (1991). Motivation and education: The self-determination perspective. Educational Psychologist, 26(3/4), 325-346.

Hussey, T., & Smith, P. (2002). The trouble with learning outcomes. Active Learning in Higher Education, 3(3), 220-233.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise, and other bribes (Kindle ed.). Mariner Books. (this one is worth forking out money for).

Kohn, A. (2011). The case against grades. Educational Leadership, 69(3), 28-33.

Kohn, A. (2015). Four Reasons to Worry About “Personalized Learning”. Retrieved from http://www.alfiekohn.org/blogs/personalized/ (check out Alfie Kohn’s whole site for plentiful other papers and articles – consistently excellent).

Reeve, J. (2002). Self-determination theory applied to educational settings. In E. L. Deci & R. M. Ryan (Eds.), Handbook of Self-Determination research (pp. 183-203). Rochester, NY: The University of Rochester Press.

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications. (may be worth paying for if such things interest you).

Wilson-Grau, R., & Britt, H. (2012). Outcome harvesting. Cairo: Ford Foundation. http://www.managingforimpact.org/sites/default/files/resource/outome_harvesting_brief_final_2012-05-2-1.pdf.

Understanding the response to financial and non-financial incentives in education: Field experimental evidence using high-stakes assessments

What they did

This is a report by Simon Burgess, Robert Metcalfe, and Sally Sadoff on a large-scale study conducted in the UK on the effects of financial and non-financial incentives on GCSE scores (GCSEs are UK qualifications usually taken around age 16 and usually involving exams), involving over 10,000 students in 63 schools being given cash or ‘non-financial incentives’. ‘Non-financial incentives’ did not stretch as far as a pat on the back or encouragement given by caring teachers – this was about giving tickets for appealing events. The rewards were given not for getting good results but for particular behaviours the researchers felt should be useful proxies for effective study: specifically, attendance, conduct, homework, and classwork. None of the incentives were huge rewards to those already possessing plenty of creature comforts but, for poorer students, they might have seemed substantial. Effectiveness of the intervention was measured in terminal grades. The researchers were very thorough and were very careful to observe limitations and concerns. It is as close to an experimental design as you can get in a messy real-world educational intervention, with numbers that are sufficient and diverse enough to make justifiable empirical claims about the generalizability of the results.

What they found

Rewards had little effect on average marks overall, and it made little difference whether rewards were financial or not. However, in high risk groups (poor, immigrants, etc) there was a substantial improvement in GCSE results for those given rewards, compared with the control groups.

My thoughts

The only thing that does surprise me a little is that so little effect was seen overall, but I hypothesize that the reward/punishment conditions are so extreme already among GCSE students that it made little difference to add any more to the mix. The only ones that might be affected would be those for whom the extrinsic motivation is not already strong enough. There is also a possibility that the demotivating effects for some were balanced out by the compliance effects for others: averages are incredibly dangerous things, and this study is big on averages.

What makes me sad is that there appears to be no sense of surprise or moral outrage about this basic premise in this report.

[Image: dogs being whipped, from Jack London's 'Call of the Wild']

It appears reasonable at first glance: who would not want kids to be more successful in their exams? When my own kids had to do this sort of thing I would have been very keen on something that would improve their chances of success, and would be especially keen on something that appears to help to reduce systemic inequalities. But this is not about helping students to learn or improving education: this is completely and utterly about enforcing compliance and improving exam results. The fact that there might be a perceived benefit to the victims is a red herring: it’s like saying that hitting dogs harder is good for the dogs because it makes them behave better than hitting them gently. The point is that we should not be hitting them at all. It’s not just morally wrong, it doesn’t even work very well, and only continues to work at all if you keep hitting them. It teaches students that the end matters more than the process, that learning is inherently undesirable and should only be done when there is a promise of a reward or threat of punishment, and that they are not in charge of it.

The inevitable result of increasing rewards (or punishments – they are functionally equivalent) is to further quench any love of learning that might be left at this point in their school careers, to reinforce harmful beliefs about how to learn, and to put students off, for life, the subjects they might have loved under other circumstances. In years to come people will look back on barbaric practices like this much as we now look back at the slave trade or the lack of rights for women before emancipation.

Studies like this make me feel a bit sick.

Address of the bookmark: http://www.efm.bris.ac.uk/economics/working_papers/pdffiles/dp16678.pdf