Evaluating assessment

A group of us at AU have begun discussions about how we might transform our assessment practices, in the light of the far-reaching AU Imagine plan and principles. This is a rare and exciting opportunity to bring about radical and positive change in how learning happens at the institution. Hard technologies influence soft more than vice versa, and assessments (particularly when tied to credentials) tend to be among the hardest of all technologies in any pedagogical intervention. They are therefore a powerful lever for change. Equally, and for the same reasons, they are too often the large, slow, structural elements that infest systems to stunt progress and innovation.

Almost all learning must involve assessment, whether it be of one’s own learning, or provided by other people or machines. Even babies constantly assess their own learning. Reflection is assessment. It is completely natural and it only gets weird when we treat it as a summative judgment, especially when we add grades or credentials to the process, thus normally changing the purpose of learning from achieving competence to achieving a reward. At best it distorts learning, making it seem like a chore rather than a delight, at worst it destroys it, even (and perhaps especially) when learners successfully comply with the demands of assessors and get a good grade. Unfortunately, that’s how most educational systems are structured, so the big challenge to all teachers must be to eliminate or at least to massively reduce this deeply pernicious effect. A large number of the pedagogies that we most value are designed to solve problems that are directly caused by credentials. These pedagogies include assessment practices themselves.

With that in mind, before the group’s first meeting I compiled a list of some of the main principles that I adhere to when designing assessments, most of which are designed to reduce or eliminate the structural failings of educational systems. The meeting caused me to reflect a bit more. This is the result:

Principles applying to all assessments

  • The primary purpose of assessment is to help the learner to improve their learning. All assessment should be formative.
  • Assessment without feedback (teacher, peer, machine, self) is judgement, not assessment, and is pointless.
  • Ideally, feedback should be direct and immediate or, at least, as prompt as possible.
  • Feedback should only ever relate to what has been done, never the doer.
  • No criticism should ever be made without also at least outlining steps that might be taken to improve on it.
  • Grades (with some very rare minor exceptions where the grade is intrinsic to the activity, such as some gaming scenarios or, arguably, objective single-answer quizzes with T/F answers) are not feedback.
  • Assessment should never ever be used to reward or punish particular prior learning behaviours (e.g. use of exams to encourage revision, grades as goals, marks for participation, etc).
  • Students should be able to choose how, when and on what they are assessed.
  • Where possible, students should participate in the assessment of themselves and others.
  • Assessment should help the teacher to understand the needs, interests, skills, and gaps in knowledge of their students, and should be used to help to improve teaching.
  • Assessment is a way to show learners that we care about their learning.

Specific principles for summative assessments

A secondary (and always secondary) purpose of assessment is to provide evidence for credentials. This is normally described as summative assessment, implying that it assesses a state of accomplishment when learning has ended. That is a completely ridiculous idea. Learning doesn’t end. Human learning is not in any meaningful way like programming a computer or storing stuff in a database. Knowledge and skills are active, ever-transforming, forever actively renewed, reframed, modified, and extended. They are things we do, not things we have.

With that in mind, here are my principles for assessment for credentials (none of which supersede or override any of the above core principles for assessment, which always apply):

  • There should be no assessment task that is not in itself a positive learning activity. Anything else is at best inefficient, at worst punitive/extrinsically rewarding.
  • Assessment for credentials must be fairly applied to all students.
  • Credentials should never be based on comparisons between students (norm-referenced assessment is always, unequivocally, and unredeemably wrong).
  • The criteria for achieving a credential should be clear to the learner and other interested parties (such as employers or other institutions), ideally before it happens, though this should not forestall the achievement and consideration of other valuable outcomes.
  • There is no such thing as failure, only unfinished learning. Credentials should only celebrate success, not punish current inability to succeed.
  • Students should be able to choose when they are ready to be assessed, and should be able to keep trying until they succeed.
  • Credentials should be based on evidence of competence and nothing else.
  • It should be impossible to compromise an assessment by revealing either the assessment or solutions to it.
  • There should be at least two ways to demonstrate competence, ideally more. Students should only have to prove it once (though may do so in many ways and many times, if they wish).
  • More than one person should be involved in judging competence (at least as an option, and/or on a regularly taken sample).
  • Students should have at least some say in how, when, and where they are assessed.
  • Where possible (accepting potential issues with professional accreditation, credit transfer, etc) they should have some say over the competencies that are assessed, in weighting and/or outcome.
  • Grades and marks should be avoided except where mandated elsewhere. Even then, all passes should be treated as an ‘A’ because students should be able to keep trying until they excel.
  • Great success may sometimes be worthy of an award – e.g. a distinction – but such an award should never be treated as a reward.
  • Assessment for credentials should demonstrate the ability to apply learning in an authentic context. There may be many such contexts.
  • Ideally, assessment for credentials should be decoupled from the main teaching process, because of risks of bias, the potential issues of teaching to the test (regardless of individual needs, interests and capabilities) and the dangers to motivation of the assessment crowding out the learning. However, these risks are much lower if all the above principles are taken on board.

I have most likely missed a few important issues, and there is a bit of redundancy in all this, but this is a work in progress. I think it covers the main points.

Further random reflections

There are some overriding principles and implied specifics in all of this. For instance, respect for diversity, accessibility, respect for individuals, and recognition of student control all fall out of or underpin these principles. It implies that we should recognize success, even when it is not the success we expected, so outcome harvesting makes far more sense than measurement of planned outcomes. It implies that failure should only ever be seen as unfinished learning, not as a summative judgment of terminal competence, so appreciative inquiry is far better than negative critique. It implies flexibility in all aspects of the activity. It implies, above and beyond any other purpose, that the focus should always be on learning. If assessment for credentials adversely affects learning then it should be changed at once.

In terms of implementation, while objective quizzes and their cousins can play a useful formative role in helping students to self-assess and to build confidence, machines (whether implemented by computers or rule-following humans) should normally be kept out of credentialling. There’s a place for AI but only when it augments and informs human intelligence, never when it behaves autonomously. Written exams and their ilk should be avoided, unless they conform to or do not conflict with all the above principles: I have found very few examples like this in the real world, though some practical demonstrations of competence in an authentic setting (e.g. lab work and reporting) and some reflective exercises on prior work can be effective.

A portfolio of evidence, including a reflective commentary, is usually going to be the backbone of any fair, humane, effective assessment: something that lets students highlight successes (whether planned or not), that helps them to consolidate what they have learned, and that is flexible enough to demonstrate competence shown in any number of ways. Outputs or observations of authentic activities are going to be important contributors to that. My personal preference in summative assessments is to only use the intended (including student-generated) and/or harvested outcomes for judging success, not for mandated assignments. This gives flexibility, it works for every subject, and it provides unequivocal and precise evidence of success. It’s also often good to talk with students, perhaps formally (e.g. a presentation or oral exam), in order to tease out what they really know and to give instant feedback. It is worth noting that, unlike written exams and their ilk, such methods are actually fun for all concerned, albeit that the pleasure comes from solving problems and overcoming challenges, so it is seldom easy.

Interestingly, there are occasions in traditional academia where these principles are, for the most part, already widely applied. A typical doctoral thesis/dissertation, for example, is often quite close to it (especially in more modern professional forms that put more emphasis on recording the process), as are some student projects. We know that such things are a really good idea, and lead to far richer, more persistent, more fulfilling learning for everyone. We do not do them ubiquitously for reasons of cost and time. It does take a long time to assess something like this well, and it can take more time during the rest of the teaching process thanks to the personalization (real personalization, not the teacher-imposed form popularized by learning analytics aficionados) and extra care that it implies. It is an efficient use of our time, though, because of its active contribution to learning, unlike a great many traditional assessment methods like teacher-set assignments (minimal contribution) and exams (negative contribution).  A lot of the reason for our reticence, though, is the typical university’s schedule and class timetabling, which makes everything pile on at once in an intolerable avalanche of submissions. If we really take autonomy and flexibility on board, it doesn’t have to be that way. If students submit work when it is ready to be submitted, if they are not all working in lock-step, and if it is a work of love rather than compliance, then assessment is often a positively pleasurable task and is naturally staggered. Yes, it probably costs a bit more time in the end (though there are plenty of ways to mitigate that, from peer groups to pedagogical design) but every part of it is dedicated to learning, and the results are much better for everyone.

Some useful further reading

This is a fairly random selection of sources that relate to the principles above in one way or another. I have definitely missed a lot. Sorry for any missing URLs or paywalled articles: you may be able to find downloadable online versions somewhere.

Boud, D., & Falchikov, N. (2006). Aligning assessment with long-term learning. Assessment & Evaluation in Higher Education, 31(4), 399-413. Retrieved from https://www.jhsph.edu/departments/population-family-and-reproductive-health/_docs/teaching-resources/cla-01-aligning-assessment-with-long-term-learning.pdf

Boud, D. (2007). Reframing assessment as if learning were important. Retrieved from https://www.researchgate.net/publication/305060897_Reframing_assessment_as_if_learning_were_important

Cooperrider, D. L., & Srivastva, S. (1987). Appreciative inquiry in organizational life. Research in organizational change and development, 1, 129-169.

Deci, E. L., Vallerand, R. J., Pelletier, L. G., & Ryan, R. M. (1991). Motivation and education: The self-determination perspective. Educational Psychologist, 26(3/4), 325-346.

Hussey, T., & Smith, P. (2002). The trouble with learning outcomes. Active Learning in Higher Education, 3(3), 220-233.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A’s, praise, and other bribes (Kindle ed.). Mariner Books. (this one is worth forking out money for).

Kohn, A. (2011). The case against grades. Educational Leadership, 69(3), 28-33.

Kohn, A. (2015). Four Reasons to Worry About “Personalized Learning”. Retrieved from http://www.alfiekohn.org/blogs/personalized/ (check out Alfie Kohn’s whole site for plentiful other papers and articles – consistently excellent).

Reeve, J. (2002). Self-determination theory applied to educational settings. In E. L. Deci & R. M. Ryan (Eds.), Handbook of Self-Determination research (pp. 183-203). Rochester, NY: The University of Rochester Press.

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications. (may be worth paying for if such things interest you).

Wilson-Grau, R., & Britt, H. (2012). Outcome harvesting. Cairo: Ford Foundation. http://www.managingforimpact.org/sites/default/files/resource/outome_harvesting_brief_final_2012-05-2-1.pdf.

Our educational assessment systems are designed to create losers

The always wonderful Alfie Kohn describes an airline survey that sought to find out how it compared with others, which he chose not to answer because the airline was thus signalling no interest in providing the best quality experience possible, just aiming to do enough to beat the competition. The thrust of his article is that much the same is true of standardized tests in schools. As Kohn rightly observes, the central purpose of testing as it tends to be used in schools and beyond is not to evaluate successful learning but to compare students (or teachers, or institutions, or regions) with one another in order to identify winners and losers.

‘When you think about it, all standardized tests — not just those that are norm-referenced — are based on this compulsion to compare. If we were interested in educational excellence, we could use authentic forms of assessment that are based on students’ performance at a variety of classroom projects over time. The only reason to standardize the process, to give all kids the same questions under the same conditions on a contrived, one-shot, high-stakes test, is if what we wanted to know wasn’t “How well are they learning?” but “Who’s beating whom?”’

It’s a good point, but I think it is not just an issue with standardized tests. The problem occurs with all the summative assessments (the judgments) we use. Our educational assessment systems are designed to create losers as much as they are made to find winners. Whether they follow the heinous practice of norm-referencing or not, they are sorting machines, built to discover competent people, and to discard the incompetent. In fact, as Kohn notes, when there are too many winners we are accused of grade inflation or a dropping of standards.

This makes no sense if you believe, as I do, that the purpose of education is to educate. In a system that demands grading, unless 100% of students that want to succeed get the best possible grades, then we have failed to meet the grade ourselves. The problem, though, is not so much the judgments themselves as it is the intimate, inextricable binding of judgmental with learning processes. Given enough time, effort, and effective teaching, almost anyone can achieve pretty much any skill or competence, as long as they stick at it. We have very deliberately built a system that does not aim for that at all. Instead, it aims to sort wheat from chaff. That’s not why I do the job I do, and I hope it is not why you do it either, but that’s exactly what the system is made to do. And yet we (at least I) think of ourselves as educators, not judges. These two roles are utterly separate and inconsolably inconsistent.

Who needs 100%?

It might be argued that some students don’t actually want to get the best possible grades. True. And sure, we don’t always want or need to learn everything we could learn. If I am learning how to use a new device or musical instrument I sometimes read/watch enough to get me started and do not go any further, or skim through to get the general gist. Going for a less-than-perfect understanding is absolutely fine if that’s all you need right now. But that’s not quite how it works in formal education, in part because we punish those that make such choices (by giving lower grades) and in part because we systematically force students to learn stuff they neither want nor need to learn, at a time that we choose, using the lure of the big prizes at the end to coax them. Even those that actually do want or need to learn a topic must stick with it to the bitter end regardless of whether it is useful to do the whole thing, regardless of whether they need more or less of it, regardless of whether it is the right time to learn it, regardless of whether it is the right way for them to learn it. They must do all that we say they must do, or we won’t give them the gold star. That’s not even a good way to train a dog.

It gets worse. At least dogs normally get a second chance. Having set the bar, we normally give just a single chance at winning or, at best, an option to be re-tested (often at a price and usually only once), rather than doing the human thing of allowing people to take the time they need and learn from their mistakes until they get as good as they want or need to get. We could learn a thing or two from computer games –  the ability to repeat over and over, achieving small wins all along the way without huge penalties for losing, is a powerful way to gain competence and sustain motivation. It is better if students have some control over the pacing but, even at Athabasca, an aggressively open university that does its best to give everyone all the opportunity they need to succeed, where self-paced learners can choose the point at which they are ready to take the assessments, we still have strict cut-offs for contract periods and, like all the rest, we still tend to allow just a single stab at each assessment. In most of my own self-paced courses (and in some others) we try to soften that by allowing students to iterate without penalty until the end but, when that end comes, that’s still it. This is not for the benefit of the students: this is for our convenience. Yes, there is a cost to giving greater freedom – it takes time, effort, and compassion – but that’s a business problem to solve, not an insuperable barrier. WGU’s subscription model, for instance, in which students pay for an all-you-can-eat smorgasbord, appears to work pretty well.

Meta lessons

It might be argued that there are other important lessons that we teach when we competitively grade. Some might suggest that competition is a good thing to learn in and of itself, because it is one of the things that drives society and everyone has to do it at least sometimes. Sure, but cooperation and mutual support is usually better, or at least an essential counterpart, so embedding competition as the one and only modality seems a bit limiting. And, if we are serious about teaching people about how to compete, then that is what we should do, and not actively put them in jeopardy to achieve that: as Jerome Bruner succinctly put it, ‘Learning something with the aid of an instructor should, if instruction is effective, be less dangerous or risky or painful than learning on one’s own’ (Bruner 1966, p.44).

Others might claim that sticking with something you don’t like doing is a necessary lesson if people are to play a suitably humble/productive role in society. Such lessons have a place, I kind-of agree. Just not a central place, just not a pervasive place that underpins or, worse, displaces everything else. Yes, grit can be really useful, if you are pursuing your goals or helping others to reach theirs. By all means, let’s teach that, let’s nurture that, and by all means let’s do what we can to help students see how learning something we are teaching can help them to reach their goals, even though it might be difficult or unpleasant right now. But there’s a big difference between doing something for self or others, and subservient compliance with someone else’s demands. ‘Grit’ does not have to be synonymous with ‘taking orders’. Doing something distasteful because we feel we must, because it aligns with our sense of self-worth, because it will help those we care about, because it will lead us where we want to be, is all good. Doing something because someone else is making us do it (with the threat/reward of grades) might turn us into good soldiers, might generate a subservient workforce in a factory or coal face, might keep an unruly subjugated populace in check, but it’s not the kind of attitude that is going to be helpful if we want to nurture creative, caring, useful members of 21st Century society.

Societal roles

It might be argued that accreditation serves a powerful societal function, ranking and categorizing people in ways that (at least for the winners and for consumers of graduates) have some value. It’s a broken and heartless system, but our societies do tend to be organized around it and it would be quite disruptive if we got rid of it without finding some replacement. Without it, employers might actually need to look at evidence of what people have done, for instance, rather than speedily weeding out those with insufficient grades. Moreover, circularly enough, most of our students currently want and expect it because it’s how things are done in our culture. Even I, a critic of the system, proudly wear the label ‘Doctor’, because it confers status and signals particular kinds of achievement, and there is no doubt that it and other qualifications have been really quite useful in my career. If that were all accreditation did then I could quite happily live with it, even though the fact that I spent a few years researching something interesting about 15 years ago probably has relatively little bearing on what I do or can do now.  The problem is not accreditation in itself, but that it is inextricably bound to the learning process. Under such conditions, educational assessment systems are positively harmful to learning. They are anti-educative. Of necessity, due to the fact that they tend to determine precisely what students should do and how they should do it, they sap intrinsic motivation and undermine love of learning. Even the staunchest of defenders of tightly integrated learning and judgment would presumably accept that learning is at least as important as grading so, if grading undermines learning (and it quite unequivocally does), something is badly broken.

A simple solution?

It does not have to be this way. I’ve said it before but it bears repeating: at least a large part of the solution is to decouple learning and accreditation altogether. There is a need for some means to indicate prowess, sure. But the crude certificates we currently use may not be the best way to do that in all cases, and it doesn’t have to dominate the learning process to the point of killing love of learning. If we could drop the accreditation role during the teaching process we could focus much more on providing useful feedback, on valorizing failures as useful steps towards success, on making interesting diversions, on tailoring the learning experience to the learner’s interests and capabilities rather than to credential requirements, on providing learning experiences that are long enough and detailed enough for the students’ needs, rather than a uniform set of fixed lengths to suit our bureaucracies.

Equally, we could improve our ability to provide credentials. For those that need it, we could still offer plenty of accreditation opportunities, for example through a portfolio-based approach and/or collecting records of learning or badges along the way. We could even allow for some kind of testing like oral, written, or practical exams for those that must, where it is appropriate to the competence (not, as now, as a matter of course) and we could actually do it right, rather than in ways that positively enable and reward cheating. None of this has to be bound to specific courses. This decoupling would also give students the freedom to choose other ways of learning apart from our own courses, which would be quite a strong incentive for us to concentrate on teaching well. It might challenge us to come up with authentic forms of assessment that allow students to demonstrate competence through practice, or to use evidence from multiple sources, or to show their particular and unique skillset. It would almost certainly let us do both accreditation and teaching better. And it’s not as though we have no models to work from: from driving tests to diving tests to uses of portfolios in job interviews, there are plenty of examples of ways this can work already.

Apart from some increased complexities of managing such a system (which is where online tools can come in handy and where opportunities exist for online institutions that conventional face-to-face institutions cannot compete with) this is not a million miles removed from what we do now: it doesn’t require a revolution, just a simple shift in emphasis, and a separation of two unnecessarily intertwined and mutually inconsistent roles. Especially when processes and tools already exist for that, as they do at Athabasca University, it would not even be particularly costly. Inertia would be a bigger problem than anything else, but even big ships can eventually be steered in other directions. We just have to choose to make it so.

 

Reference

Bruner, J. S. (1966). Toward a Theory of Instruction. Cambridge MA: The Belknap Press of Harvard University Press.

On learning styles

This post by James Atherton makes the case that, whether or not it is possible to identify distinctive learning styles or preferences, they are largely irrelevant to teaching, and are potentially even antagonistic to effective learning. Regular readers, colleagues and friends will know that this conforms well with my own analysis of learning styles literature. The notion that learning styles should determine teaching styles is utter stuff and nonsense based on a very fuzzy understanding of the relationship between teaching and learning, and a desperate urge to find a theory to make the process seem more ‘scientific’, with no believable empirical foundation whatsoever. This doesn’t make the use of learning styles pointless, however.

Teaching is a design discipline much more than it is a science. One of the biggest challenges of teaching is making it work for as many students as possible, which means thinking carefully about different needs, interests, skills, concerns and contexts. So, if learning styles theories can help you to think about different learner needs more clearly when designing a learning path then that can be a good thing.

The trouble is, thinking about personality patterns associated with learners’ astrological star signs or Chinese horoscope animals would probably work just as well. A comparative study would be fun to do and, I think, the methodological issues would reveal a lot about how and why existing research has signally failed to find any plausible link.

There are alternatives. In the field of web design we often use personas – fictional but well fleshed-out representative individuals – in order to try to empathize with the users of our sites and to help us to see our designs through different eyes. See https://www.interaction-design.org/encyclopedia/personas.html for a thorough introduction to the area. I use these in my learning design process and find them very useful. Thinking ‘how would John Smith react to this?’ makes much more sense to me than thinking ‘would this appeal to kinaesthetic learners?’, especially as I can imagine how John Smith might change his ways of thinking as a course progresses, how different life events might affect him, and how he might interact with his peers.

Address of the bookmark: http://www.learningandteaching.info/learning/learning_styles.htm