It’s all solid stuff that supports much of what I and many others have written about the value of belongingness and social interaction in learning but, like much research in fields such as psychology, education, sociology, and so on, it makes some seemingly innocuous but fundamentally wrong assertions of fact. For instance:
“Those who were instructed to strike up a conversation with someone new on public transport or with their cab driver reported a more positive commute experience than those instructed to sit in silence.”
What, all of them? That seems either unbelievably improbable, or the result of a flawed methodology, or a sign of way too small a sample size. The paper itself is inaccessibly paywalled so I don’t know for sure, but I suspect this is actually just a sloppy description of the findings. It is not the result of bad reporting in the Quartz article, though: it is precisely what the abstract of the paper itself actually claims. The researchers make several similar claims like “Those who were instructed to strike up a hypothetical conversation with a stranger said they expected a negative experience as opposed to just sitting alone.” Again – all of them? If that were true, no one would ever talk to strangers (which anyone that has ever stood in a line-up in Canada knows to be not just false but Trumpishly false), so this is either a very atypical group or a very misleading statement about group members’ behaviours. The findings are likely, on average, correct for the groups studied, but that’s not the way it is written.
The article is filled with similarly dubious quotes from distinguished researchers and, worse, pronouncements about what we should do as a result. Often the error is subtly couched in (accurate but misleadingly ambiguous) phrasing like “The group that engaged in friendly small talk performed better in the tests.” I don’t think it is odd to carelessly read that as ‘all of the individuals in the group performed better than all of those in the other groups’, rather than that, ‘on average, the collective group entity performed better than another collective group entity’, which is what was actually meant (and that is far less interesting). From there it is an easy – but dangerously wrong – step to claim that ‘if you engage in small talk then you will experience cognitive gains.’ It’s natural to want to extrapolate a general law from averaged behaviours, and in some domains (where experimental anomalies can be compellingly explained) it makes sense, but it’s wrong in most cases, especially when applied to complex systems like, say, anything involving the behaviour of people.
It’s a problem because, like most in my profession, I regularly use such findings to guide my own teaching. On average, results are likely (but far from certain) to be better than if I did not use them, but definitely not for everyone, and certainly not every time. Students do tend to benefit from engagement with other students, sure. It’s a fair heuristic, but there are exceptions, at least sometimes. And the exceptions aren’t just a statistical anomaly. These are real people we are talking about, not average people. When I do teaching well – nothing like enough of the time – I try to make it possible for those that aren’t average to do their own thing without penalty. I try to be aware of differences and cater for them. I try to enable those that wish it to personalize their own learning. I do this because I’ve never in my entire life knowingly met an average person.
Unfortunately, our educational systems really don’t help me in my mission because they are pretty much geared to cater for someone that probably doesn’t exist. That said, the good news is that there is a general trend towards personalized learning that figures largely in most institutional plans. The bad news is that (as Alfie Kohn brilliantly observes) what is normally meant by ‘personalized’ in such plans is not its traditional definition at all, but instead ‘learning that is customized (normally by machines) for students in order that they should more effectively meet our requirements.’ In case we might have forgotten, personalization is something done by people, not to people.
Further reading: Todd Rose’s ‘End of Average‘ is a great primer on how to avoid the average-to-the-particular trap and many other errors, including why learning styles, personality types, and a lot of other things many people believe to be true are utterly ungrounded, along with some really interesting discussion of how to improve our educational systems (amongst other things). I was gripped from start to finish and keep referring back to it a year or two on.
A lot of progress has been made in medicine in recent years through the application of cocktails of drugs. Those used to combat AIDS are perhaps the most well-known, but there are many other applications of the technique to everything from lung cancer to Hodgkin’s lymphoma. The logic is simple. Different drugs attack different vulnerabilities in the pathogens etc they seek to kill. Though evolution means that some bacteria, viruses or cancers are likely to be adapted to escape one attack, the more different attacks you make, the less likely it will be that any will survive.
Unfortunately, combinatorial complexity means this is not a simply a question of throwing a bunch of the best drugs of each type together and gaining their benefits additively. I have recently been reading John H. Miller’s ‘A crude look at the whole: the science of complex systems in business, life and society‘ which is, so far, excellent, and that addresses this and many other problems in complexity science. Miller uses the nice analogy of fashion to help explain the problem: if you simply choose the most fashionable belt, the trendiest shoes, the latest greatest shirt, the snappiest hat, etc, the chances of walking out with the most fashionable outfit by combining them together are virtually zero. In fact, there’s a very strong chance that you will wind up looking pretty awful. It is not easily susceptible to reductive science because the variables all affect one another deeply. If your shirt doesn’t go with your shoes, it doesn’t matter how good either are separately. The same is true of drugs. You can’t simply pick those that are best on their own without understanding how they all work together. Not only may they not additively combine, they may often have highly negative effects, or may prevent one another being effective, or may behave differently in a different sequence, or in different relative concentrations. To make matters worse, side effects multiply as well as therapeutic benefits so, at the very least, you want to aim for the smallest number of compounds in the cocktail that you can get away with. Even were the effects of combining drugs positive, it would be premature to believe that it is the best possible solution unless you have actually tried them all. And therein lies the rub, because there are really a great many ways to combine them.
Miller and colleagues have been using the ideas behind simulated annealing to create faster, better ways to discover working cocktails of drugs. They started with 19 drugs which, a small bit of math shows, could be combined in 2 to the power of 19 different ways – about half a million possible combinations (not counting sequencing or relative strength issues). As only 20 such combinations could be tested each week, the chances of finding an effective, let alone the best combination, were slim within any reasonable timeframe. Simplifying a bit, rather than attempting to cover the entire range of possibilities, their approach finds a local optimum within one locale by picking a point and iterating variations from there until the best combination is found for that patch of the fitness landscape. It then checks another locale and repeats the process, and iterates until they have covered a large enough portion of the fitness landscape to be confident of having found at least a good solution: they have at least several peaks to compare. This also lets them follow up on hunches and to use educated guesses to speed up the search. It seems pretty effective, at least when compared with alternatives that attempt a theory-driven intentional design (too many non-independent variables), and is certainly vastly superior to methodically trying every alternative, inasmuch as it is actually possible to do this within acceptable timescales.
The central trick is to deliberately go downhill on the fitness landscape, rather than following an uphill route of continuous improvement all the time, which may simply get you to the top of an anthill rather than the peak of Everest in the fitness landscape. Miller very effectively shows that this is the fundamental error committed by followers of the Six-Sigma approach to management, an iterative method of process improvement originally invented to reduce errors in the manufacturing process: it may work well in a manufacturing context with a small number of variables to play with in a fixed and well-known landscape, but it is much worse than useless when applied in a creative industry like, say, education, because the chances that we are climbing a mountain and not an anthill are slim to negligible. In fact, the same is true even in manufacturing: if you are just making something inherently weak as good as it can be, it is still weak. There are lessons here for those that work hard to make our educational systems work better. For instance, attempts to make examination processes more reliable are doomed to fail because it’s exams that are the problem, not the processes used to run them. As I finish this while listening to a talk on learning analytics, I see dozens of such examples: most of the analytics tools described are designed to make the various parts of the educational machine work ‘ better’, ie. (for the most part) to help ensure that students’ behaviour complies with teachers’ intent. Of course, the only reason such compliance was ever needed was for efficient use of teaching resources, not because it is good for learning. Anthills.
This way of thinking seems to me to have potentially interesting applications in educational research. We who work in the area are faced with an irreducibly large number of recombinable and mutually affective variables that make any ethical attempt to do experimental research on effectiveness (however we choose to measure that – so many anthills here) impossible. It doesn’t stop a lot of people doing it, and telling us about p-values that prove their point in more or less scupulous studies, but they are – not to put too fine a point on it – almost always completely pointless. At best, they might be telling us something useful about a single, non-replicable anthill, from which we might draw a lesson or two for our own context. But even a single omitted word in a lecture, a small change in inflection, let alone an impossibly vast range of design, contextual, historical and human factors, can have a substantial effect on learning outcomes and effectiveness for any given individual at any given time. We are always dealing with a lot more than 2 to the power of 19 possible mutually interacting combinations in real educational contexts. For even the simplest of research designs in a realistic educational context, the number of possible combinations of relevant variables is more likely closer to 2 to the power of 100 (in base 10 that’s 1,267,650,600,228,229,401,496,703,205,376). To make matters worse, the effects we are looking for may sometimes not be apparent for decades (having recombined and interacted with countless others along the way) and, for anything beyond trivial reductive experiments that would tell us nothing really useful, could seldom be done at a rate of more than a handful per semester, let alone 20 per week. This is a very good reason to do a lot more qualitative research, seeking meanings, connections, values and stories rather than trying to prove our approaches using experimental results. Education is more comparable to psychology than medicine and suffers the same central problem, that the general does not transfer to the specific, as well as a whole bunch of related problems that Smedslund recently coherently summarized. The article is paywalled, but Smedlund’s abstract states his main points succinctly:
“The current empirical paradigm for psychological research is criticized because it ignores the irreversibility of psychological processes, the infinite number of influential factors, the pseudo-empirical nature of many hypotheses, and the methodological implications of social interactivity. An additional point is that the differences and correlations usually found are much too small to be useful in psychological practice and in daily life. Together, these criticisms imply that an objective, accumulative, empirical and theoretical science of psychology is an impossible project.”
You could simply substitute ‘education’ for ‘psychology’ in this, and it would read the same. But it gets worse, because education is as much about technology and design as it is about states of mind and behaviour, so it is orders of magnitude more complex than psychology. The potential for invention of new ways of teaching and new states of learning is essentially infinite. Reductive science thus has a very limited role in educational research, at least as it has hitherto been done.
But what if we took the lessons of simulated annealing to heart? I recently bookmarked an approach to more reliable research suggested by the Christensen Institute that might provide a relevant methodology. The idea behind this is (again, simplifying a bit) to do the experimental stuff, then to sweep the normal results to one side and concentrate on the outliers, performing iterations of conjectures and experiments on an ever more diverse and precise range of samples until a richer, fuller picture results. Although it would be painstaking and longwinded, it is a good idea. But one cycle of this is a bit like a single iteration of Miller’s simulated annealing approach, a means to reach the top of one peak in the fitness landscape, that may still be a low-lying peak. However if, having done that, we jumbled up the variables again and repeated it starting in a different place, we might stand a chance of climbing some higher anthills and, perhaps, over time we might even hit a mountain and begin to have something that looks like a true science of education, in which we might make some reasonable predictions that do not rely on vague generalizations. It would either take a terribly long time (which itself might preclude it because, by the time we had finished researching, the discipline will have moved somewhere else) or would hit some notable ethical boundaries (you can’t deliberately mis-teach someone), but it seems more plausible than most existing techniques, if a reductive science of education is what we seek.
To be frank, I am not convinced it is worth the trouble. It seems to me that education is far closer as a discipline to art and design than it is to psychology, let alone to physics. Sure, there is a lot of important and useful stuff to be learned about how we learn: no doubt about that at all, and a simulated annealing approach might speed up that kind of research. Painters need to know what paints do too. But from there to prescribing how we should therefore teach spans a big chasm that reductive science cannot, in principle or practice, cross. This doesn’t mean that we cannot know anything: it just means it’s a different kind of knowledge than reductive science can provide. We are dealing with emergent phenomena in complex systems that are ontologically and epistemologically different from the parts of which they consist. So, yes, knowledge of the parts is valuable, but we can no more predict how best to teach or learn from those parts than we can predict the shape and function of the heart from knowledge of cellular organelles in its constituent cells. But knowledge of the cocktails that result – that might be useful.