Poor fidelity may mean effective education strategies never see light of day

Promising new education interventions are potentially being ‘unnecessarily scrapped’ because trials to test their effectiveness may be insufficiently faithful to the original research, a study has warned.

The cautionary note is being raised after researchers ran a large-scale computer simulation of more than 11,000 research trials to examine how much ‘fidelity’ influenced the results. In science and the social sciences, ‘fidelity’ is the extent to which tests evaluating a new innovation adhere to the design of the original experiment on which that innovation is based.

In much the same way that scientists will test a new drug before it is approved, new strategies for improving learning are often evaluated thoroughly in schools or other settings before being rolled out.

Many innovations are rejected at this stage because the trials indicate that they result in little or no learning progress. Academics have, however, for some time voiced concerns that in some cases fidelity losses could be compromising the trial. In many cases, fidelity is not consistently measured or reported.

The new study put this theory to the test. Researchers at the University of Cambridge and Carnegie Mellon University ran thousands of computer-modelled trials, featuring millions of simulated participants. They then examined how far changes in fidelity altered the ‘effect size’ of an intervention.

They found that even relatively subtle deviations in fidelity can have a significant impact. For every 5% of fidelity lost in the simulated follow-up tests, the effect size fell by a corresponding 5%.

In real-life contexts, this could mean that some high-potential innovations are deemed unfit for use because low fidelity is distorting the results. The study notes: “There is growing concern that a substantial number of null findings in educational interventions… could be due to a lack of fidelity, resulting in potentially sound programmes being unnecessarily scrapped.”

The findings may be particularly useful to organisations such as the Education Endowment Foundation (EEF) in the United Kingdom, or the What Works Clearinghouse in the United States, both of which evaluate new education research. The EEF reports the results of project trials on its website. At present, more than three out of five reports indicate that the intervention being tested led to no progress, or negative progress, for pupils.

Michelle Ellefson, Professor of Cognitive Science at the Faculty of Education, University of Cambridge, said: “A lot of money is being invested in these trials, so we should look closely at how well they are controlling for fidelity. Replicability in research is hugely important, but the danger is that we could be throwing out promising interventions because of fidelity violations and creating an unnecessary trust gap between teachers and researchers.”

Academics have frequently referred to a ‘replication crisis’ precisely because the results of so many studies are difficult to reproduce. In education, trials are often carried out by a mix of teachers and researchers. Larger studies, in particular, create ample opportunities for inadvertent fidelity losses, either through human factors (such as research instructions being misread), or changes in the research environment (for example to the timing or conditions of the test).

Ellefson and Professor Daniel Oppenheimer from Carnegie Mellon University developed a computer-based randomised controlled trial, which, in the first instance, simulated an imaginary intervention in 40 classrooms, each with 25 students. They ran this over and over again, each time adjusting a set of variables – including the potential effect size of the intervention, the ability levels of the students, and the fidelity of the trial itself.

In subsequent models, they introduced further confounding elements which might also affect the results – for example, the quality of resources in the school, or the fact that better teachers might have higher-performing students. The study combined representative permutations of the variables they introduced, modelling 11,055 trials altogether.

Strikingly, across the entire data set, the results indicated that for every 1% of fidelity lost in a trial, the effect size of the intervention also drops by 1%. This 1:1 correspondence means that even a trial with, for example, 80% fidelity, would see a significant drop in effect size, which might cast doubt on the value of the intervention being tested.
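
To illustrate how a 1:1 relationship of this kind can arise, the following is a minimal sketch of a simulated trial. It is not the researchers' model. It assumes, purely for illustration, that fidelity is the probability that a student in the treatment arm receives the intervention as designed, and that students who do not receive it gain nothing. The function names, the alternating allocation of classrooms and the default of 200 repeated trials are all hypothetical choices; only the 40 classrooms of 25 students come from the study description.

```python
import random
import statistics


def simulate_trial(true_effect, fidelity, rng, n_classrooms=40, class_size=25):
    """Return the observed treatment-minus-control difference for one trial.

    Scores are drawn with a standard deviation of 1, so the difference in
    means is already expressed in baseline standard-deviation units.
    """
    treated, control = [], []
    for classroom in range(n_classrooms):
        # Illustrative allocation: alternate classrooms between the two arms.
        in_treatment_arm = classroom % 2 == 0
        for _ in range(class_size):
            score = rng.gauss(0.0, 1.0)  # baseline attainment
            if in_treatment_arm:
                # Assumption: the benefit applies only when the intervention
                # is delivered as designed, which happens with probability
                # equal to the fidelity of the trial.
                if rng.random() < fidelity:
                    score += true_effect
                treated.append(score)
            else:
                control.append(score)
    return statistics.mean(treated) - statistics.mean(control)


def mean_observed_effect(true_effect, fidelity, n_trials=200, seed=0):
    """Average the observed effect over many simulated trials."""
    rng = random.Random(seed)
    results = [simulate_trial(true_effect, fidelity, rng) for _ in range(n_trials)]
    return statistics.mean(results)


if __name__ == "__main__":
    for fidelity in (1.0, 0.95, 0.8, 0.5):
        observed = mean_observed_effect(true_effect=0.5, fidelity=fidelity)
        print(f"fidelity={fidelity:.2f}  mean observed effect={observed:.3f}")
```

Under these assumptions the average observed effect is roughly the true effect multiplied by the fidelity, so a trial run at 80% fidelity would be expected to recover about 80% of the effect. A model with different assumptions about what happens when fidelity is lost could behave differently.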

A more granular analysis then revealed that the effect of fidelity losses tended to be greater where a bigger effect size was anticipated. In other words, the most promising research innovations are also more sensitive to fidelity violations.

Although the confounding factors weakened this overall relationship, fidelity had by far the greatest impact on the effect sizes in all the tests the researchers ran.

Ellefson and Oppenheimer suggest that organisations conducting research trials may wish to establish firmer processes for ensuring, measuring and reporting fidelity so that their recommendations are as robust as possible. Their paper points to research in 2013 which found that only 29% of after-school intervention studies measured fidelity, and another study, in 2010, which found that only 15% of social work intervention studies collected fidelity data.

“When teachers are asked to try out new teaching methods, it is natural - perhaps even admirable - for them to want to adapt the method to the needs of their specific students,” Oppenheimer said. “To have reliable scientific tests, however, it's essential to follow the instructions precisely; otherwise researchers can't know whether the intervention, as written, will be broadly effective. It's really important for research teams to monitor and measure fidelity in studies, in order to be able to draw valid conclusions.”

Ellefson said: “Many organisations do a great job of independently evaluating research, but they need to make sure that fidelity is both measured and scrupulously checked. Sometimes the right response when findings cannot be replicated may not be to dismiss the research altogether, but to step back, and ask why it might have worked in one case, but not in another?”

The findings are published in Psychological Methods.