The Plover and the Crocodile

Did I ever tell you the story about when I was Jimmy Savile’s quality assurance manager? No?

Well, it’s not surprising really. Because, even if it were true, I would hardly be likely to boast about it. Let’s face it, the monitoring of Jimmy’s non-conformities was hardly QA’s finest hour. I guess the problem is that, when it comes to the ‘creative’ disciplines, the quality control man is the last person to be let in on the secret. The Church is another area where one might imagine that quality control should know its place. The confessional might be a good way to keep the masses in check but it didn’t do a particularly good job of stopping abuse in the sacristy.

So clearly, having a good quality management system in place isn’t the answer to all of the world’s problems. I get that. There is a limit to the extent to which an independent quality control function can deal with matters of ethics and morality, particularly where mutual scrutiny amongst expert authorities is the order of the day. So what chance would quality management have in identifying and dealing with any systemic problems that may exist within the ultimate bastion of creative expertise: scientific research and development? For example, if one feared that experimenter bias had become institutionalised within a particular field, what, if anything, could be achieved by having a quality control engineer on the case?

I ask this question because I recently had an interesting discussion on this website regarding this very matter with a certain Professor Ken Rice (aka ATTP). As it happens, the subject under the microscope was paleoclimatology, but I don’t want to make too much of that because I don’t want to get drawn into a discussion of the specifics of Climategate, hidden declines or PAGES2k. What matters is whether the generic system currently in place to ensure integrity within research (i.e. peer review and mutual scrutiny) could be improved upon by adopting practices that are commonplace outside of academia.

Here’s to having a go

I am, of course, not the first one to consider the application of quality assurance within a research environment. The idea is well enough established to have even received its own acronym: QAR. This has been defined by the University of Reading as:

“…all the techniques, systems and resources that are deployed to give assurance about the care and control with which research has been conducted.”

Accordingly, QAR addresses matters such as:

the responsibilities of those involved in the research;
transparent project planning;
the training and competence of research staff;
facilities and equipment;
documentation of procedures and methods;
research records;
and the handling of samples and materials.

The university even has a Code of Good Practice in Research, itself based upon the ‘UK Research Integrity Office’s Code of Practice for Research (UKRIO Code)’. It covers, amongst many other things, matters of honesty, integrity, cooperation, openness and research design. In particular, it states that when designing a research project one must ensure that:

“…the design of the study is appropriate for the question(s) being asked and addresses the most important potential sources of bias”

Helpfully, the University of Reading’s codes of practice add that:

“Within the University, advice on all aspects of statistical design and analysis is available from the Maths and Statistics Department.”

All good stuff. But it’s one thing to remind staff of their responsibilities and quite another to ensure that they are fulfilled. Other than taking the advice of the Maths and Statistics Department, what other ‘techniques, systems and resources’ might one bring to bear to ensure that one does not fall prey to potential sources of bias when designing a research project? Is this something that can be readily audited, for example?

One of the major difficulties here, as I see it, is the less than obvious manner in which many of the biases can sometimes operate. This is especially true when focusing upon interpretive biases, i.e. those that can compromise the assessment of research evidence. If you talk to anyone defending the climate science consensus they will tell you that interpretive bias hasn’t played any major role. Talk to anyone outside of climate science and they will tell you that it is a perennial problem within research. In fact, biases may be present even in the most rigorous of sciences and may be obvious only in retrospect. It would be nice to think that a good dose of quality assurance would solve the problem but there are a number of reasons for being pessimistic. The first is that interpretive bias can actually be made worse through the application of quality assurance.

The QAR Paradox

To explain this paradox, I need to first list the various forms that interpretive bias may take. These have been defined as follows:

Confirmation Bias – evaluating evidence that supports one’s preconceptions differently from evidence that challenges these convictions.
Rescue Bias – discounting data by finding selective faults in the experiment.
Auxiliary Hypothesis Bias – introducing ad hoc modifications to imply that an unanticipated finding would have been otherwise had the experimental conditions been different.
Mechanism Bias – being less sceptical when underlying science furnishes credibility for the data.
“Time will tell” bias – the phenomenon that different scientists need different amounts of confirmatory evidence.
Orientation Bias – the possibility that the hypothesis itself introduces prejudices and errors and becomes a determinant of experimental outcomes.

The problem here is that any quality assessment that one may introduce is likely to be a victim of the above biases rather than a panacea. Take confirmation bias, for example. Behavioural experiments have demonstrated that evidence that seems to contradict the favoured hypothesis will receive greater quality assurance scrutiny, and will be required to pass more stringent criteria for acceptance, than evidence that confirms preconceptions. That’s how it should be, you might say, if the preconceptions are well-founded. But that is just a way of saying that there is nothing wrong with confirmation bias – when there is.

Similarly with other biases; given the presence of a bias, quality assurance is likely to be enrolled within its service rather than serve to expose it. That’s just human nature. Total objectivity is hard to achieve and that is as true within quality assurance as it is within the area of research it is intended to assure.

The expertise problem

An obvious solution to the above problem may seem to be the employment of quality assurance personnel who are skilled in the subject area but not invested in the outcome of the assessment. The parallel in industry might be, for example, a software quality assurance specialist who is fully conversant with the technicalities and processes involved in software development but who is not on the project team of software development practitioners. An analogous situation may be difficult to imagine within the context of a research establishment, but it is not impossible to see it working. Take, for example, the role that could be served by the Mathematics and Statistics Department of the University of Reading if they were to go beyond providing advice. If, instead, they were to be authorised to audit the practices of the University’s various research departments, then some degree of expert yet independent scrutiny might be brought to bear. The problem here is that it would be quite counter-cultural to expect one university department to have the authority to dictate the manner in which another department designed its research projects. Counter-cultural, perhaps, but this is the sort of imposition that industry invites every time it signs up to the scrutiny of an internal quality management system that is certified and audited by a UKAS accredited certification body. Nobody likes it, but the necessity is acknowledged.

Nobody Likes the QA Guy

Which brings me on to the next barrier against the introduction of an effective QAR function. As a former software quality assurance manager, I can speak from bitter experience of just how resentful and aggressive can be the response from a professional who has been formally criticised for any aspect of their work. Even with the most supportive of management, quality assurance staff can feel like the apocryphal plover bird picking the teeth of a Nile crocodile. And when the defects highlighted by quality assurance have their roots in the behaviour of senior management, the problem becomes even more acute. Senior management have particularly powerful jaws.

When dealing with proud and dedicated software professionals (who have a not-so-secret suspicion that quality assurance staff are just second-rate practitioners who only know how to stop the creative process), I had to be highly diplomatic in my approach. I can only guess at how difficult the role would be when dealing with someone who has dedicated a life to the advancement of a particular hypothesis and whose whole career depends upon the establishment and maintenance of prestige amongst their peers. Furthermore, this problem is bad enough when dealing with a straightforward case of failure to follow written procedure, or the production of something that is clearly defective. But it is in the nature of experimenter or interpretive biases that accusations of such may be misinterpreted as a highly personal attack on the integrity of the individual. These are not the sort of attacks that go unpunished. For that reason, anyone operating within QAR needs to be fully sanctioned in their work and protected from the worst excesses of an academic’s umbrage. There are prominent climate scientists, I am told, who would not hesitate to sue the moment that anyone challenged their position.

So what is so wrong with peer review?

Fortunately for academia, when it comes to the big boy stuff, it has no need to defer to a supposedly second-rate jobsworth to keep the errant in line, since it has a system by which publication of results requires submission to peer review. Scientists have to publish in reputable journals that protect their reputation by ensuring that all submissions have received expert scrutiny. It’s a simple system of self-policing that embodies the scientific method. So what is wrong with that?

Well, where do I start? With my quality assurance expert’s hat on, I have to say that if I set out to design a dysfunctional process by which the quality of work may be assured, I couldn’t do a better job than peer review. It is a system in which individuals are selected to act as experts, using a selection process that does not itself adhere to any quality control precepts. It is also often the case that those appointed to critique work do so under the cloak of anonymity – not a characteristic normally associated with effective quality assurance. There appears to be nothing in place to assure the thoroughness, completeness, or impartiality of the scrutiny applied. The process appears to have no built-in safeguards against partisan reviewing or, indeed, hostile reviewing born of professional rivalry. And all of this for the publication of an article that will appear in a journal that is under editorial control. So, far from being a means of preventing interpretive bias, it seems to be a system guaranteed to nurture it into full bloom. If there had not been sufficient quality assurance applied within the relevant research establishment prior to submission of its work, I see little in peer review prior to publication that adds anything to the assurance equation.

This is not to say that peer review should be scrapped. It is, after all, the best that is on offer, and it does at least provide the opportunity for a degree of quality assurance, even if it may be difficult to judge the extent to which the opportunity has been taken.

Where we stand

It is often argued that scientific research is a special case in which conventional quality assurance standards have limited applicability. They can be used to address peripheral issues, but when it comes down to creativity within an arcane subject area, there is no alternative but to place trust in the expert community’s integrity and ability to apply peer-group judgement. There again, the same argument was applied to the arts and those who held imperious positions within them, such as Mr Savile. Similar pleas for special treatment would no doubt have been used with regard to the clergy in the Catholic Church. There are some lines of work, it seems, where mutual scrutiny amongst peers is de rigueur, because there is no one who could gain the required abilities without becoming a member of the club. But be that as it may, there can be no doubt that it is a system that can, from time to time, lose its way.

There is, of course, nothing happening within academic research that can compare with the egregious misdemeanours of the likes of Savile and co. When all is said and done, experimenter and interpretive biases are largely innocent and subconscious failings committed in the spirit of an over-eagerness to determine the truth. Nevertheless, to the extent that truth is a rare and valuable commodity, one should not underestimate the damage that such biases may cause. And, given the insidious manner in which they operate, we should not underestimate the extent to which even the most circumspect of fields of research may be susceptible.

Several important initiatives have been taken in the arena of academic research to try to address the quality assurance challenge. I have already cited the codes of practice developed by the University of Reading, but this is just one of many establishments that have taken the subject seriously. Add to that the existence of standards and guidance such as the UKRIO Code, supplemented by more specialised directives such as ISO Guide 25, the EURACHEM/CITAC Guide CG2 and the OECD Principles of Good Laboratory Practice (GLP), together with a variety of published codes of practice for statistical analysis, and one can see that this is not a problem taken lightly. You can also see such levels of interest reflected in the British Computer Society’s recent intervention to call for better standards of software development in the production of safety-related academic code, such as the epidemiological models developed by Imperial College London (an intervention, incidentally, that was resisted with some pretty traditional negativity by the punters at Professor Rice’s website).

Nevertheless, there will always be an extent to which QAR remains little more than that intrepid little bird, perilously pecking at the teeth of the basking monster – tolerated only insofar as relatively cosmetic housekeeping is undertaken. Because, when it comes to matters of subconsciously applied or institutionalised bias, scientists are nothing if not individuals who think they can take care of it themselves. At the end of the day, despite the plover’s best efforts, the crocodile is still the crocodile.
