‘For my birthday, I want to go to a seminar about the better
use of statistics in animal experiments,’ said no 9-year-old, ever. However, add
30 years and there I was. So why, instead of going to the zoo, was I at a stats
seminar?
The fault in our p’s
It is increasingly recognised that there is a problem with
reproducibility in all areas of biological research. This is a particular issue
when it comes to studies using animals. Poor reproducibility has contributed to
a failure to turn pre-clinical discovery studies into successful medicines.
There are a number of causes of this problem, including unreliable research
reagents, confirmation bias and the flawed use of statistics, a lot of which
can be overcome with better experimental design, in particular better stats. As
best I understand it (and I am not a statistician), using p<0.05 means that when
there is no real effect, about 1 in 20 studies will still come out as
‘significant’ by chance, and therefore we see results as positive when they are
not. Due to some complicated statistical sleight of hand involving alpha, beta
and normal distributions, it could be the case that as much as 60% of
significant results are in fact untrue (though it would be better to follow
some actual statisticians to check how this works).
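A rough sketch of how a figure like 60% can come about. It assumes the share of false positives among significant results depends on three things: alpha, the power of the study (1 minus beta) and the proportion of tested hypotheses that are actually true. The 30% power and the 1-in-10 proportion below are illustrative assumptions, not numbers from the seminar.

```python
def false_positive_share(alpha, power, prior_true):
    """Fraction of 'significant' results that are false positives.

    alpha      -- chance of a significant result when there is no real effect
    power      -- chance of a significant result when the effect is real
    prior_true -- assumed proportion of tested hypotheses that are really true
    """
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


# Illustrative assumptions: underpowered studies, 1 in 10 hypotheses true.
share = false_positive_share(alpha=0.05, power=0.30, prior_true=0.10)
print(f"{share:.0%} of significant results would be false positives")
```

Raising the power to 80% with the same assumptions brings the share down to about a third, which is one argument for properly powered studies.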
Times are a changin’
But why should you care? It is (or ought to be) a given that we
should all do better science. However, career pressures, such as the need to
publish positive results in glossy journals, may lead to practices that are not
in line with best scientific practice. This can lead to a number of tricks that
are used consciously or unconsciously to make a more positive story: p-hacking,
HARKing (Hypothesising After the Results are Known) etc. However, publishing is
not the only career pressure: we all need to bring in grant funding. Funders in
both the UK and the US are using this as a tool to change research practice. The
funders have updated their guidance (the MRC and NC3Rs guides are here) to
ensure more rigour in experimental design in grant applications.
Je-S Kidding
In 2012, an assessment by review board members of the
quality of the justification for animal use found that the reasons given for
using animals and for the choice of species were OK, but the statistical
justification was either absent or plain wrong. It used to be that grants could
be awarded conditionally, subject to amendment if there were issues with the
statistics. As of next year, grants can be rejected if the quality of the
justification of animal usage is not good enough, without the opportunity to
amend the application post-award. Let me repeat that, because we are all
inclined to spend 90+% of our effort perfecting the case for support and then
fill out the rest of the form in a mad dash. Grants may be REJECTED, without a
chance to amend post-award, if the case for animals is poor. Stating ‘We did
this because we always have’ won’t cut it anymore.
Get it right
What follows are some steps that can help you improve the
stats part of the justification for animal usage. Obviously this will not
automatically get you funding (if it did, I’d be unlikely to share it, would
I?). But hopefully it will help you frame your application more clearly. NB:
don’t spend all your word count on the stats; you still need a case for animal
usage, species etc.
Step 1 – PICO. Put a single sentence at the beginning of the
animal justification explaining in brief what you are aiming to do. Try using
PICO: P (population) – to whom, I (intervention) – what will you do, C (control)
– what will you compare against, O (outcome) – what will you measure. Consider
your unit of measurement too, e.g. 100 sections from a single liver is still n=1.
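A minimal sketch of the ‘100 sections is still n=1’ point: collapse repeated measurements to one value per animal before counting n or running a test. The animal IDs and readings are made up for illustration, and taking a simple mean per animal is an assumption; a statistician may suggest a mixed model instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (animal_id, value measured on one liver section)
sections = [
    ("mouse_1", 4.0), ("mouse_1", 6.0),
    ("mouse_2", 7.0), ("mouse_2", 9.0), ("mouse_2", 8.0),
    ("mouse_3", 5.0),
]


def per_animal_means(readings):
    """Average all sections from the same animal into a single value."""
    by_animal = defaultdict(list)
    for animal_id, value in readings:
        by_animal[animal_id].append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


animal_means = per_animal_means(sections)
n = len(animal_means)  # the animal is the experimental unit, not the section
print(f"{len(sections)} sections, but n = {n}")
```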
Step 2 – describe the effect size. What is the biological
effect you are looking for, ideally in a human? This should be drawn from your
own experience of the disease area. One of the speakers made the very good
point that a positive outcome in an animal study is seldom the actual endpoint
we are interested in, i.e. human disease. Think about what a real-world,
biologically significant effect would be and justify why that would be
important in a patient. An example using blood pressure: 160 mmHg is bad,
120 mmHg is good, so the ‘effect size’ of a blockbuster drug would be a 40 mmHg
drop in blood pressure. (Effect size can be modified by variability, but use a
stats package or check with a statistician.)
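One common way variability comes into it is the standardised effect size: the difference you care about divided by the standard deviation of the measurement. A small sketch using the blood pressure example; the standard deviations are hypothetical values chosen for illustration, and real ones should come from your own data or the literature.

```python
def standardised_effect(mean_control, mean_treated, sd):
    """Difference between group means, expressed in standard deviations."""
    return abs(mean_control - mean_treated) / sd


# Same 40 mmHg drop, two hypothetical levels of animal-to-animal variability.
tight = standardised_effect(160.0, 120.0, sd=20.0)
noisy = standardised_effect(160.0, 120.0, sd=40.0)
print(f"SD 20 mmHg: d = {tight}, SD 40 mmHg: d = {noisy}")
```

The same biological effect looks half as large, in statistical terms, when the measurement is twice as variable.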
Step 3 – Using your real-world effect size, perform a power
calculation. Power calculations enable you to give an actual value for the number
of animals needed in each study to give real, reproducible results.
‘We use n=6 because everyone else does’ or ‘We use n=5 because that’s how many
fit in a cage’ apparently is not good enough. There are four main elements to
a power calculation: you need to state all four, and justify each. The four
parts are: the level of type 1 error (false positives; α, the p-value threshold,
normally 0.05), the level of type 2 error (false negatives; β, normally 20%,
which corresponds to 80% power), the effect you are looking for (see above) and
variability (this has to be based on real-world data, yours or from the
literature). There are lots of stats packages that, once you have this info,
can do the power calculation for you; this one works.
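To show how the four parts fit together, here is a rough sketch of a sample size calculation for comparing two groups. It uses the textbook normal approximation, so a proper t-test based calculation in a stats package will usually give a slightly larger number. The function name and the standard deviations are assumptions for illustration only.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(effect, sd, alpha=0.05, beta=0.20):
    """Approximate animals per group for a two-sided, two-group comparison.

    effect -- smallest difference worth detecting (same units as sd)
    sd     -- standard deviation of the outcome, from real data
    alpha  -- type 1 error rate
    beta   -- type 2 error rate (power = 1 - beta)
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided significance threshold
    z_beta = z.inv_cdf(1 - beta)          # quantile for the desired power
    d = effect / sd                       # standardised effect size
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# A 40 mmHg drop with a hypothetical SD of 40 mmHg, alpha 0.05, 80% power.
n = animals_per_group(effect=40.0, sd=40.0)
print(f"about {n} animals per group")
```

Doubling the variability roughly quadruples the number of animals needed, which is why the variability estimate has to be defensible.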
Step 4 – How will you reduce unconscious bias? Use the ARRIVE guidelines as a
checklist and try the Experimental Design Assistant. Think about how you will
blind, normalise, randomise etc. There are some worked examples on the MRC
site which might help.
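As one possible illustration of randomising and blinding, the sketch below shuffles animals into equal-sized groups and then hides the group names behind coded labels. The function, animal IDs and group names are hypothetical; in practice the key linking codes to groups would be held by someone who is not scoring the outcome.

```python
import random
import string


def randomise_and_blind(animal_ids, groups=("control", "treated"), seed=None):
    """Randomly allocate animals to groups and produce coded (blinded) labels."""
    rng = random.Random(seed)        # a recorded seed makes the allocation auditable
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals out in turn so group sizes differ by at most one.
    allocation = {a: groups[i % len(groups)] for i, a in enumerate(shuffled)}
    # Assign neutral codes (A, B, ...) to the groups in random order.
    codes = list(string.ascii_uppercase[:len(groups)])
    rng.shuffle(codes)
    key = dict(zip(groups, codes))   # to be kept away from whoever scores outcomes
    blinded = {a: key[g] for a, g in allocation.items()}
    return allocation, blinded, key


animals = [f"mouse_{i}" for i in range(1, 13)]
allocation, blinded, key = randomise_and_blind(animals, seed=42)
for animal in animals:
    print(animal, blinded[animal])   # the person measuring only sees the code
```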
Step 5 – GET ADVICE FROM A STATISTICIAN. Did I mention I am
not a statistician? If you are reading this as your sole guidance you are in
trouble! There are multiple caveats with the approach I have suggested: it is
clearly not going to work for all cases; discovery science and hypothesis-free
work will need other approaches; and multiple time points and repeated sampling
change the weighting of the p value (think multiple coin tosses: the chance
of 6 heads in a row is the same as the chance of H,T,T,T,H,H). Since most
programmes of work are complex and multi-endpoint, it is not possible to do a
detailed power calculation for each part. One approach would be to do the
analysis for the major arm of the work to demonstrate you know what you are doing.
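A rough sketch of why multiple time points matter: if each test has a 5% chance of a false positive, the chance of at least one false positive grows quickly with the number of tests. This assumes the tests are independent, which repeated measurements on the same animals are not, so treat it as a ballpark. The Bonferroni correction shown is just one simple fix and is not something covered in the seminar notes above.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is significant by luck."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall error rate near alpha."""
    return alpha / n_tests


for k in (1, 6, 20):
    risk = chance_of_any_false_positive(k)
    print(f"{k} tests: {risk:.0%} chance of a false positive, "
          f"Bonferroni threshold p < {bonferroni_threshold(k):.4f}")
```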
On the whole my birthday trip to the stats seminar was,
surprisingly, much better than a trip to the zoo (then again I don’t really
like zoos that much). It got me thinking hard about improving experimental
design, and it was a step towards better, more reproducible science as a
community.
The seminar was organised by the NC3Rs and MRC, but I am
writing on my own behalf and the opinions stated here are mine.