Science is stories.
Good stories move science forwards. The stories come from
the data, and turning data into a story is a long and iterative process. The
more data you have, the longer it can take, and as our tools get better at producing
more data per sample, it is getting harder to find the story.
In our recently published study (Inflammatory Responses to Influenza Vaccination at the
Extremes of Age) we measured 27 different mediators after giving 2
different vaccines 3 times to 3 different ages of mouse, sampling at 8 timepoints
after vaccination with 5 replicate animals at each timepoint. That gave us 19,440
data points. This was a tricky knot to unpick.
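For anyone who wants to check the sum, here is a minimal sketch of where 19,440 comes from. The numbers are the ones given above; the dictionary keys are just labels chosen for illustration, and treating one sample as one animal at one timepoint after one dose is an assumption.

```python
from math import prod

# Study design as described in the text; the key names are illustrative labels.
design = {
    "mediators": 27,
    "vaccines": 2,
    "doses": 3,
    "age_groups": 3,
    "timepoints": 8,
    "replicates": 5,
}

# Every combination of the factors contributes one measurement.
total_points = prod(design.values())

# Assumed: one sample = one animal at one timepoint after one dose,
# with all 27 mediators measured on that sample.
samples = total_points // design["mediators"]

print(total_points)  # 19440
print(samples)       # 720
```

Seen this way, the dataset is roughly 720 samples with 27 readings each, which is a more manageable picture than 19,440 loose numbers.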
Inflammatory responses
The aim of the study was to investigate whether age changed
the immune response to vaccination. In particular, we were interested in whether
age affected inflammation after immunisation. Inflammation sounds bad, but we
actually need a small amount to kick-start the immune system and make the vaccine
work. We know that vaccines work less well at the extremes of age and wanted to
determine whether the initial reaction to the vaccine shaped how well it
worked. To investigate the inflammatory response, we used a tool called Luminex.
Luminex measures chemical messengers in the blood called cytokines; these
chemical messengers recruit cells of the immune system to the site of
vaccination, activate them and shape the type of response they generate.
However, as mentioned, Luminex generates LOTS of data: 19,440 data points. The
first time we had the complete dataset, we had to book a study room to have
sufficient space to spread out all the bits of paper with the data on. So how
did we move it from there into a story?
Data Compression
It took four things: perseverance, perspective, peer review
and bioinformatics.
Perseverance: With any dataset,
but large ones in particular, time is the most critical factor in finding the
story. You need to spend time with the dataset, getting to know it, formatting
and reformatting: sorting by size, time, alphabetically, into classes of
cytokines. Analysis can’t be done piecemeal. Several times I got close to
understanding the data but then had to take time off to do something else;
when I came back, I had forgotten the trends I had been close
to identifying and had to start from scratch. There were several dead ends, and
times when I wanted to give up because there was no discernible pattern in the data.
Perspective: That said, analysis
can’t all be done in one sitting. You need time for the subconscious to churn
it through, you need to read around the subject to see what other people have
seen, you need conversations with colleagues and chance insights when on the loo. The creative process can’t be rushed.
Peer review: Exposing your
precious story to the slings and arrows of outrageous review is often
frustrating and can be soul-destroying. However, in this case (and, I grudgingly
admit, quite frequently for other studies) peer review significantly improved
the paper. It gave us time and perspective to rethink the conclusions, and it suggested
new ways of analysing and thinking about the dataset.
Bioinformatics: It turns out
that, whilst easy and accessible, Excel may not be the most effective tool for
looking at big datasets. There is a range of other bioinformatic tools that
can help in the analysis. In this case we used principal component analysis.
Now I have no idea how the maths behind this actually works, but I do know it
squishes the 19,000 or so data points onto 2 axes so that you can then see broad
trends in the data and then from there go back and look for individual
variables of interest.
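For the curious, the squishing can be sketched in a few lines of Python. This is not the analysis from the paper: the data below are randomly generated, the 720 by 27 layout (samples by mediators) is an assumption, and the names `pca_2d`, `group` and `pattern` are made up for illustration. It only shows the general idea of principal component analysis, done here with a singular value decomposition from NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in data: 720 samples (rows) by 27 mediators (columns).
n_samples, n_mediators = 720, 27
group = rng.integers(0, 3, size=n_samples)      # hypothetical age-group label per sample
pattern = rng.normal(size=(3, n_mediators))     # hypothetical mediator pattern per group
data = 2.0 * pattern[group] + rng.normal(size=(n_samples, n_mediators))


def pca_2d(matrix):
    """Project each row onto the first two principal components."""
    centred = matrix - matrix.mean(axis=0)              # centre every column on zero
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:2]                                 # the two directions of greatest variance
    scores = centred @ components.T                     # 2 coordinates per sample
    explained = (s ** 2) / np.sum(s ** 2)               # share of variance per component
    return scores, components, explained[:2]


scores, components, explained = pca_2d(data)
print(scores.shape)  # (720, 2)

# "Going back" to individual variables: the mediators that weigh most
# heavily on the first component are the first ones worth a closer look.
top_mediators = np.argsort(-np.abs(components[0]))[:4]
print(top_mediators)
```

Plotting the two columns of `scores` against each other, coloured by group, is what gives the broad-trends picture. With real measurements you would probably also log-transform or rescale each mediator first, since their concentrations can differ by orders of magnitude.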
So what did we learn?
After we had spent time staring at the data, a number of patterns
did emerge. First of all, age is a major factor in the inflammatory response to
vaccination, with different cytokines being produced in young, adult and
elderly animals. Secondly, adjuvants can shape the response. Adjuvants are
compounds that improve vaccine efficacy; the addition of an adjuvant called
MF59 reduced age-associated differences, inducing higher levels of the
cytokines IL-5, G-CSF, KC, and MCP-1. The levels of these four cytokines
correlated with the level of antibody produced after vaccination. This is
important because it shows that poor responses at the extremes of age can be
overcome through the addition of adjuvants; it also gives us some insight into
which responses to a vaccine can lead to the best results.
Taking a complex (and large) dataset and turning it into a story was a lengthy
process, but it has helped us understand more about the immune response to vaccines.