We need CompBioMed to make sense of Big Data in Biology

Roger Highfield explains why the European Commission has backed a €5 million CompBioMed initiative, which will help to usher in true personalised medicine.

The extraordinary surge in our capacity to collect information about living things has led many to think that we are on the verge of a revolution in medicine.

I don’t disagree, but the next great leap depends on more than just data. That step change in biology will come not from placing blind faith in big data but from making sense of this deluge of information with the help of modelling.

This is why the European Commission is investing five million euros in CompBioMed, an acknowledgement that deep understanding will arise from theory honed by data, not from groping around for correlations in vast datasets harvested from complex biological systems. That is also why I co-authored a paper published today in the Philosophical Transactions of the Royal Society of London to highlight why it is a mistake to place too much faith in data alone.

The prevailing and misguided faith in the power of big genetics data dates back several decades, to when the discovery of a ‘gene for a disease’ helped to foster the comforting illusion that the human genome would provide the ‘recipe to build a human being’ or the ‘blueprint of life’.

But, during the decade after the first human genomes were unveiled by rival teams at the White House in the year 2000, there was disappointment with the practical returns.

Today, as the cost of collecting this information plummets, the idea that a person’s genome holds the key to their future has subsided, not least because deeper understanding of epigenetics has challenged the reductionist idea that we all march to the tune of the DNA sequences that we inherit from our parents.

That is also why the old dream of using a person’s genome to customise their treatment – ‘personalised medicine’ – has given way to the rise of ‘precision medicine’, which aims to sequence millions of genomes to link genes with pathologies in stratified populations.

Even so, many still like to believe that, because biology is so complicated, the rise of omics – genomics, transcriptomics, proteomics, epigenomics, metagenomics, metabolomics, nutriomics and so on – is bound to lead to revolutionary advances.

I suspect we need even more data. We have to think of the body as a dynamical system and consider how all these omics depend on time, not least as a result of circadian rhythms. But even bigger data will not be enough.

We need theory to help guide the collection, curation and interpretation of data because, yes, biology is so very bewildering and complicated.

If we are to achieve true personalised medicine, we need to figure out how all these data fit together. We need understanding. We need models.

We need them to deal with the problem of spurious correlations, which is a familiar headache for anyone who has tried to use machine learning to predict the biological activity of molecules. Drug discovery by machine learning, trained on all known drugs and pre-existing molecular targets, is easily undermined because tiny structural changes in candidate drugs can lead to dramatic differences in potency.
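To see how easily spurious correlations arise, here is a minimal sketch in Python. It is an illustration only, not taken from our paper: the 'outcome' and the 'features' are pure random noise, and the function names and sample sizes are my own assumptions. The more meaningless features you screen against a small number of samples, the stronger the best chance correlation becomes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |r| between a random 'outcome' and purely random 'features'."""
    rng = random.Random(seed)
    # Hypothetical outcome (think of a measured potency): here it is pure noise.
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is independent noise, so any correlation is chance alone.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    for n_features in (5, 500, 5000):
        r = max_spurious_correlation(n_samples=20, n_features=n_features)
        print(f"{n_features:5d} random features -> strongest |r| = {r:.2f}")
```

None of these correlations means anything, yet the strongest one grows as more features are screened. Without a model to say which relationships are plausible, there is no way to tell such flukes from real signals.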

We need models to understand the sensitivity of complex biological systems to tiny errors in data and the effects of chaos.
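A textbook toy shows what sensitivity to tiny errors means in practice. The sketch below uses the logistic map, a standard example of chaos and emphatically not a model of any biological system; the starting value, the size of the error and the function names are assumptions chosen for illustration. Two runs that begin one part in ten billion apart soon bear no resemblance to each other.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), returning all states."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0, error, steps, r=4.0):
    """Gap at each step between a run from x0 and a run from x0 + error."""
    exact = logistic_trajectory(x0, steps, r)
    perturbed = logistic_trajectory(x0 + error, steps, r)
    return [abs(a - b) for a, b in zip(exact, perturbed)]


if __name__ == "__main__":
    # A measurement error of 1e-10 in the initial state (illustrative value).
    gaps = divergence(x0=0.2, error=1e-10, steps=60)
    for step in (0, 10, 20, 30, 40, 50, 60):
        print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

In this chaotic regime the gap roughly doubles at each step on average, so no realistic improvement in measurement precision buys more than a few extra steps of reliable prediction. A model at least tells you how far ahead a forecast can be trusted.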

But to change the face of medicine, we also need models that are truly predictive.

First, they have to provide reliable insights in novel circumstances. As the overestimate of peak influenza levels by Google Flu Trends showed, we have to take care when extrapolating beyond the range of existing data: past success in describing epidemics is no guarantee of future performance.

Second, these models have to be actionable, not in the weak sense that they can be used post hoc, for instance to help hone a drug or process, but in a strong sense that they can be used to predict the future so action can be taken before it becomes a reality, as is already the case when forecasting severe weather.

In medicine, the most vivid example of an actionable prediction is one that can extend the life of a patient. That could mean a prediction that enables a doctor to pick one antimicrobial drug in preference to another when confronted with a severe infection, or to select the best approach for risky life-saving surgery.

Computational methods are now sufficiently advanced, and hardware sufficiently powerful, that CompBioMed will provide a critical hub for applying computer modelling to this tsunami of biomedical and clinical data.

Led by Peter Coveney of University College London, lead author on our paper in Philosophical Transactions, CompBioMed aims to develop computational models that can turn big data into timely and accurate predictions. His team recently used supercomputers to offer a glimpse of the future of personalised medicine. Fifty drugs and candidate drugs were studied to determine how they bind with protein targets in a range of disease cases, in order to rank their potency for drug development and for drug selection in clinical decision making. The aim of the project was to demonstrate that scientists can work out how a candidate drug will act on a target in the body (a protein), and do so in a matter of a few hours.

Prof Coveney’s simulations at the Leibniz Supercomputing Centre are possible only because his team is not just gathering data but also modelling the atomic details of how drug molecules actually work in the body.

This earlier work was part of ComPat, a €3 million EU-funded project to support emerging high-performance computing technologies; it is also a forerunner of CompBioMed, which will now become an EU Centre of Excellence.

The hope is that, thanks to initiatives like CompBioMed, a doctor will one day be able to create a virtual model of a patient, customised with their personalised data, that can be used to make sense of test results and prescribe the best treatment. Ultimately, such models will provide a reliable answer to a simple question that doctors today often struggle to answer: ‘What’s wrong with me?’