Big Data in Biology Has Produced a Surfeit of Hypotheses – Perhaps Too Many

Photo: Sebastian Kanczok/Unsplash.

Even if you aren’t a biologist yourself, you might have heard the terms ‘big data’ and ‘multi-omics technologies’ thrown around. Next-generation sequencing, which underpins much of multi-omics technology and spawns ‘biological big data’, has evolved rapidly in a matter of two decades, becoming ubiquitous in research. DNA sequencing is at the foundation of the multi-omics technologies. First developed by Frederick Sanger in the late 1970s, sequencing has evolved into an automated process that allows us to rapidly and repeatedly sequence a single stretch of DNA, resulting in highly accurate sequence readouts.

Genomics refers to the information we obtain by sequencing genomes, or the entire DNA, of different organisms. Genomics has expanded to give birth to transcriptomics (involving RNA sequencing) as well as proteomics, epigenomics and metabolomics. Sequencing and analysis have transformed our knowledge of biological systems and their inner workings. We can now access information that we never before dreamed of possessing, such as the heterogeneity across single cells within a cancerous tumour, the implication of little-known genes in the cause and progression of many debilitating human illnesses, the evolutionary trajectories of our ancestors and an improved understanding of the model systems we work with in our laboratories.

These insights have been aided by the steep fall in the price of sequencing over the years, enabling more and more laboratories to contribute to the ever-expanding repository of ‘-omics’ data. This ultimately should bring us to a question: is more data always better? While it’s hard to overstate the benefits of ‘-omics’ technologies, how this data is used and built on is also important. And this is where we may be holding back the progress of modern biology.

First, there’s no denying that multi-omics technologies have significant diagnostic and therapeutic implications that have been translated into tangible human benefits. But while there is no doubt that the medical applications of biology are essential to human welfare, there also exists a fundamental aspect of research that seeks to understand biology for its own sake. And I fear that the present ubiquity of multi-omics technologies, coupled with their success in clinical applications, may coax biologists to steer their research interests to align more with the clinical side of biology than they used to. While the advancement of medical research and its applications is very important, we need to be careful and ensure the pursuit and quality of basic science don’t suffer as a result.

Big data does have its place in fundamental research. In the realm of sub-organismal biology, multi-omics technologies have allowed us to discover hitherto unknown cell types, correlate new genes with pathways and define biomolecular associations over entire genomes. Put differently: multi-omics has allowed us to produce testable hypotheses. The reports that detail these findings often gush about how the stage has been set for more research from the starting point that is big data. But has this really happened?

Of course, there’s an argument to be made that establishing causal relationships from data correlations is in the jurisdiction of the experimental biologist. But has anyone experimentally investigated the biological mechanisms involved? I don’t think the answer is a resounding “yes”. Instead, I suspect that we are drowning in data, sidelining the traditional experimental validation of hypotheses while generating even more hypotheses.

So while next-generation sequencing has done a great deal of good for biology research, it’s possible we have reached a moment where we need to pause and look at the big picture that emerges from our individual preoccupations with sculpting a single pixel each. What are we doing with the data we generate? Are we using the data to test hypotheses and gain critical insights into biological mechanisms? Are we furthering fundamental research? Are multi-omics tools and traditional experimental research working in harmony?

I raise these questions as a novice, a student proposing to enter biology research at a point when ‘-omics’ technology increasingly towers over the life sciences. I’m sure that these questions plague my peers, too. The ‘-omics’ revolution is well underway, but I believe it’s the answers to questions like these that will determine what sort of mark this ongoing revolution will leave on the future of science.

Amruta Swaminathan is a master’s student at the Indian Institute of Science Education and Research (IISER), Pune.