University of Glasgow researchers have helped to develop a new method for understanding the relationships between different DNA sequences and where they come from. This information has widespread applications, from understanding the development of viruses, such as SARS-CoV-2, the strain of coronavirus that causes COVID-19, to precision medicine, an approach to disease treatment and prevention that takes into account individual genetic information. The study, led by the Big Data Institute, is published in GENETICS and is the featured paper in the September 2024 edition.

Genetics is rapidly becoming part of our everyday lives. Nearly every week sees another newspaper headline about genetics and human ancestry, with huge datasets of DNA sequences routinely generated and used for medical study. We can make sense of this genomic big data by working out the historical process that created it—in other words, where the DNA sequences came from.

If we take a small section of someone's DNA, we know it must have come from one of their two parents in the previous generation, from one of their four grandparents in the generation before that, and so on. This means we can represent the history of different sections of DNA by tracing them backwards through time. If we do this for a large set of DNA sequences from different people, we can build up a set of genetic "family trees," a genealogy of DNA sequences.
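
For readers who like to see the idea concretely, the backwards tracing described above can be sketched in a few lines of Python. This is only an illustrative toy, not the method from the study: the pedigree, the names in it, and the helper functions `ancestors_at` and `trace_segment` are all hypothetical, and real genealogies are inferred from DNA data rather than read from a known family tree.

```python
# Toy pedigree: each person maps to (mother, father).
# All names are made up for illustration.
PEDIGREE = {
    "child": ("mum", "dad"),
    "mum": ("gran_m", "grandad_m"),
    "dad": ("gran_d", "grandad_d"),
}


def ancestors_at(person, generations):
    """List every ancestor a DNA segment could have come from,
    the given number of generations back."""
    current = [person]
    for _ in range(generations):
        current = [parent for p in current for parent in PEDIGREE.get(p, ())]
    return current


def trace_segment(person, choices):
    """Follow one DNA segment backwards through time.

    Each entry in `choices` says which parent the segment came from
    in that generation: 0 for the mother, 1 for the father.
    """
    path = [person]
    for choice in choices:
        person = PEDIGREE[person][choice]
        path.append(person)
    return path


if __name__ == "__main__":
    print(ancestors_at("child", 1))        # the two parents
    print(ancestors_at("child", 2))        # the four grandparents
    print(trace_segment("child", [1, 0]))  # via the father, then his mother
```

Different sections of the same person's DNA can follow different paths through this pedigree, which is why a collection of sequences is described by a set of trees rather than a single one.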

This grand network of inheritance is sometimes called an ancestral recombination graph.