Computational & Statistical Genomics – Rubinacci Lab
Articles / 11 April 2026

Large-scale datasets and advanced computational methods reveal new insights into DNA repeat sequences

Certain repeat sequences in our DNA actually expand and contract as we age, driving serious diseases. A recent study uses massive biobanks and advanced computational methods to map where and why these genomic changes occur.

Iida Röksä 11 April 2026

We often think of our DNA as fixed and unchanging, but certain sequences can expand and contract within the cells of our body as we age. These sequences, known as DNA repeats, are linked to serious diseases, yet exactly how and why they change is still largely unclear. A recent study takes a closer look by mapping where in the genome such changes occur, using massive biobank datasets and advanced computational methods, both of which are essential to modern genetics and central to our research group’s work.

What DNA repeats are and why they matter

Over the course of our lives, our DNA undergoes numerous changes. Of particular interest are DNA repeat sequences, short segments of DNA that repeat multiple times consecutively at a given location in the genome. Think of them as a single word written over and over again in a sentence. In some cases, these repeats can expand, meaning that additional copies are added and the sequence becomes longer. If the repeating chain gets too long in certain tissues, it can cause diseases such as Huntington’s disease.
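For readers who like to see the idea in code, here is a minimal sketch of what "counting repeat copies" means. It uses the CAG unit, the one that repeats in Huntington’s disease, but the two sequences are made-up toy strings rather than real genomic data, and the function name is our own invention for illustration. Real repeat genotyping from sequencing reads is far more involved than this.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # (?:motif)+ matches one or more consecutive copies of the motif.
    runs = re.finditer(f"(?:{re.escape(motif)})+", sequence)
    return max((len(m.group(0)) // len(motif) for m in runs), default=0)


# Hypothetical toy sequences (assumptions for illustration only).
inherited = "TTG" + "CAG" * 5 + "CCA"  # five copies of the repeat unit
expanded = "TTG" + "CAG" * 8 + "CCA"   # the same site after gaining three copies

print(longest_repeat_run(inherited, "CAG"))  # 5
print(longest_repeat_run(expanded, "CAG"))   # 8
```

An expansion, in this picture, is simply the second count being larger than the first at the same location in the genome.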

[Figure: Diagram of DNA repeat expansion]

There are hundreds of thousands of repeat sequences in the human genome, and they do not all behave in the same way. Whether a repeat causes disease depends on where it is located and how much it expands.

To really understand such diseases, we need a much clearer picture of how and why DNA repeat sequences expand and contract. A recent Nature study by Margaux Hujoel (currently Assistant Professor at UCLA), Po-Ru Loh and colleagues takes a major step in this direction by analysing repeat variation across 900,000 individuals. It shows how repeat sequences change as we age and highlights the scale at which these changes can now be studied across the genome.

Partly inherited, partly shaped over time

The length and stability of DNA repeats are partly inherited from our parents. Some people are born with genetic variants that make DNA repeats more likely to expand. Hujoel and colleagues found that people with a high number of such variants showed up to four times more repeat expansion in their blood cells than those with fewer of them. Simply put: the more expansion-promoting variants you inherit, the faster DNA repeats tend to grow over your lifetime.

But inheritance is only half the story. Repeat sequences also change as we age. These changes occur within the cells of our body and are known as somatic changes, which are not passed on to our children. Because the changes happen separately in different cells and tissues, the same inherited repeat sequence can behave very differently in different parts of the body—for example, expanding in the brain but staying stable in the liver. This somatic variation makes repeat-related diseases difficult to study, but also helps explain why some conditions affect specific organs while leaving others untouched.

Why repeat expansions are hard to predict

DNA repeats are unusually unstable: they expand or contract more readily than most other parts of the genome. Because they change frequently within our cells, somatic repeat expansions are relatively common.

Repeat instability is shaped by many small factors acting together, including genes involved in DNA repair. As shown by Hujoel and colleagues, a single genetic factor does not affect all repeat sequences in the same way: the same variant may increase instability in one repeat while stabilising another.

Repeat sequences themselves also behave differently depending on where they are located in the genome. By analysing the entire genome, the researchers identified 29 specific spots where inherited variants strongly influence repeat expansion, especially in blood cells. Although rare, large expansions at some of these locations were linked to an increased risk of severe liver and kidney diseases.

Because these factors differ from one repeat and one tissue to the next, repeat expansions can be common overall yet still hard to predict at any specific location or in any specific tissue.

From sequencing data to biological insight

The study by Hujoel and colleagues is an exciting example of what large biobank datasets can reveal about human genetics. By analysing genetic data from 900,000 biobank participants, the researchers were able to track changes in DNA repeats across the entire genome.

This approach is also central to our research group. Large biobank datasets have transformed human genetics by enabling analyses at an unprecedented scale and with much stronger statistical power. They help us detect subtle but biologically meaningful signals that smaller studies would miss, making the findings more relevant for medicine.

However, data alone is not enough. To make sense of it, we need computational methods that can handle complex and noisy datasets. As this study shows, these methods help us figure out which genetic factors influence repeat changes and where in the genome they occur. Developing these methods is also a key part of our research: building tools that can detect and interpret complex variation in large genetic datasets.

Towards new ways of understanding and treating disease

Recent advances in human genetics are already helping us shed light on diseases linked to DNA repeat expansions. Take Huntington’s disease, which usually doesn’t appear until adulthood. We now know that both the inherited length of the repeat and other genetic factors influence how fast the disease progresses. As we get better at mapping how DNA repeats change over time, we may eventually find ways to slow down, or even stop, the expansions that drive disease.

At the same time, many aspects of repeat variation remain unclear. We still do not fully understand why some repeats expand in certain tissues but not others, or why the same genetic factors can have different effects depending on the repeat.

Studies like this one show that we are beginning to answer these questions. For our research group, this is where things become especially interesting: understanding complex genetic variation and developing the tools needed to study it at scale. By combining large-scale data with computational methods, we can study how DNA repeat sequences change and what drives those changes.


Paper cited: Hujoel, M.L.A., Handsaker, R.E., Tang, D. et al. Insights into DNA repeat expansions among 900,000 biobank participants. Nature 650, 920–929 (2026). https://doi.org/10.1038/s41586-025-09886-z