Home | CompStatGen Lab

What we study

Research

🧬

Structural Variation & Disease

Large genomic rearrangements (deletions, duplications, inversions) shape disease risk. We integrate multi-omics data to decode their functional consequences at biobank scale.

🌍

Population Haplotypes & IBD

We identify shared haplotype segments across populations to illuminate human history and enable novel disease mapping via identity-by-descent analysis.

⚡

Scalable Algorithms

We design algorithms that process millions of genomes efficiently. Our tools have shaped multiple UK Biobank releases and are widely adopted globally.

🔬

Low-Coverage Imputation

Low-coverage whole-genome sequencing (lcWGS) followed by imputation is now competitive with SNP arrays. We push accuracy to ultra-rare variants at coverages as low as 0.1× for under $1 per genome.

📊

Biobank-Scale Phasing

We phase rare and singleton variants accurately without family data, enabling compound heterozygous disease detection.

🏥

Multi-Omics Integration

We integrate genetic variation with transcriptomics, proteomics, and other molecular layers to trace how genomic changes propagate to disease.

All our research →

ESHG 2025 · Milan, Italy

Our work for the community

Shaping the field with widely used, computationally efficient tools

We actively contribute through conference talks, workshops, and open-source software, and we present our work at ESHG, ASHG, and other international conferences.

Methods papers should be accompanied by robust, well-documented software usable by anyone, from large biobanks to individual labs with limited compute.

lcWGS · Imputation

GLIMPSE2

Biobank-scale low-coverage WGS imputation

We introduced an lcWGS imputation method that scales sublinearly to millions of haplotypes. Applied to 150,119 UK Biobank genomes.

Documentation →

WGS · Phasing

SHAPEIT5

Rare variant phasing at biobank scale

Statistical phasing method that achieves <5% switch error for ultra-rare variants.

Documentation →

SNP Array · Imputation

IMPUTE5

Fast SNP array imputation via PBWT

Method that allows SNP array imputation to scale to millions of reference individuals.

Documentation →

All our software →

From our blog

Articles

11 Apr 2026 · DNA-repeats

Large-scale datasets and advanced computational methods reveal new insights into DNA repeat sequences

Certain repeat sequences in our DNA actually expand and contract as we age, driving serious diseases. A recent study uses massive biobanks and advanced computational methods to map where and why these genomic changes occur.

20 Jul 2023 · SHAPEIT5

Phasing Singletons: Pushing the Limits of Statistical Haplotype Estimation

Population-based phasing methods have long struggled with singleton variants, where limited sharing of haplotype information makes accurate phase inference very difficult. SHAPEIT5 addresses this by leveraging the local haplotype context rather than relying solely on allele sharing.

15 Jul 2023 · GLIMPSE2

Scaling Imputation with a 150,000-Sample UK Biobank Reference Panel Using GLIMPSE2

The UK Biobank's WGS release created an unusually large reference panel — large enough to strain conventional imputation pipelines. Here's how GLIMPSE2 was redesigned to handle it efficiently.

All our blog articles →

Updates

News

16 Jun 2026

🏆 Théo Schneider wins Lodewijk Sandkuijl Award at ESHG 2026

Théo receives the #ESHG2026 Lodewijk Sandkuijl Award for Best presentation in the field of complex disease and statistical genetics. Congratulations!

01 Jun 2026

👋 Welcome Emma & Octavian

Emma Bourgogne and Octavian Neculau join the group as Research Assistants. Emma will work on leveraging pedigree structure for imputation of low-coverage whole-genome sequences, while Octavian will focus on algorithmic methods related to the Positional Burrows-Wheeler Transform (PBWT). Welcome to the team!

19 May 2026

📄 mCAs paper published in Nature Genetics

The paper by Tang et al. on the patterns and drivers of 43,617 mosaic chromosomal alterations in blood is now published in Nature Genetics.

Read →

17 Apr 2026

🎤 CompStatGen Lab at ESHG 2026

We're heading to ESHG! Théo and Marinella will be giving talks, while Maarja and Francesca will present posters. Simone will be chairing session C31. See you there!

15 Mar 2026

🏆 Maarja awarded a Marie Curie Fellowship!

Incredible news — Dr. Maarja Jõeloo has been awarded a prestigious Marie Curie Fellowship! A fantastic achievement and a wonderful recognition of her outstanding research. Huge congratulations, Maarja! 🎉

See FIMM post →

08 Mar 2026

🌟 Francesca receives an EMBO Scientific Exchange Grant

Wonderful news! Dr. Francesca Rosamilia has been awarded an EMBO Scientific Exchange Grant — a well-deserved recognition of her excellent work. Congratulations, Francesca!

05 Mar 2026

📄 MetaGLIMPSE in AJHG

Kumar et al. introduce a method for combining imputations of low-coverage data from multiple reference panels.

Read →

07 Jan 2026

👋 Welcome Francesca & Alisa

Dr. Francesca Rosamilia joins as a visiting postdoctoral researcher, and Alisa Willman starts her MSc thesis project with the sequencing unit.

01 Jan 2026

🎓 Théo transitions to Doctoral Researcher

Théo has been accepted to the doctoral school and secured 4 years of University of Helsinki funding. Congrats!

24 Nov 2025

🔬 Rotation: Sara Štebe

Sara Štebe joins the lab for a 3-month rotation project. Great to have you with us!

01 Sep 2025

🧬 Welcome Dr. Maarja Jõeloo

We are thrilled to welcome Dr. Maarja Jõeloo to the group as a Postdoctoral Researcher.

19 May 2025

📝 Marinella Laaksonen joins

Marinella joins the lab as a research assistant to work on her MSc thesis project.

01 Apr 2025

🤝 Théo Schneider joins as RA

The lab starts to grow! Théo Schneider joins the team as a Research Assistant.

01 Sep 2024

🚀 Rubinacci Lab launched

The group starts its journey at FIMM, University of Helsinki.

Computational & Statistical Genomics
