Rubinacci Lab · FIMM · University of Helsinki
Rubinacci Lab
We develop efficient statistical and computational methods to decode human genetic variation at scale, applying them to biobanks to understand the genetic basis of disease.
The team at our lab retreat · Kirkkonummi, Finland · 2026
What we study
Large genomic rearrangements (deletions, duplications, inversions) shape disease risk. We integrate multi-omics data to decode their functional consequences at biobank scale.
We identify shared haplotype segments across populations to illuminate human history and enable novel disease mapping via identity-by-descent analysis.
We design algorithms that process millions of genomes efficiently. Our tools have shaped multiple UK Biobank releases and are widely adopted globally.
lcWGS + imputation is now competitive with SNP arrays. We push accuracy to ultra-rare variants at coverages as low as 0.1× for under $1 per genome.
We phase rare and singleton variants accurately without family data, enabling compound heterozygous disease detection.
We integrate genetic variation with transcriptomics, proteomics, and other molecular layers to trace how genomic changes propagate to disease.
ESHG 2025 · Milan, Italy
Our work for the community
We contribute to the community through conference talks, workshops, and open-source software, and we present our work at ESHG, ASHG, and other international conferences.
Methods papers should be accompanied by robust, well-documented software usable by anyone, from large biobanks to individual labs with limited compute.
We introduced an lcWGS imputation method that scales sublinearly to millions of haplotypes. It has been applied to 150,119 UK Biobank genomes.
Documentation →
A statistical phasing method that achieves <5% switch error for ultra-rare variants.
Documentation →
A method that allows SNP array imputation to scale to millions of reference individuals.
Documentation →
From our blog
Certain repeat sequences in our DNA actually expand and contract as we age, driving serious diseases. A recent study uses massive biobanks and advanced computational methods to map where and why these genomic changes occur.
Population-based phasing methods have long struggled with singleton variants, where limited sharing of haplotype information makes accurate phase inference very difficult. SHAPEIT5 addresses this by leveraging the local haplotype context rather than relying solely on allele sharing.
The UK Biobank's WGS release created an unusually large reference panel — large enough to strain conventional imputation pipelines. Here's how GLIMPSE2 was redesigned to handle it efficiently.
Updates