CompStatGen Lab
FIMM · Helsinki

Open source

Software & Tools

We develop and maintain open-source tools for large-scale genomic analysis. All software is freely available and actively maintained.

lcWGS · Imputation

GLIMPSE2

Biobank-scale low-coverage whole-genome sequencing imputation

GLIMPSE2 is a state-of-the-art tool for low-coverage WGS imputation. It scales sublinearly with reference panel size, handling panels that contain millions of haplotypes and millions of rare variants. It achieves highly accurate genotype calls even at rare variants (MAF < 0.1%) and ultra-low sequencing coverages (0.1× to 0.5×).

GLIMPSE2 has been applied to impute the UK Biobank's 150,119-genome WGS release, using a reference panel of 280,238 haplotypes and over 580 million markers, at a compute cost of under $0.10 per genome. It also handles ancient DNA genotyping from degraded low-coverage samples.

lcWGS · Imputation · UK Biobank · Rare variants · Ancient DNA
Rubinacci et al., Nature Genetics 2023 · doi:10.1038/s41588-023-01438-3
WGS · Phasing

SHAPEIT5

Accurate rare variant phasing of large whole-genome sequencing cohorts

SHAPEIT5 is a haplotype phasing method for large WGS datasets. Applied to the complete UK Biobank WGS and WES data, it phases rare variants with switch error rates below 5%, even for variants found in a single individual among 100,000, achieving non-random phasing of singletons without family information. The phased UK Biobank reference panel built with SHAPEIT5 also improved downstream imputation accuracy.
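
For readers unfamiliar with the metric, the sketch below shows how a switch error rate is commonly computed for one individual: walk along the heterozygous sites and count how often the inferred phase flips relative to the true phase. The function name `switch_error_rate` and its inputs are hypothetical and chosen for illustration. This is not SHAPEIT5 code and does not reproduce the validation protocol used in the paper.

```python
def switch_error_rate(true_hap1, inferred_hap1):
    """Switch error rate across the heterozygous sites of one individual.

    Both arguments list the allele (0 or 1) carried by the first haplotype
    at each heterozygous site, in genomic order. Hypothetical helper for
    illustration only.
    """
    if len(true_hap1) != len(inferred_hap1):
        raise ValueError("inputs must cover the same heterozygous sites")

    # True where the inferred first haplotype agrees with the true first haplotype.
    same_orientation = [t == i for t, i in zip(true_hap1, inferred_hap1)]

    # A switch error is a change of orientation between consecutive het sites.
    switches = sum(
        1 for prev, cur in zip(same_orientation, same_orientation[1:]) if prev != cur
    )
    opportunities = len(same_orientation) - 1
    return switches / opportunities if opportunities > 0 else 0.0
```

With this definition, an individual whose inferred phase flips twice across five heterozygous sites has a switch error rate of 2/4 = 0.5; an unphased (random) assignment is expected to score around 0.5, which is why rates below 5% indicate informative phasing.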

Phasing · WGS / WES · Rare variants · Singletons · Compound het
Hofmeister, Ribeiro, Rubinacci, Delaneau, Nature Genetics 2023 · doi:10.1038/s41588-023-01415-w
SNP Array · Imputation

IMPUTE5

Fast SNP array genotype imputation at biobank scale using PBWT

IMPUTE5 is a genotype imputation method built for reference panels with millions of samples. Using the Positional Burrows-Wheeler Transform (PBWT), it identifies optimal subsets of reference haplotypes per individual. Computational cost increases only marginally as reference panel size grows.
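
To illustrate the idea behind PBWT-based selection, here is a minimal sketch of the positional prefix ordering: at each site the haplotypes are stably partitioned by allele, which keeps them sorted by their reversed prefixes, so haplotypes sharing long recent matches end up adjacent. The function names `pbwt_prefix_orders` and `neighbours` are hypothetical, the input is assumed to be biallelic 0/1 data, and this toy version is not IMPUTE5's implementation.

```python
def pbwt_prefix_orders(haplotypes):
    """Return the haplotype ordering after each site.

    haplotypes: list of equal-length lists of 0/1 alleles.
    orders[k] sorts haplotype indices by their reversed prefix over
    sites 0..k-1 (orders[0] is the input order).
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(len(haplotypes)))
    orders = [order]
    for k in range(n_sites):
        # Stable partition: carriers of allele 0 first, then allele 1,
        # each group keeping its previous relative order.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def neighbours(order, hap, width):
    """Haplotypes within `width` positions of `hap` in a prefix ordering."""
    pos = order.index(hap)
    lo = max(0, pos - width)
    return order[lo:pos] + order[pos + 1:pos + 1 + width]


panel = [
    [0, 1, 0, 1],  # haplotype 0
    [1, 1, 0, 1],  # haplotype 1
    [0, 0, 1, 0],  # haplotype 2
    [1, 0, 1, 0],  # haplotype 3
]
final_order = pbwt_prefix_orders(panel)[-1]
candidates = neighbours(final_order, 0, 1)
```

In this toy panel, haplotype 0 ends up next to haplotype 1, with which it shares its last three alleles. Because each site requires only one pass over the haplotypes, candidate matches can be found without comparing every pair, which is what makes the approach attractive for very large panels.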

SNP Array · Imputation · PBWT · High performance · Large panels
Rubinacci, Delaneau, Marchini, PLOS Genetics 2020 · doi:10.1371/journal.pgen.1009049