Open source
Software & Tools
We develop and maintain open-source tools for large-scale genomic analysis. All software is freely available and actively maintained.
GLIMPSE2
Biobank-scale low-coverage whole-genome sequencing imputation
GLIMPSE2 is a state-of-the-art method for low-coverage WGS imputation. It scales sublinearly with reference panel size, handling panels that contain millions of haplotypes and millions of rare variants. It achieves highly accurate genotype calls even at rare variants (MAF < 0.1%) and at ultra-low sequencing coverages (0.1× to 0.5×).
GLIMPSE2 has been applied to impute the UK Biobank's 150,119-genome WGS release, using a reference panel of 280,238 haplotypes and over 580 million markers, at a compute cost of under $0.10 per genome. It also handles ancient DNA genotyping from degraded low-coverage samples.
SHAPEIT5
Accurate rare variant phasing of large whole-genome sequencing cohorts
SHAPEIT5 is a haplotype phasing method for large WGS datasets. Applied to the complete UK Biobank WGS and WES data, it phases rare variants with switch error rates below 5%, even for variants found in a single individual among 100,000. It thereby achieves non-random phasing of singletons without family information. The phased UK Biobank reference panel built with SHAPEIT5 also improved downstream imputation accuracy.
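For readers unfamiliar with the switch error rate quoted above, the following is a minimal sketch of how the metric is commonly computed for one individual. It is not SHAPEIT5 code: the function name `switch_error_rate` and its inputs (one true haplotype, one inferred haplotype, and the indices of heterozygous sites) are assumptions made for illustration.

```python
def switch_error_rate(true_hap, inferred_hap, het_sites):
    """Fraction of consecutive heterozygous site pairs whose phase flips.

    true_hap, inferred_hap: alleles (0/1) carried by the first haplotype
    in the truth set and in the inferred phasing (hypothetical inputs).
    het_sites: indices of the individual's heterozygous sites, in order.
    """
    # At each heterozygous site, record whether the inferred haplotype
    # carries the same allele as the true one.
    agree = [true_hap[i] == inferred_hap[i] for i in het_sites]
    if len(agree) < 2:
        return 0.0  # no consecutive pair, so no switch can be counted
    # A switch occurs whenever agreement changes between neighbouring sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(agree) - 1)


truth = [0, 1, 0, 1, 1]
inferred = [0, 1, 1, 0, 1]
rate = switch_error_rate(truth, inferred, het_sites=[0, 1, 2, 3, 4])
print(rate)  # 0.5: two switches across four consecutive pairs
```

Under this definition, a phasing that is consistently flipped relative to the truth has no switch errors, because only changes in relative phase between neighbouring sites are counted.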
IMPUTE5
Fast SNP array genotype imputation at biobank scale using PBWT
IMPUTE5 is a genotype imputation method built for reference panels with millions of samples. Using the Positional Burrows-Wheeler Transform (PBWT), it identifies optimal subsets of reference haplotypes per individual. Computational cost increases only marginally as reference panel size grows.
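To give a sense of why the PBWT makes this selection cheap, here is a minimal sketch of the positional prefix array construction that underlies the transform. It is an illustration under simplifying assumptions (biallelic sites coded 0/1, a small in-memory list of haplotypes), not IMPUTE5's implementation; the function name `pbwt_prefix_arrays` is hypothetical.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the haplotype ordering before site 0 and after each site.

    haplotypes: list of equal-length lists of 0/1 alleles.
    After processing site k, haplotypes are sorted by their reversed
    prefixes up to k, so haplotypes sharing a long match ending at k
    sit next to each other in the ordering.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    order = list(range(len(haplotypes)))
    arrays = [order]
    for site in range(n_sites):
        # Stable partition: carriers of allele 0 first, then allele 1,
        # each group keeping its previous relative order.
        zeros = [h for h in order if haplotypes[h][site] == 0]
        ones = [h for h in order if haplotypes[h][site] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
final_order = pbwt_prefix_arrays(panel)[-1]
print(final_order)  # [0, 3, 1, 2]: identical haplotypes 0 and 3 are adjacent
```

Each site is processed in a single pass over the haplotypes, and the closest matches to any haplotype are found among its neighbours in the ordering. This is the property that lets a PBWT-based method pick a small, well-matched subset of reference haplotypes without comparing against the whole panel.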