Personalized pangenome references



Pangenomes reduce reference bias by representing genetic diversity better than a single reference sequence. Yet when comparing a sample to a pangenome, variants in the pangenome that are not part of the sample can be misleading, for example, by causing false read mappings. These irrelevant variants generally have lower allele frequencies, and they have previously been dealt with by filtering out rare variants. However, this blunt heuristic both fails to remove some irrelevant variants and removes many relevant variants. We propose a new approach that imputes a personalized pangenome subgraph by sampling local haplotypes according to k-mer counts in the reads. We implement the approach in the vg toolkit for the Giraffe short-read aligner and compare its accuracy to state-of-the-art methods using human pangenome graphs from the Human Pangenome Reference Consortium. The approach reduces small variant genotyping errors fourfold relative to the Genome Analysis Toolkit and makes short-read structural variant genotyping of known variants competitive with long-read variant discovery methods.
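The idea of choosing local haplotypes according to k-mer counts in the reads can be illustrated with a toy sketch. This is not the vg implementation: the function names (`count_read_kmers`, `sample_block`), the plain-string representation of haplotypes within a single block, and the simple scoring rule (reward haplotype-specific k-mers seen in the reads, penalize those not seen) are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_read_kmers(reads, k):
    """Count every k-mer occurrence across the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_block(haplotypes, read_counts, k, n=2, min_count=1):
    """Pick the n haplotypes in one block that best agree with the reads.

    haplotypes: dict mapping a haplotype name to its sequence in the block.
    Scoring rule (an assumption of this sketch): only k-mers that are not
    shared by all haplotypes are informative; a haplotype gains a point for
    each of its informative k-mers seen in the reads and loses a point for
    each one that is not seen.
    """
    hap_kmers = {name: kmers(seq, k) for name, seq in haplotypes.items()}
    shared = set.intersection(*hap_kmers.values())
    informative = set.union(*hap_kmers.values()) - shared
    present = {km for km in informative if read_counts[km] >= min_count}

    def score(name):
        own = hap_kmers[name] & informative
        return len(own & present) - len(own - present)

    # Highest score first; ties are broken by name for determinism.
    ranked = sorted(haplotypes, key=lambda name: (-score(name), name))
    return ranked[:n]


# Three hypothetical haplotypes that differ around one site in the block.
block = {
    "hapA": "ACGTACGTTAGC",
    "hapB": "ACGTACCTTAGC",
    "hapC": "ACGTAGGTTAGC",
}
# Hypothetical reads drawn from a sample that carries hapB.
reads = ["ACGTACCTT", "TACCTTAGC"]
counts = count_read_kmers(reads, k=5)
best = sample_block(block, counts, k=5, n=1)
```

In this toy example the reads support every k-mer specific to `hapB`, so it is ranked first. Repeating the selection block by block and joining the chosen haplotypes would give a smaller, sample-specific subgraph, which is the intuition behind the personalized reference described above.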

Fig. 1: Illustrating haplotype sampling at adjacent blocks in the pangenome.
Fig. 2: Mapping 30× NovaSeq reads for HG002 to GRCh38 (with BWA-MEM) and to HPRC graphs (with Giraffe).
Fig. 3: Small variants evaluation across samples HG001 to HG005.
Fig. 4: SVs benchmark evaluation.

Data availability

This work was done using publicly available data. HPRC v.1.1 graphs and VCF files for the variants included in them are available at The underlying assemblies, including GRCh38, can be found at We used Illumina and Element short reads for HG001, HG002, HG003, HG004 and HG005 available at and, respectively. The GIAB small variant benchmark sets for the same samples can be found at GIAB and challenging medically relevant gene SV sets for HG002 are available at the same location. The T2T assembly of HG002 is available at See Supplementary Section 1 for further details.

Code availability

The haplotype sampling approach described in this article is part of the vg toolkit, available under the MIT license at There is an example dataset in the directory test/haplotype-sampling. Documentation can be found at See Supplementary Sections 4 and 5 for details on other software used.


Acknowledgements

This work was supported in part by the National Human Genome Research Institute and the National Institutes of Health (NIH). B.P. was partly supported by NIH grant nos. R01HG010485, U24HG010262, U24HG011853, OT3HL142481, U01HG010961 and OT2OD033761. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

J.S. and B.P. conceived the method for haplotype sampling, and J.S. developed and implemented it. J.S., P.E., M.T.U. and M.K. performed the analyses shown in the paper. J.S., G.H., J.M.E., A.M.N., X.C. and J.M. contributed to the vg software on which the method is based and helped modify it for this work. P.-C.C. and A.C. trained and provided support on using DeepVariant for the paper. J.S., P.E., M.T.U. and B.P. wrote the paper. All authors reviewed and edited the draft.

Corresponding authors

Correspondence to Jouni Sirén or Benedict Paten.

Competing interests

P.-C.C. and A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. The other authors declare no competing interests.

Peer review information

Nature Methods thanks Rayan Chikhi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.

Supplementary information

Supplementary Information

Supplementary Sections 1–5, Tables 1–4 and Figs. 1–4.

Reporting Summary

Peer Review File

Supplementary Tables 5–13

Sirén, J., Eskandar, P., Ungaro, M.T. et al. Personalized pangenome references. Nat Methods 21, 2017–2023 (2024).
