Arab Pangenome Reference Published

July 29, 2025

By Bio-IT World Staff 

July 29, 2025 | Researchers have published the first draft Arab pangenome reference, the UAE Pangenome Reference (UPR), based on 53 individuals of diverse Arab ethnicities residing in the United Arab Emirates. The work was done at the Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Health, Dubai, United Arab Emirates, and was published in Nature Communications last week (DOI: 10.1038/s41467-025-61645-w).

Arabs are a community of nearly 500 million culturally diverse individuals from the Middle East and North Africa, comprising about 6% of the global population. Thus far, Arab populations have not been adequately represented in large-scale sequencing projects such as the gnomAD database, and neither the HPRC pangenome nor the 1000 Genomes Project includes samples from this demographic, the authors write. “The lack of reference genomes for Arab populations has limited the investigation of genetic diversity and the genetic underpinning of numerous diseases,” they argue. “Population-specific reference pangenomes will enable the identification of variants associated with diseases and sequences that are unique or prevalent in Arab populations.”

The study looked at the genomes of 53 healthy individuals, all living in the UAE but originating from eight Arab countries (UAE, Saudi Arabia, Oman, Jordan, Egypt, Morocco, Syria and Yemen). Fifty of the individuals were unrelated; three were family members. The genomes were sequenced using Pacific Biosciences (PacBio) high-fidelity (HiFi) reads, Oxford Nanopore Technologies (ONT) ultralong read sequencing (ULK), Hi-C (chromosome conformation capture) data, and Illumina short reads.

Both long-read technologies, PacBio HiFi and Oxford Nanopore ULK, were used to analyze coverage across diverse regions of the human genome, including satellite DNA, centromeric transitions, and ribosomal DNA. “When mapped against CHM13 v2.0, both PacBio HiFi and ONT reads provided valuable insights into the sequencing coverage across acrocentric and metacentric chromosomes,” the authors wrote. “Due to the use of ultralong protocols, the ONT reads exhibited better mapping across all chromosomes than did the PacBio reads, particularly for the acrocentric chromosomes, where 99.49% (94.95%–100%) coverage was achieved by ONT in comparison to 95.60% (91.61%–99.75%) coverage achieved by PacBio.” The ONT platform, they added, also delivered markedly higher coverage of rDNA.

New Genomic Findings 

The UPR yielded significant new genomic findings. “We discovered 111.96 million base pairs of previously uncharacterized euchromatic sequences absent from existing human pangenomes, the T2T-CHM13 and GRCh38 reference human genomes, and other public datasets,” the authors wrote in the paper. “Moreover, we identified 8.94 million population-specific small variants and 235,195 structural variants within the Arab pangenome, not present in linear and pangenome references and public datasets.”

There is still work to do. The authors highlight two next steps: sequencing a larger sample to capture additional rare gene duplications and unique sequences, and running clinical cohort-based analyses to assess the clinical significance of the thousands of previously uncharacterized UPR-specific variants.

But the value of the work is substantial.  

“MBRU’s work demonstrates why population-specific pangenomes matter,” said Christian Henry, president and CEO of PacBio, in a press release about the work. “We are proud that MBRU, one of our earliest Revio customers, has delivered such a historic contribution to global genomics. This study will have lasting impact for research and precision medicine in a historically underrepresented population.”

The authors called the UPR a “crucial foundation for genetic research.” “Arab populations are notably underrepresented in large genomic databases, and the UPR will address this gap by offering essential resources for clinical genomic laboratories to enhance the precision of variant interpretation,” they wrote.