Study evaluates hybridization probe capture to enrich coronavirus genomic material in bat oral and rectal swabs

In a recent study posted to the bioRxiv* preprint server, researchers evaluated targeted genomic sequencing as a way to monitor coronaviruses (CoVs) found in bats.

Study: Targeted genomic sequencing with probe capture for coronavirus discovery and surveillance in bats. Image Credit: Dana.S/Shutterstock

Public health emergencies such as Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), and Coronavirus Disease 2019 (COVID-19) have highlighted the importance of monitoring zoonotic CoVs. These emergencies also exposed gaps in knowledge, including limited phylogenetic resolution and poor characterization of several viral genetic factors.

About the study

In the present study, researchers evaluated how efficiently hybridization probe capture enriches CoV genomic material in oral and rectal swabs collected from bats.

The team obtained oral and rectal samples between August 2015 and June 2018 from bats that were either for sale in markets or captured and released back into the wild. The samples were taken at different locations in the Democratic Republic of the Congo (DRC). Bat species were identified by ecologists and via a polymerase chain reaction (PCR) targeting the cytochrome b gene.

A custom panel of hybridization probes was designed to target known bat CoV diversity. Coverage of reference sequences by the custom panel was examined in silico. Probe coverage was also assessed for a subset of the reference sequences representing full-length genomic sequences. The team used the custom panel to assess recovery of CoV genomic material via probe capture from 25 metagenomic sequencing libraries. These libraries were prepared from a retrospective collection of 21 oral and rectal swabs collected from bats in the DRC from 2015 to 2018.
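To make the in silico coverage check more concrete, the following is a minimal sketch of one way to compute the fraction of a reference sequence covered by at least one probe. It assumes probe-to-reference alignments have already been reduced to 0-based, half-open (start, end) intervals; the function name `covered_fraction` and the example coordinates are hypothetical and are not taken from the study.

```python
def covered_fraction(seq_length, probe_hits):
    """Return the fraction of reference positions covered by >= 1 probe.

    seq_length: length of the reference sequence in nucleotides.
    probe_hits: iterable of (start, end) intervals, 0-based and half-open,
                marking where probes align (an assumed input format).
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")

    # Clip intervals to the reference bounds and sort them by start position.
    clipped = sorted(
        (max(0, start), min(seq_length, end))
        for start, end in probe_hits
        if end > start
    )

    covered = 0
    cur_start, cur_end = None, None
    for start, end in clipped:
        if start >= end:
            continue  # interval fell entirely outside the reference
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent probe: extend the merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / seq_length


# Hypothetical example: three probe hits on a 100-nucleotide reference.
example_fraction = covered_fraction(100, [(0, 40), (30, 60), (80, 120)])
```

Merging overlapping intervals before summing avoids counting a position twice when several probes tile the same region.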

The custom probe panel was used to capture CoV genomic material from these bat swab metagenomic libraries, and the captured material was then sequenced. Recovery of CoV by the probes was assessed by assembling the captured sequencing reads de novo. CoV sequences were then identified by aligning the resulting contigs to the CoV reference sequences. Assembly size metrics were also used to examine how well the recovered contigs represented complete genomes.
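Two commonly used assembly size metrics are total assembly size and N50, the contig length at which contigs of that length or longer account for at least half of the total assembly. The sketch below shows the standard calculation; the function name `assembly_stats` and the contig lengths are illustrative assumptions, not data from the study.

```python
def assembly_stats(contig_lengths):
    """Return (total_size, n50) for a list of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together span at least half of the assembly.
    Returns (0, 0) for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    running = 0
    for length in lengths:
        running += length
        # Stop once the longest contigs cover at least half of the total.
        if running * 2 >= total:
            return total, length
    return 0, 0


# Hypothetical contig lengths (nucleotides) for one library.
total_size, n50 = assembly_stats([400, 300, 200, 100])
```

A small N50 relative to the roughly 30,000-nucleotide length of a CoV genome points to a fragmented assembly rather than a complete genome.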

The team also estimated the recovery of partial RNA-dependent RNA polymerase (RdRp) amplicons. Probe coverage of these amplicons was evaluated in silico to show that the targets were covered by the custom panel. The researchers also assessed nucleic acid concentration and integrity, two major factors associated with the successful preparation of genomic libraries. These were measured as median RNA integrity number (RIN) values and RNA concentrations. The values were then compared with the extent of reference sequence recovered from the corresponding libraries.

The influence of blind spots in the probe panel on genomic recovery was examined in the genomic libraries. The team also assessed probe coverage against reference sequences that had been assigned to a particular phylogenetic group.


Results

The team collected a total of 4,852 bat CoV genomic sequences to design a custom panel comprising 20,000 probe sequences. In this panel, 98.73% of the nucleotide positions in 90% of the target sequences were sufficiently covered, indicating that the custom panel provided broad probe coverage of known bat CoVs. The team reported the recovery of 113 CoV contigs from 17 of 25 metagenomic sequencing libraries. The median total contig assembly size was 1,724 nucleotides, while the median assembly N50 was 533 nucleotides.

Notably, four of the 25 libraries yielded no CoV sequences, even though partial RdRp sequences had been generated from them. Moreover, probe capture did not yield any complete CoV genome, and many specimens showed scattered and discontinuous coverage of the reference sequences. The custom probe panel covered 95.3% of the nucleotide positions in the partial RdRp amplicons. However, for 12 of the 25 sequenced libraries, no part of the partial RdRp sequence was recovered, while seven of the 25 libraries had more than 95% of the partial RdRp sequence recovered. This indicated that genomic recovery via probe capture was limited by factors other than the inclusivity of the probe panel.

The team observed weak monotonic associations between lower RIN and concentration values and lower genomic recovery. The correlation was significant for RNA concentration but not for RNA integrity. These weak relationships indicated that additional factors contributed to the impaired genomic recovery, including low concentrations of viral material or a lack of probe coverage in genomic regions outside the partial RdRp target region.
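A monotonic association of this kind is typically quantified with a rank correlation such as Spearman's rho, although the article does not state which statistic the authors used. The following is a minimal pure-Python sketch under that assumption; the function names and the example values are hypothetical and do not come from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-library values: RNA concentration and the fraction of
# reference sequence recovered. These numbers are made up for illustration.
rna_concentration = [2.1, 5.4, 0.8, 7.9, 3.3]
fraction_recovered = [0.10, 0.35, 0.00, 0.30, 0.42]
rho = spearman_rho(rna_concentration, fraction_recovered)
```

Because the statistic depends only on ranks, it captures whether recovery tends to rise with concentration without assuming the relationship is linear.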

A total of 92.3% of the reference sequences, such as CDAB0203R-PRE, CDAB0217R-PRE, and CDAB0492R-PRE, were recovered for the 25 libraries; however, complete viral spike genes were absent. This suggested the presence of CoVs similar to Bat CoV CMR704-P12 and Chaerephon bat CoV/Kenya/KY22/2006, but with novel spike genes that differed from those of the reference sequences.

Overall, the study results showed the potential of probe capture of CoV genomic material for accurately characterizing a wider range of viruses.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
