SRA SARS-CoV-2 Detection Tool

Overview

The SARS-CoV-2 Detection tool is designed to quickly determine whether an unassembled NGS sample is likely to contain SARS-CoV-2. It is available as a Docker image from Docker Hub and was developed on the basis of NCBI's SRA Taxonomy Analysis Tool.

Approach

Briefly, RefSeq sequences from NCBI's BLAST refseq_genomic virus databases, supplemented with the validated viral genome set (RefSeq neighbors), are analyzed for kmer content. Kmers of length 32 are randomly sampled every 64 bases and assigned to the taxonomy id of the originating sequence. Once all sequences have been processed, any kmer associated with more than one taxonomy id is moved to the most recent common ancestor node of those ids.

The resulting table of kmer and taxonomy id pairs is used to analyze NGS sequence data. Input data is decomposed into kmers of length 32, and these are mapped to taxids; a larger number of hits to a taxid indicates greater support for that taxon being present in the sample. Although the kmer database is built from the entire Viruses superkingdom, the SARS-CoV-2 Detection tool filters the output and reports only hits to Coronaviridae and its child taxa.

Input may be an SRA accession or a FASTA file. Output includes a summary in tab-delimited format (the *.report file) and detailed results in XML format (the *.xml file).
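
The kmer table construction and lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's implementation: all function names and the parent map are hypothetical, the reference sequences are synthetic, and the random choice of a kmer within each 64-base window is replaced by taking the first 32 bases of the window so that the example is deterministic.

```python
from collections import Counter

K = 32     # kmer length stated in the text
STEP = 64  # one kmer is taken per 64-base window


def sample_kmers(seq, k=K, step=STEP):
    """Yield one kmer per window (ASSUMPTION: first k bases, not a random pick)."""
    for start in range(0, len(seq) - k + 1, step):
        yield seq[start:start + k]


def lineage(taxid, parent):
    """Return the path from taxid up to the root, inclusive."""
    path = [taxid]
    while taxid in parent:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def common_ancestor(a, b, parent):
    """Most recent common ancestor of two taxids in the parent map."""
    ancestors = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors:
            return node
    raise ValueError("taxids share no ancestor")


def build_table(references, parent):
    """Map each sampled kmer to a taxid; kmers seen under several taxids move up."""
    table = {}
    for taxid, seq in references:
        for kmer in sample_kmers(seq):
            if kmer in table:
                table[kmer] = common_ancestor(table[kmer], taxid, parent)
            else:
                table[kmer] = taxid
    return table


def count_hits(read, table, k=K):
    """Decompose a read into all overlapping kmers and count hits per taxid."""
    hits = Counter()
    for i in range(len(read) - k + 1):
        taxid = table.get(read[i:i + k])
        if taxid is not None:
            hits[taxid] += 1
    return hits


# Toy taxonomy using tax ids from the example output: both strains sit under 694009.
PARENT = {2697049: 694009, 2709072: 694009, 694009: 2509511}
SHARED = "ACGT" * 8  # synthetic 32-base kmer present in both toy references
REFS = [
    (2697049, SHARED + "A" * 32 + "C" * 32),
    (2709072, SHARED + "G" * 32 + "T" * 32),
]
TABLE = build_table(REFS, PARENT)
print(TABLE[SHARED])                # the shared kmer is assigned to 694009
print(count_hits("C" * 40, TABLE))  # nine overlapping kmers, all hitting 2697049
```

In this toy case the kmer shared by both references ends up at their parent node, while a read made only of strain-specific kmers contributes all of its hits to that strain.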

Example Usage

To test

 

sudo docker run ncbi/sars-cov-2-detection-tool ./run_tests.sh

To retrieve the taxonomy of a sample (here, SRA accession SRR11622005)

sudo docker run -v $PWD:$PWD:rw -w $PWD ncbi/sars-cov-2-detection-tool /opt/taxonomy/scripts/viral_taxonomy.sh SRR11622005
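
When several samples need to be processed, the same command line can be assembled programmatically. The sketch below only builds the argument list from the invocation shown above. The helper name `docker_command` and the directory `/data/run1` are hypothetical, and whether the output files land in the mounted working directory is an assumption based on the `-v` and `-w` options.

```python
import os
import shlex

IMAGE = "ncbi/sars-cov-2-detection-tool"
SCRIPT = "/opt/taxonomy/scripts/viral_taxonomy.sh"


def docker_command(target, workdir=None):
    """Build the docker argv for one input, mirroring the command shown above."""
    workdir = workdir or os.getcwd()
    return [
        "sudo", "docker", "run",
        "-v", f"{workdir}:{workdir}:rw",  # mount the working directory read-write
        "-w", workdir,                    # run inside that directory
        IMAGE, SCRIPT, target,
    ]


# Print the command for each accession; /data/run1 is a placeholder path.
for accession in ["SRR11622005"]:
    print(shlex.join(docker_command(accession, "/data/run1")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it, provided Docker is installed and the user is allowed to run it.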

Example Output

SRR11622005.report
 
   Coronaviridae 100.00% (11994767 hits)
      Orthocoronavirinae 95.22% (11421176 hits)
          Betacoronavirus 95.22% (11421171 hits)
              Sarbecovirus 94.63% (11351012 hits)
                  Severe acute respiratory syndrome-related coronavirus 84.99% (10194616 hits)
                      Severe acute respiratory syndrome coronavirus 2 55.46% ( 6651966 hits)
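
For downstream processing, the report lines can be parsed into structured rows. The following is a sketch that assumes every line follows the layout of the example above (indentation, taxon name, percentage, hit count in parentheses); that layout is inferred from this single example rather than from a format specification, and `parse_report` is a hypothetical helper name.

```python
import re

# Indentation, taxon name, percentage, then "(<count> hits)" with optional padding.
LINE = re.compile(r"^(\s*)(.+?)\s+(\d+(?:\.\d+)?)%\s+\(\s*(\d+)\s+hits\)\s*$")


def parse_report(text):
    """Return one dict per report line that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            indent, name, percent, hits = match.groups()
            rows.append({
                "indent": len(indent),
                "name": name,
                "percent": float(percent),
                "hits": int(hits),
            })
    return rows


REPORT = """\
   Coronaviridae 100.00% (11994767 hits)
      Orthocoronavirinae 95.22% (11421176 hits)
          Betacoronavirus 95.22% (11421171 hits)
              Sarbecovirus 94.63% (11351012 hits)
                  Severe acute respiratory syndrome-related coronavirus 84.99% (10194616 hits)
                      Severe acute respiratory syndrome coronavirus 2 55.46% ( 6651966 hits)
"""

rows = parse_report(REPORT)
top = rows[0]["hits"]
for row in rows:
    # Recompute each percentage relative to the first (Coronaviridae) line.
    print(row["name"], row["hits"], round(100 * row["hits"] / top, 2))
```

In this example each reported percentage matches the taxon's hit count divided by the Coronaviridae total, so the SARS-CoV-2 line corresponds to 6651966 of 11994767 hits.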

SRR11622005.xml

<taxon_tree parser_version="0.81">
 <taxon tax_id="1" self_count="0" name="root" total_count="11994782">
  <taxon tax_id="10239" self_count="14" rank="superkingdom" name="Viruses" total_count="11994782">
   <taxon tax_id="2559587" self_count="1" name="Riboviria" total_count="11994768">
    <taxon tax_id="76804" self_count="0" rank="order" name="Nidovirales" total_count="11994767">
     <taxon tax_id="2499399" self_count="0" rank="suborder" name="Cornidovirineae" total_count="11994767">
      <taxon tax_id="11118" self_count="573579" rank="family" name="Coronaviridae" total_count="11994767">
       <taxon tax_id="2501931" self_count="0" rank="subfamily" name="Orthocoronavirinae" total_count="11421176">
        <taxon tax_id="694002" self_count="70131" rank="genus" name="Betacoronavirus" total_count="11421171">
         <taxon tax_id="2509511" self_count="1156034" rank="subgenus" name="Sarbecovirus" total_count="11351012">
          <taxon tax_id="694009" self_count="3542068" rank="species" name="Severe acute respiratory syndrome-related coronavirus" total_count="10194616">
           <taxon tax_id="2697049" self_count="6651966" name="Severe acute respiratory syndrome coronavirus 2" total_count="6651966"/>
           <taxon tax_id="2709072" self_count="358" name="Bat coronavirus RaTG13" total_count="358"/>
           <taxon tax_id="1508227" self_count="150" name="Bat SARS-like coronavirus" total_count="150"/>
           <taxon tax_id="442736" self_count="28" name="Bat SARS coronavirus HKU3" total_count="32">
            <taxon tax_id="741999" self_count="4" name="Bat SARS coronavirus HKU3-12" total_count="4"/>
           </taxon>
           <taxon tax_id="347536" self_count="17" name="Bat SARS CoV Rm1/2004" total_count="17"/>
           <taxon tax_id="1283333" self_count="9" name="Bat coronavirus Cp/Yunnan2011" total_count="9"/>
           <taxon tax_id="1415852" self_count="4" name="Bat SARS-like coronavirus WIV1" total_count="4"/>
           <taxon tax_id="511429" self_count="4" name="SARS coronavirus BJ182-12" total_count="4"/>
           <taxon tax_id="1487703" self_count="3" name="Rhinolophus affinis coronavirus" total_count="3"/>
           <taxon tax_id="1503302" self_count="2" name="BtRs-BetaCoV/HuB2013" total_count="2"/>
           <taxon tax_id="1503301" self_count="2" name="BtRs-BetaCoV/GX2013" total_count="2"/>
           <taxon tax_id="722424" self_count="1" name="SARS coronavirus Rs_672/2006" total_count="1"/>
          </taxon>
          <taxon tax_id="2720068" self_count="0" name="unclassified Sarbecovirus" total_count="362">
           <taxon tax_id="2708335" self_count="332" rank="species" name="Pangolin coronavirus" total_count="332"/>
           <taxon tax_id="864596" self_count="23" rank="species" name="Bat coronavirus BM48-31/BGR/2008" total_count="23"/>
           <taxon tax_id="2591233" self_count="6" rank="species" name="Coronavirus BtRl-BetaCoV/SC2018" total_count="6"/>
           <taxon tax_id="2591237" self_count="1" rank="species" name="Coronavirus BtRs-BetaCoV/YN2018D" total_count="1"/>
          </taxon>
         </taxon>
         <taxon tax_id="2509502" self_count="0" rank="subgenus" name="Nobecovirus" total_count="22">
          <taxon tax_id="694006" self_count="22" rank="species" name="Rousettus bat coronavirus HKU9" total_count="22"/>
         </taxon>
         <taxon tax_id="696098" self_count="0" name="unclassified Betacoronavirus" total_count="3">
          <taxon tax_id="663565" self_count="2" rank="species" name="Bat SARS Cov Rs806/2006" total_count="2"/>
          <taxon tax_id="1117224" self_count="1" rank="species" name="Bat coronavirus 2265/Philippines/2010" total_count="1"/>
         </taxon>
         <taxon tax_id="2509494" self_count="3" rank="subgenus" name="Merbecovirus" total_count="3"/>
        </taxon>
        <taxon tax_id="693996" self_count="0" rank="genus" name="Alphacoronavirus" total_count="5">
         <taxon tax_id="2509514" self_count="0" rank="subgenus" name="Tegacovirus" total_count="5">
          <taxon tax_id="693997" self_count="5" rank="species" name="Alphacoronavirus 1" total_count="5"/>
         </taxon>
        </taxon>
       </taxon>
       <taxon tax_id="693995" self_count="0" rank="subfamily" name="Coronavirinae" total_count="12">
        <taxon tax_id="2664420" self_count="0" name="unclassified Coronavirinae" total_count="12">
         <taxon tax_id="1881090" self_count="7" rank="species" name="Rhinolophus monoceros coronavirus" total_count="7"/>
         <taxon tax_id="1508220" self_count="5" rank="species" name="Bat coronavirus" total_count="5"/>
        </taxon>
       </taxon>
      </taxon>
     </taxon>
    </taxon>
   </taxon>
  </taxon>
 </taxon>
</taxon_tree>
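
The XML output can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed excerpt of the tree above. The helper names are hypothetical, and the relationship it checks (a node's `total_count` equals its `self_count` plus the `total_count` of its direct children) is an observation drawn from this example, not a documented guarantee.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the example output, kept small for illustration.
EXCERPT = """
<taxon_tree parser_version="0.81">
 <taxon tax_id="442736" self_count="28" name="Bat SARS coronavirus HKU3" total_count="32">
  <taxon tax_id="741999" self_count="4" name="Bat SARS coronavirus HKU3-12" total_count="4"/>
 </taxon>
</taxon_tree>
""".strip()


def find_taxon(root, tax_id):
    """Return the first <taxon> element with the given tax_id, or None."""
    for node in root.iter("taxon"):
        if node.get("tax_id") == str(tax_id):
            return node
    return None


def counts_consistent(node):
    """Check total_count == self_count + children's total_count, recursively."""
    children = node.findall("taxon")
    expected = int(node.get("self_count")) + sum(
        int(child.get("total_count")) for child in children
    )
    return expected == int(node.get("total_count")) and all(
        counts_consistent(child) for child in children
    )


def flatten(node, depth=0):
    """Yield (depth, name, self_count, total_count) for a node and its descendants."""
    yield depth, node.get("name"), int(node.get("self_count")), int(node.get("total_count"))
    for child in node.findall("taxon"):
        yield from flatten(child, depth + 1)


tree = ET.fromstring(EXCERPT)
hku3 = find_taxon(tree, 442736)
print(counts_consistent(hku3))
for depth, name, self_count, total_count in flatten(hku3):
    print(" " * depth + name, self_count, total_count)
```

Here `self_count` is read as the hits assigned directly to a node and `total_count` as the hits for the node together with its descendants, which is consistent with the numbers in the example.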

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Last updated: 2022-06-24T17:11:10Z