NCBI

Clone Finder Documentation

Clone Finder provides a clone-centric interface that allows identification of clones aligned to any assembly; searches can be filtered to find clones based on population/strain and library type. Clone Finder is available for any organism that has a Clone Finder icon on the Map Viewer home page.

Results

The top left of the Clone Finder results page lists the assembly, chromosome, link to contig sequence, and chromosomal coordinates for the region you selected.

Data summary

By default the Data summary is minimized but can be expanded by clicking on the Down arrow icon on the right of the Data Summary title bar. This view displays the list of libraries selected from the search page and includes the following information:

To select or deselect columns, hover over any column header, click the Down arrow icon, and then hover over the Columns entry to reveal a list of tick-box options, one per column heading. To rearrange the columns, drag and drop them by their column headers.

Graphical display

Map Viewer

Below the Data summary is a graphical display of the region of interest with the chromosomal coordinates along the top and Map Viewer tracks for the following features:

Mouse over the feature image to display the feature name, and left-click the feature image to open a pop-up text box containing feature details, e.g.:

These pop-up boxes can be moved around the screen by clicking and dragging their title bar. They can be minimized by clicking the Up arrow icon or closed by clicking the Close icon, both at the top right of the pop-up.

The Download: Image button, on the right of the graphical display title bar, downloads a png file for the displayed image.

The Download: Excel button, on the right of the graphical display title bar, downloads an Excel file for the displayed clone data.

Clones

The graphical data for each library is displayed below the Map Viewer tracks. By default, concordant clones are displayed individually if they occupy less than 400 pixels and as a histogram otherwise; discordant clones are not displayed. For each library, the concordant and discordant displays can be changed using the drop-down menu options on the left of the title bar: 'Off' (Close icon) hides the track, 'Features' (Features icon) displays each clone, and 'Histogram' (Histogram icon) displays the histogram. For concordant clones, dark blue represents the clone end alignments, and a light blue dotted line connects the two ends of each clone. For discordant clones, dark red represents the clone end alignments, and a light red dotted line connects the two ends of each clone.

Mouse-over the clone image to display the clone name and left click on the clone image to return a pop-up text box containing clone details:

These pop-up boxes can be moved around the screen by clicking and dragging their title bar. They can be minimized by clicking the Up arrow icon or closed by clicking the Close icon, both at the top right of the pop-up.

Library tables

By default, the Library tables are not displayed. Click the Data table icon on the left of the Clone title bar to display the library-specific data table containing the following information:

Mouse over any column header and click the Down arrow icon to reveal sorting options (if available) and, under Columns, a list of tick-box options for choosing which column headings to display. To rearrange the columns, drag and drop them by their column headers.

Clicking on the + sign on the left of any clone listed in the Library table expands the view to provide details as seen in the pop-up boxes for the clone in the Graphical display:

Only the first 15 rows of each table are displayed. Click the forward and back arrows at the bottom left of each Library table to browse through the pages, or add or remove 5 rows by clicking +5 or -5, respectively. Each Library table can be minimized by clicking the Up arrow icon on the right of the library title bar.

The Download: Excel button, also on the right of the Library table title bar, allows you to download an Excel file for each library.

Excel files

The Excel files contain three rows for each clone listed in the Library table:

First row (for each clone):

  • Assembly - name of assembly to which clone was mapped
  • Chromosome - chromosome to which clone was mapped
  • ChrAcc - GenBank accession for chromosome contig to which clone was mapped
  • ChrStart - chromosomal start coordinate for aligned clone
  • ChrStop - chromosomal stop coordinate for aligned clone
  • ChrOrient - orientation of clone, '+' or '-'
  • CloneName - name of clone
  • CloneId - internal identifier used to track clones
  • CloneInsertSeqs - sequence identifier of clone (accession, GI, TI, etc.)
  • Concordant - 'concordant' if clone size and orientation is as expected or 'discordant' if size and/or orientation is not as expected
  • Unique - 'Unique' if clone mapped to a single location or 'Tie' if clone mapped to more than one location with the same score
  • CloneMethod - 'End_seq' if mapped by placement of clone end sequences, 'Insert_seq' if mapped by placement of whole insert sequence, 'STS' or 'Combined'

Second row (for each INSERT):

  • INS - identifies mapped sequence as a clone insert
  • Chromosome - chromosome to which clone insert was mapped
  • CloneInsertSeq - sequence identifier of clone (accession, GI, TI, etc.)

Second and third rows (for each pair of ENDs):

  • END - identifies mapped sequence as a clone end
  • Chromosome - chromosome to which clone end was mapped
  • ChrAcc - GenBank accession for contig to which clone end was mapped
  • ChrStart - chromosomal start coordinate for aligned clone end
  • ChrStop - chromosomal stop coordinate for aligned clone end
  • ChrOrient - orientation of clone end, '+' or '-'
  • CloneEndConfidence - 'Unique' if it maps to a single location or 'Multiple' if it maps to more than one location
  • CloneEndType - 'F' if forward or 'R' if reverse
  • CloneEndAcc - GenBank accession for clone end
  • CloneEndGi - GenBank gi for clone end
  • CloneEndTi - Trace Archive ti for clone end
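
As a rough illustration of this layout, the Python sketch below groups the rows of one downloaded file into one record per clone. It rests on assumptions that this documentation does not confirm: that the rows have already been read into lists of cell values with the header row removed, and that the first cell of an insert or end row is the literal text INS or END. The function name group_clone_rows and the record keys are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def group_clone_rows(rows: Sequence[Sequence[Any]]) -> List[Dict[str, list]]:
    """Group flat spreadsheet rows into one record per clone.

    Assumption: a row whose first cell is 'INS' or 'END' belongs to the
    most recent clone row; any other non-empty row starts a new clone.
    """
    clones: List[Dict[str, list]] = []
    for row in rows:
        if not row:
            continue  # skip blank rows
        tag = str(row[0]).strip()
        if tag in ("INS", "END"):
            if not clones:
                raise ValueError("INS/END row found before any clone row")
            key = "inserts" if tag == "INS" else "ends"
            clones[-1][key].append(list(row))
        else:
            clones.append({"clone": list(row), "inserts": [], "ends": []})
    return clones
```

Because any row that does not start with INS or END opens a new record, the sketch does not depend on how many insert or end rows follow each clone.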

Placement method

Cleanup:

Sequences, quality scores and end information (such as trace name, trace strand and template name) for clone end sequences are retrieved from the dbGSS or TraceArchive DB server. After quality clipping, vector clipping and Windowmasking (Morgulis, A. et al.), the cleaned sequences are stored in FASTA files for further analysis.

Alignments:

Cleaned end sequences are aligned to contigs using an in-house BLAST-guided global alignment tool. This involves using megaBLAST (Zhang, Z. et al.) to align the cleaned, masked clone ends to the contigs using “-F T -W 28 -r 1 -q -3 -Z 200 -e 1e-10 -U T” as default parameters. High-identity alignments that are contained or dovetailed are accepted as valid alignments. High-identity alignments that are neither contained nor dovetailed are grouped by their subject_id (i.e. the contig’s gi) into a Seq_align_set for global alignment. The Seq_align(s) with the highest cov_pct are reported only if no alignment from the banded alignment has a higher cov_pct. Only Seq_align_set(s) in which at least 20% of the input sequence is covered by a contig are tested in the banded alignment.

The best local alignment extracted from the global alignment is reported if it is high identity, half-dovetailed or close to either contig end, and has higher coverage than the best-covered BLAST-generated alignment. Otherwise, the BLAST-generated alignment with the highest cov_pct is reported. Only alignments with greater than 80% coverage are used for clone placement.
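
The reporting rule in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration, not the in-house tool: the Alignment class, its field names and the two function names are hypothetical, and the qualifies flag is an assumed stand-in for the identity, dovetail and contig-end checks, which are not specified in enough detail here to implement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_PLACEMENT_COVERAGE = 80.0  # percent; alignments must exceed this value


@dataclass(frozen=True)
class Alignment:
    source: str             # hypothetical label: "blast" or "banded"
    cov_pct: float          # percent of the clone end covered
    qualifies: bool = True  # assumed result of the identity/dovetail checks


def choose_reported(blast_hits: Sequence[Alignment],
                    banded_best: Optional[Alignment]) -> Optional[Alignment]:
    """Report the banded alignment only if it qualifies and beats BLAST."""
    best_blast = max(blast_hits, key=lambda a: a.cov_pct, default=None)
    if banded_best is not None and banded_best.qualifies:
        if best_blast is None or banded_best.cov_pct > best_blast.cov_pct:
            return banded_best
    # Otherwise fall back to the BLAST alignment with the highest cov_pct.
    return best_blast


def usable_for_placement(aln: Optional[Alignment]) -> bool:
    """Only alignments with more than 80% coverage are used for placement."""
    return aln is not None and aln.cov_pct > MIN_PLACEMENT_COVERAGE
```

Keeping the choice of reported alignment separate from the coverage cut-off mirrors the order in which the two rules are stated.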

Placement evaluation

The mean and standard deviation of clone size are calculated from the clones that meet the following requirements:

  1. Both ends of a clone hit a common contig exactly once
  2. Both ends are correctly oriented, i.e. face each other
  3. Clone size is between 10 kb and 500 kb
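
The three requirements can be sketched in Python as shown below. Several details are assumptions rather than documented behaviour: requirement 1 is read as each end having exactly one hit, both on the same contig; "face each other" is read as the leftmost end lying on the '+' strand and the rightmost on the '-' strand; 1 kb is taken as 1,000 bases with inclusive limits; and the sample standard deviation is used. All names (EndHit, eligible_clone_size, size_stats) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple

MIN_SIZE = 10_000    # 10 kb, assuming 1 kb = 1,000 bases
MAX_SIZE = 500_000   # 500 kb


@dataclass(frozen=True)
class EndHit:
    contig: str
    start: int   # start <= stop, in contig coordinates
    stop: int
    orient: str  # '+' or '-'


def eligible_clone_size(end_a: Sequence[EndHit],
                        end_b: Sequence[EndHit]) -> Optional[int]:
    """Return the clone size if all three requirements hold, else None."""
    if len(end_a) != 1 or len(end_b) != 1:
        return None                      # each end must hit exactly once
    a, b = end_a[0], end_b[0]
    if a.contig != b.contig:
        return None                      # hits must share a contig
    left, right = sorted((a, b), key=lambda h: h.start)
    if left.orient != "+" or right.orient != "-":
        return None                      # ends must face each other
    size = max(a.stop, b.stop) - min(a.start, b.start) + 1
    return size if MIN_SIZE <= size <= MAX_SIZE else None


def size_stats(sizes: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample standard deviation; needs at least two sizes."""
    return mean(sizes), stdev(sizes)
```

For a library, eligible_clone_size would be applied to every clone, and the non-None results passed to size_stats.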

A clone is defined as concordant if it meets the following requirements:

  1. Both ends of the clone face each other on a common chromosome
  2. Clone size is within the mean +/- 3 x s.d.
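
A minimal Python sketch of this test is given below. It assumes that the orientation and chromosome checks have already been reduced to booleans, that the library mean and standard deviation are supplied by the caller, and that the size limits are inclusive; the function name classify_placement and its parameters are hypothetical.

```python
def classify_placement(size: int,
                       ends_face_each_other: bool,
                       same_chromosome: bool,
                       mean_size: float,
                       sd_size: float,
                       n_sd: float = 3.0) -> str:
    """Return 'concordant' or 'discordant' for one clone placement."""
    if not (same_chromosome and ends_face_each_other):
        return "discordant"
    lower = mean_size - n_sd * sd_size
    upper = mean_size + n_sd * sd_size
    # Inclusive bounds are an assumption; the text does not specify them.
    return "concordant" if lower <= size <= upper else "discordant"
```

For example, with a library mean of 150,000 bases and a standard deviation of 10,000, a correctly oriented clone is concordant when its size lies between 120,000 and 180,000 bases.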

A clone is a Tie;concordant if it can be placed as concordant to multiple loci.

A discordant clone can represent an insertion, a deletion or an inversion: