Using TaxPlot to Compare Genomes
TaxPlot is a tool for 3-way comparisons of genomes on the basis of the
protein sequences they encode. To use TaxPlot, one selects a reference
genome to which two other genomes are compared. Pre-computed BLAST results
are then used to plot a point for each predicted protein in the reference
genome, based on the best alignment with proteins in each of the two
genomes being compared.
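The plotting rule described above can be sketched in a few lines of Python. This is only an illustration, not TaxPlot's actual implementation: the function name `taxplot_points`, the `(reference_protein, genome, score)` tuple layout, and the choice to plot a protein with no hit in a genome at a score of 0 are all assumptions made for the example.

```python
from collections import defaultdict


def taxplot_points(hits, genome_x, genome_y):
    """Map each reference protein to (x, y): its best score against each genome.

    `hits` is an iterable of (reference_protein, genome, score) tuples, a
    hypothetical layout for pre-computed BLAST results. A protein with no hit
    in one of the two genomes gets 0.0 on that axis (an assumption).
    """
    best = defaultdict(lambda: {genome_x: 0.0, genome_y: 0.0})
    for ref_protein, genome, score in hits:
        # Ignore hits against genomes that are not part of this comparison.
        if genome in (genome_x, genome_y):
            entry = best[ref_protein]
            # Keep only the best-scoring alignment per genome.
            entry[genome] = max(entry[genome], score)
    return {protein: (s[genome_x], s[genome_y]) for protein, s in best.items()}
```

Calling `taxplot_points(hits, "J99", "26695")` on such a hit list would return one coordinate pair per reference protein, ready to be drawn as a scatter plot.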
Figure 1 shows a TaxPlot in which E. coli K12 has been selected
as the reference genome for comparison of two strains of H. pylori,
J99 and 26695. Each point in the figure represents a single E. coli
protein. The X and Y coordinates represent the BLAST scores for the protein's
closest matches in the two strains of H. pylori. There are 217
E. coli proteins that are equally similar to proteins in the
two H. pylori strains, as shown by the points lying on the central
diagonal. E. coli has 678 proteins with greater similarity to
H. pylori strain J99 (if only marginally), and 687 proteins
with greater similarity to H. pylori strain 26695.
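Counts like these amount to sorting each point by its position relative to the diagonal. The sketch below shows one way such a tally could be made; the function name `classify_points`, the category labels, and the optional `tolerance` parameter are hypothetical and are not taken from TaxPlot itself.

```python
from collections import Counter


def classify_points(points, tolerance=0.0):
    """Count points on the diagonal and on either side of it.

    `points` maps a reference protein to its (x, y) score pair. With the
    default tolerance of 0.0, only exactly equal scores count as lying on
    the diagonal (an assumption about what "equally similar" means).
    """
    counts = Counter()
    for x, y in points.values():
        if abs(x - y) <= tolerance:
            counts["diagonal"] += 1
        elif x > y:
            counts["closer_to_x"] += 1  # better match in the X-axis genome
        else:
            counts["closer_to_y"] += 1  # better match in the Y-axis genome
    return counts
```

A small nonzero `tolerance` would fold marginal differences into the diagonal class, which may be useful when near-ties are not of interest.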
Figure 1: TaxPlot for two strains of H. pylori against E. coli as the
reference genome. Points representing proteins involved in amino acid
transport and metabolism are highlighted in blue.
Overall, the proteomes of the two H. pylori strains appear to
be equally similar to that of E. coli. However, a few significant
differences between the H. pylori strains show up as off-diagonal
points toward the left-hand portion of the plot. These points represent
E. coli proteins that have a better match in one strain of H. pylori
than in the other.
For instance, a number of E. coli proteins have low BLAST scores
against the H. pylori J99 strain, yet relatively high BLAST scores
against the 26695 strain. These points may represent cases in which
selection pressures operating on the orthologs of these E. coli
proteins in the two H. pylori strains are different. To determine
if there is a pattern to these differences, one may identify individual
points on the plot to learn the function of the E. coli proteins
indicated.
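Outside the interactive plot, the same kind of inspection could be approximated by listing the points that lie farthest from the diagonal. The following is a minimal sketch under stated assumptions: the function name `off_diagonal_points` and the `min_difference` threshold are hypothetical, and the score gap is used here simply as a rough measure of distance from the diagonal.

```python
def off_diagonal_points(points, min_difference):
    """Return (protein, x, y) tuples whose two scores differ by at least
    `min_difference`, ordered from the largest gap to the smallest.

    `points` maps a reference protein to its (x, y) score pair.
    """
    selected = [
        (protein, x, y)
        for protein, (x, y) in points.items()
        if abs(x - y) >= min_difference
    ]
    # Largest score gap first, so the most asymmetric proteins lead the list.
    return sorted(selected, key=lambda item: abs(item[1] - item[2]), reverse=True)
```

The proteins at the top of the returned list are the natural candidates to look up by function.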
Subsets of the points plotted can be selected by simply clicking on
an area of the graph or by using a menu box to select proteins by functional
class. Hyperlinks to the BLAST2 Sequences service provide displays of
pairwise alignments. In the figure, those proteins known in E. coli
to be involved in amino acid transport and metabolism have been selected
and appear blue in the plot. Note that most of the off-diagonal proteins
of this type are more similar to proteins from H. pylori strain 26695
than to those from J99, suggesting that H. pylori strain J99 may be undergoing
a restructuring of some aspects of its amino acid processing systems.
Such restructuring could represent an important adaptation in the J99
strain, one relevant to its pathogenesis.
The TaxPlot tool is accessible from the Entrez
Genomes Web page, under Tools and Analysis. In addition to the microbial
genome version described here, there is also a TaxPlot service for eukaryotic
genomes.