Tools for Comparative Genomics

How the genomes were aligned

We use the VISTA pipeline infrastructure [Frazer et al., 2004], which is based on an efficient combination of global and local alignment methods, to construct genome-wide pairwise and multiple DNA alignments [Dubchak et al., 2009]. First, we obtain a map of large blocks of conserved synteny between the two species by applying the Shuffle-LAGAN glocal chaining algorithm [Brudno et al., 2003] to local alignments produced by translated BLAT [Kent, 2002]. Next, we apply Supermap, the fully symmetric whole-genome extension of Shuffle-LAGAN. Then, within each syntenic block, we apply Shuffle-LAGAN a second time to obtain a more fine-grained map of small-scale rearrangements such as inversions. A similar strategy is implemented in GenomeVISTA, which allows user-submitted sequences to be aligned and compared against whole-genome assemblies (base genomes).
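To make the idea of chaining local alignments concrete, here is a minimal sketch in Python. It is not Shuffle-LAGAN: it implements only plain colinear chaining, in which every hit in the chain must follow the previous one in both genomes, whereas glocal chaining also admits rearrangements such as inversions and translocations. The Hit record, its field names, and the chain_hits function are hypothetical and introduced here for illustration only.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A local alignment hit (hypothetical record, for illustration only)."""
    a_start: int   # start on genome A (half-open interval)
    a_end: int     # end on genome A
    b_start: int   # start on genome B
    b_end: int     # end on genome B
    score: float   # local alignment score


def chain_hits(hits: List[Hit]) -> List[Hit]:
    """Return the highest-scoring colinear chain of non-overlapping hits.

    A hit may follow another only if it starts at or after the other's end
    in BOTH genomes. This is a simple O(n^2) dynamic program, not the
    glocal chaining used by Shuffle-LAGAN.
    """
    if not hits:
        return []

    # Sorting by start on genome A guarantees that any valid predecessor
    # of a hit appears before it in the list.
    ordered = sorted(hits, key=lambda h: (h.a_start, h.b_start))
    best = [h.score for h in ordered]   # best chain score ending at hit i
    prev = [-1] * len(ordered)          # back-pointer to previous hit in chain

    for i, hi in enumerate(ordered):
        for j in range(i):
            hj = ordered[j]
            if hj.a_end <= hi.a_start and hj.b_end <= hi.b_start:
                candidate = best[j] + hi.score
                if candidate > best[i]:
                    best[i] = candidate
                    prev[i] = j

    # Trace back from the best-scoring chain end.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return chain
```

Given four hits, one of which is out of order on the second genome, the sketch keeps the three colinear hits and drops the fourth. A glocal chainer would instead try to retain such a hit as part of a rearranged block.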

Whole-genome results are available to the user through three interfaces: VISTA Browser, the VISTA track on the mirrored UCSC Genome Browser, and VISTA Point. You will find Help pages here.

VISTA Browser is a Java applet for efficient, interactive visualization of comparative sequence analysis results in the VISTA format, together with annotations, on the scale of whole chromosomes. The user may select any genome as the reference, or base, and display the level of conservation between this reference and the sequences of another species in a particular interval. A user can accept the default conservation cutoffs (X% identity over Y bp) or specify different ones. Conserved regions are highlighted under the curve, with different colors used for coding and noncoding sequences. The browser offers a number of options, such as zooming, extracting a region for display, setting user-defined conservation parameters, and selecting sequence elements to study. VISTA Browser is implemented as a dynamic web interface synchronized with the central MySQL database.
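The conservation cutoff described above (X% over Y bp) can be illustrated with a sliding-window calculation over a pairwise alignment. The following Python sketch is an assumption about how such a cutoff might be applied, not the VISTA implementation. The function name conserved_regions, its parameters, and the default values of 100 columns and 70% identity are placeholders chosen for illustration. Coordinates are alignment columns, not genome positions.

```python
from typing import List, Tuple


def conserved_regions(
    ref: str,
    other: str,
    window: int = 100,           # Y: window length in alignment columns (placeholder)
    min_identity: float = 0.70,  # X: minimum fraction of matching columns (placeholder)
) -> List[Tuple[int, int]]:
    """Return merged half-open column intervals covered by conserved windows.

    `ref` and `other` are the two rows of a pairwise alignment, of equal
    length, with '-' marking gaps. A window is conserved if at least
    `min_identity` of its columns hold identical, non-gap bases.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have the same length")
    n = len(ref)
    if window <= 0 or n < window:
        return []

    # 1 where the two rows agree on a real base, 0 otherwise.
    match = [
        1 if a == b and a != "-" else 0
        for a, b in zip(ref.upper(), other.upper())
    ]

    regions: List[Tuple[int, int]] = []
    count = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            count += match[start + window - 1] - match[start - 1]
        if count >= min_identity * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For example, calling conserved_regions("ACGTACGTAA", "ACGTTTTTAA", window=4, min_identity=0.75) reports two conserved stretches separated by the mismatching columns in the middle of the alignment.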

The VISTA track displays the results of our comparative analysis in the context of the whole-genome annotation on the mirrored UCSC Genome Browser. It dynamically creates VISTA plots for each defined region and, unlike VISTA Browser, can display multiple individual plots where alignments overlap. The VISTA track is accessible from VISTA Browser by clicking the "UCSC+Vista" button.

VISTA Point is the most direct front end to the central MySQL database. It is also linked to VISTA Browser and allows a user to examine detailed information about each sequence aligned to the selected region of the base genome. For each aligned region, information such as the exact locations of the alignments on both genomes, the sequences, the alignments themselves, and the coordinates of conserved regions is easily retrieved. VISTA Point also gives access to rVISTA, which predicts potential transcription factor binding sites for any region of a base genome.

Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W273-9.

Dubchak I, Poliakov A, Kislyuk A, Brudno M. Multiple whole-genome alignments without a reference organism. Genome Res. 2009 Apr;19(4):682-9. Epub 2009 Jan 28.

Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Glocal alignment: finding rearrangements during alignment. Bioinformatics. 2003;19 Suppl 1 (Proceedings of ISMB 2003):i54-62.

Kent WJ. BLAT -- the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64.