|
BLAST
Smith-Waterman
Profile Methods
Hidden Markov Models
GeneWise
|
BLAST searches for initial short exact matches, which
it then extends. It provides moderate sensitivity and
high specificity. Its effectiveness falls off as sequence
similarity decreases.
The various
translated versions allow searches involving nucleotide
sequences to be conducted in the protein sequence space.
This provides greater sensitivity because sequence is
better conserved at the protein level than at the nucleotide
level. Six-frame
translation of the query and database sequences allows
handling frameshifts caused by nucleotide insertions
and deletions.
|
BLASTN
|
Nucleotide query vs. nucleotide database.
|
BLASTP
|
Peptide query vs. peptide database.
|
BLASTX
|
Six-frame translation of nucleotide query sequence
vs peptide database.
|
TBLASTN
|
Peptide query vs six-frame translation of nucleotide
database.
|
TBLASTX
|
Six-frame translation of nucleotide query vs six-frame
translation of nucleotide database.
|
PSI-BLAST (Position Specific Iterative BLAST)
|
Starting with an initial peptide query against a peptide
database, iteratively builds up a statistical sequence
profile by aligning selected multiple hits from successive
runs. At each step after the first, the profile is used
as the query. Good for finding distant homologs not
found by standard BLAST searches..
|
PHI-BLAST (Pattern-Hit Initiated BLAST)
|
Allows searching using a motif. "Combines matching
of regular expressions with local alignments surrounding
the match". (NCBI)
|
Mega-BLAST
|
"Optimized for aligning sequences that differ
slightly as a result of sequencing or other similar
'errors'. It is approximately 10 times faster than more
common sequence similarity programs and therefore can
be used to swiftly compare two large sets of sequences
against each other". "MegaBlast implements
a greedy algorithm for gapped alignment search".
(NCBI)
|
Smith-Waterman
(GeneMatcher2)
The Smith-Waterman algorithm is a dynamic
programming algorithm that guarantees finding the optimal
local alignment of two sequences. It is slower than BLAST
but more sensitive, as for example it does not require
initial exactly matching regions to seed the alignment.
It will thus give better dectection of remote homologs
than BLAST. As it guarantees an optimal sequence alignment,
it will give a lower distance between sequences than will
BLAST (1).
The GeneMatcher2 also provides the Needleman-Wunsch
algorithm, which finds the optimal global alignment. .
That is it aligns the entire query and database sequences
from beginning to end using whatever insertions/deletions
are necessary.
The Paracel algorithms have the ability
to find more than one alignment within a single database
sequence (Mulitple High Scoring Pairs), so that potentially
important hits will not be missed.
The GeneMatcher2 implementation of Smith-Waterman
provides three different kinds of gap opening and extension
penalties:
Affine - separate gap opening
and extension penalties. The default method.
Linear - gap open penalty = gap
extension penalty. Fastest but less realistic and
accurate than the other penalty models.
Double-Affine - Two sets of open and delete
scores are used for gaps. This allows the user to
reduce the penalty of a gap extension as a gap grows
longer. This algorithm is particularly advantageous
when comparing chromosomal DNA to ESTs, as introns
are not over-penalized when trying to find a match.
The GeneMatcher2 implementation of the Smith-Waterman
algorithms provides versions capable of performing searches
using six-frame translations of nucleotide query and database
sequences, as described for the BLAST algorithm above.
|
Affine (swn)
|
nucleotide query vs. nucleotide database
|
Affine (swp)
|
amino acid query vs. amino acid database
|
Linear (swn or swp)
|
gap=linear
|
Double-Affine (swn or swp)
|
gap=double
|
Needleman-Wunsch (swn or swp)
|
alignquery=global, aligndata=global
|
Frame (swx)
|
Translated nucleotide query vs. amino acid database
|
Reverse-Frame (rframe)
|
Amino acid query vs. translated nucleotide (rframe)
database
|
Double Frame (tswx)
|
Translated nucleotide query vs. translated nucleotide
(rframe) database
|
GCG
Profile Searches (GeneMatcher2)
Profile models are used to represent the similarities
between members of a sequence family (typically a protein
family or domain).
Profile searches use a position-specific
scoring matrix built up from aligned sequences. The scoring
matrix reflects the probability of a given residue occurring
at a specific position in the sequence. The matrix describes
a particular domain or family.
In normal use, a profile model is used
as a query against a sequence database to find sequences
similar to a known family. The programs can also be run
with the -invert option to query a sequence against a
database of profiles to investigate if the sequence belongs
to any known families. In the first case, the results
are sorted by the query models (domains), in the second
case, the results are sorted by the query sequences (assuming
multiple queries are submitted).
Searches using GCG format models. The underlying
algorithm used is Smith-Waterman (local alignments).
|
Profile (profile)
|
Amino acid model vs. amino acid database, or, nucleotide
model vs. nucleotide database
-invert option: Amino acid sequence vs. database of
amino acid models, or nucleotide sequence vs. database
of nucleotide models
|
Profile-Frame (pframe)
|
Amino acid profile vs. translated nucleotide (rframe)
database.
-invert option: nucleotide sequence vs. database of
amino acid profiles
|
Hidden
Markov Model and Prosite Generalized Profile Searches
(GeneMatcher2)
The GeneMatcher2 supports Sean Eddy's
HMMER programs for searching sequences with Hidden Markov
Models (HMMs). HMMs are an enhancement of the Profile
methods described above.
They add position-specific scoring for insertions and
deletions. One can search for local or global alignments;
this is a property of the HMM model used. One advantage
of using HMMs is that gap penalties have a rigorous
theoretical underpinning, unlike in BLAST and Smith-Waterman
where they are set empirically.
As with the Profile methods above, search
results can be sorted by HMM model (what sequences contain
this domain?), or by using the -invert option, sorted
by database sequence (what domains are present in this
sequence?).
Can also use Prosite Generalized Profile
(GP) Models.
|
Hidden Markov Model (hmm)
|
amino acid model vs. amino acid database, or, nucleotide
model vs. nucleotide database
-invert option : amino acid sequence vs. database
of amino acid models, or nucleotide sequence vs. database
of nucleotide models
|
HMM-Frame (hframe)
|
amino acid model (HMM) vs. translated nucleotide (rframe)
database
-invert option: translated nucleotide sequence vs.
database of amino acid models (HMMs)
-genomic option: allows for introns.
|
GeneWise
(GeneMatcher2)
Genewise compares genomic sequences to
protein reference sequences or HMM models at the protein
level, allowing for introns (splicing) and frameshifts.
It is "useful for predicting genes and their function
in new, possibly unfinished, genomic sequences"
(Paracel).
Can also use Prosite Models.
|
GeneWise (genewise)
|
amino acid model (HMM) or sequence vs. translated nucleotide
(codon) database.
-invert option comparison: translated nucleotide sequence
vs. database of amino acid models (HMMs).
|