1. EST Screening
A group used the Paracel TranscriptAssembler
(PTA) software to cluster and assemble 35,000 sequences
from more than 20,000 cDNA EST clones, producing about
11,000 non-redundant sequences.
These sequences were then screened using BLASTN and
TBLASTX on the BlastMachine against the nr (non-redundant
protein) and nt (nucleotide) databases. The most informative
results have come from TBLASTX searches against the
nt database.
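As an illustration of the screening step, the sketch below keeps the best-scoring hit per EST from a table of search results. It assumes the hits have been exported in the 12-column tab-separated layout that NCBI BLAST+ writes with `-outfmt 6` (query, subject, ..., e-value, bit score); the BlastMachine's native output may differ, and the function name `best_hits` and the e-value cutoff are hypothetical.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)} for the top hit per query.

    Assumes 12-column tabular BLAST output (an assumption, not the
    BlastMachine's documented format): columns 11 and 12 hold the
    e-value and the bit score.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is not significant enough
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Passing an open file handle (or any iterable of lines) to `best_hits` yields one entry per EST that has at least one hit below the cutoff.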
2. Mouse Genomic-Reads Screening for SNP Identification
Dr. Stuart Fischer of the Facility
staff is working with a group to identify particular
genes of interest in four strains of congenic mice using
1,000 transcripts in seven genetically defined intervals.
To identify strain-specific DNA sequence variants,
they BLAST the transcripts against the NCBI mouse
genomic reads database (>41 million reads) and
other sources to select genomic sequences corresponding
to the complete transcript set. Using the BLASTN algorithm,
this job takes 15 hours. The
genomic sequences are re-assembled from the
fragments using PGA, and strain-specific differences
are revealed by Calypso (a component of the Paracel
PGA package) and in-house software. Amino
acid substitutions that cause potentially important
structural changes are identified using the PrISM
software package developed by Dr. An-Suei Yang of
the CGC.
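To make the idea of a strain-specific difference concrete, here is a toy sketch that scans already-aligned sequences, one per strain, and reports the columns where the strains disagree. It is not how Calypso or the in-house software works; the function name `variant_positions`, the strain labels, and the rule of ignoring `N` (uncalled) bases are assumptions made for illustration.

```python
def variant_positions(aligned):
    """Report alignment columns where strains carry different bases.

    `aligned` maps a strain name to its sequence; all sequences must
    already be aligned to the same length. Returns a list of
    (1-based position, {strain: base}) tuples.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos in range(lengths.pop()):
        column = {strain: seq[pos].upper() for strain, seq in aligned.items()}
        # Ignore uncalled bases so an 'N' alone does not create a variant.
        called = {base for base in column.values() if base != "N"}
        if len(called) > 1:
            variants.append((pos + 1, column))
    return variants
```

For example, with four hypothetical strains where only one carries a `C` at the third column, the function returns a single entry for position 3.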
3. Legionella Genome Annotation
Dr. James Russo's lab in the CGC
is sequencing the genome of the Legionella bacterium.
Every few weeks all contigs and unassembled sequences
are searched against various NCBI and local databases.
Such a run can be completed overnight on the BlastMachine.
4. Epilepsy Gene Evolutionary Analysis
Dr. Pavel Morozov of the Facility
staff collaborated with Dr. Ruth Ottman
and Drs. Conrad Gilliam and Sergey Kalachikov of the
Columbia Genome Center to characterize a newly discovered
gene family (LGI), one member of which (LGI1) causes
a rare form of epilepsy. The BlastMachine and GeneMatcher2
(Smith-Waterman, HMMER) were used intensively to search
for distant homologs. Transcribed sequences from genomic
regions of about 10 Mb around the LGI family members
were also compared using the BlastMachine.
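For readers unfamiliar with Smith-Waterman, the sketch below computes the best local-alignment score between two sequences using the textbook recurrence. It is a simplified illustration only: it uses a flat match/mismatch score and a linear gap penalty (all three values are arbitrary assumptions), whereas homolog searches on protein sequences would normally use a substitution matrix and affine gap costs, and the GeneMatcher2 runs the algorithm on dedicated hardware.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of sequences a and b (score only).

    Textbook recurrence with a linear gap penalty; each cell is
    floored at zero so an alignment can start anywhere.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the alignment itself would require storing the full matrix and tracing back from the best cell.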
5. Bacterial Enzyme Family Screening
A group developed a system based
on extensive HMM searches (using HMMER on the GeneMatcher2)
to search for potential RNA-related enzyme family
members in a bacterial genome. They are performing
iterative HMM searches for many related enzymatic
families, RNA-binding domains, and other related
sequences to identify family members. Those sequences
will then be tested experimentally for the expected
enzymatic activity. This entails thousands of HMM
searches.
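The iterative strategy can be sketched as a loop that repeats a search until no new family members turn up. The code below is a generic outline under stated assumptions: `search` is a hypothetical callable standing in for one round of model building plus an HMM search (for example, a wrapper around HMMER), and `max_rounds` is an arbitrary safety limit; neither comes from the group's actual system.

```python
def iterative_search(seeds, search, max_rounds=10):
    """Grow a set of family members by repeated searching.

    `search` (hypothetical) takes the current members and returns an
    iterable of sequence IDs it considers hits. The loop stops when a
    round adds nothing new or after `max_rounds` rounds.
    """
    members = set(seeds)
    for _ in range(max_rounds):
        hits = set(search(frozenset(members)))
        new = hits - members
        if not new:
            break  # converged: no additional members found
        members |= new
    return members
```

Because each round can trigger a fresh search per family, running this loop over many families quickly adds up to thousands of individual searches.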
6. Anopheles gambiae (mosquito) Genome Analysis
Dr. Andrey Rzhetsky of the Columbia
Genome Center performed an analysis of four gene families
and their exon/intron structure in the newly sequenced
genome of Anopheles gambiae using the HMMER and Genewise
algorithms on the GeneMatcher2. Gene families studied
were odorant receptors, serpins, gram-negative bacteria
binding proteins (immunity), and ABC transporters.
7. Discovery of Genes Involved in Dermatological Disorders
Dr. Fischer is working with a group
to discover genes involved in dermatological
disorders. To identify candidate genes associated
with a unique form of a particular syndrome, they
have developed a software package with a graphical
interface for comparing all known and presumptive
genes in an interval on human chromosome 8q.