Installed Software

Commercial Packages

  • Paracel GenomeAssembler - Provides CAP4-based sequence assembly and multiple sequence alignment. Can be used for BAC, bacterial or viral genome sized projects. Provides for use of clone-pair constraints (paired end reads) in assembly.
  • Paracel TranscriptAssembler - Designed for large scale EST transcript reconstruction projects. Assembly is optimized specifically for the characteristics of EST datasets. Rigorous detection and alignment of alternative splice forms. Correct handling of chimeras and repeats. Handles deep assemblies where hundreds of ESTs overlap, including correct consensus base calling. Useful for coding region SNP analysis.
  • Paracel Filtering Package - Sequence filtering and masking. Sensitive repeat detection that uses the GeneMatcher2 to accelerate Smith-Waterman and sequence-profile algorithms. (Includes the programs Scylla and Sequtil.)

 

Academic / Open Source Packages

The following packages are available via our login servers. Most of the links below lead to the packages' home pages, where further descriptions can be found:

Sequence Analysis and Manipulation

  • BioPerl - Perl modules that provide many common bioinformatics tools for use in Perl scripts.

  • clustalW 1.83 - multiple sequence alignment and phylogenetic tree building (clustalW help) (clustalX help - more complete than clustalW help) (paper).

  • EMBOSS - a large package of sequence search and manipulation commands, similar to the GCG package.

    Packages that belong to the EMBASSY program suite and share the look, feel and installation mechanism of EMBOSS:


    • EMNU - EMBOSS Menu is Not UNIX
    • MSE v. 0.0.4 - A Multiple Sequence Screen Editor for Biological Sequences
    • TOPO v. 0.1 - Transmembrane region "display"

  • fmtseq - part of the seqio package, fmtseq provides file format conversion between most popular formats, including fasta, genbank, gcg, embl, phylip, clustalw and many others.

  • HMMER 2.2g programs from Sean Eddy. These can be used to build HMM models for searches on the GeneMatcher2. For help in building HMMs, including libraries of HMMs, please see the User's Guide (in PDF format). The first chapter is a tutorial. The files used in the tutorial can be found on the Linux/UNIX hosts in directory

    /projects/source/HMMER/tutorial.

    Note - HMM models built with the GCG version of HMMER are not compatible with the GeneMatcher2 - use a package such as clustalW (see above) instead.


  • MUMmer - a package of programs for comparing whole chromosomes or genomes in either nucleotide or protein sequence space.

  • SSAHA - Sequence Search and Alignment by Hashing Algorithm. "The SSAHA algorithm is most suitable for applications requiring exact or 'almost exact' matches between two sequences, such as SNP detection or sequence assembly." (help file).

  • T-COFFEE v. 1.37 - A Multiple Sequence Alignment Package.

    T-COFFEE is integrated into BioPerl. "T-COFFEE is more accurate than ClustalW for sequences with less than 30% identity, but it is slower...." (quoted from: http://www.ch.embnet.org/software/TCoffee.html)
    Comparison with clustalw
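
The SSAHA entry above describes searching by hashing for exact or almost exact matches. As a rough illustration of that general idea only, here is a minimal Python sketch of a hashed k-mer lookup. It is not SSAHA's implementation; the function names, the choice of k, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_kmer_index(subject, k):
    """Map every k-letter word in `subject` to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(subject) - k + 1):
        index[subject[pos:pos + k]].append(pos)
    return index


def exact_kmer_hits(query, index, k):
    """Return (query_pos, subject_pos) pairs for every exactly matching k-mer."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get() avoids adding empty entries to the defaultdict on a miss.
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits


def best_diagonal(hits):
    """Return the offset (subject_pos - query_pos) shared by the most hits."""
    diagonals = Counter(spos - qpos for qpos, spos in hits)
    return diagonals.most_common(1)[0] if diagonals else None


# Toy sequences, hypothetical data for illustration only.
subject = "ACGTACGTGA"
query = "TACG"
index = build_kmer_index(subject, k=3)
hits = exact_kmer_hits(query, index, k=3)
print(hits)
print(best_diagonal(hits))
```

Hits that share an offset lie on the same diagonal, so a well-populated diagonal suggests a longer exact or near-exact match at that position in the subject.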


Phylogenetic Analysis

See a detailed discussion of the available phylogeny packages.

  • PAML - "Phylogenetic Analysis by Maximum Likelihood". Built as 64-bit binaries on our SUN V880 with access to the full 32 GB of RAM.
    Notes:
    • By default, several of the PAML programs look for a control file (e.g. baseml.ctl) in your current directory.
    • The codeml program (sometimes) makes use of data files that are provided with PAML. These files are available under: /projects/source/PAML/paml3.13
      You need to give the full path to any PAML-provided data file that you reference in the codeml.ctl file. For instance:

      aaRatefile = /projects/source/PAML/paml3.13/mtREV24.dat

    • Documentation:
      Online documentation is available. In addition, the current PAML manual is available in PDF format. The online FAQ seems particularly useful.

  • PHYLIP v. 3.5c - The Phylogeny Inference Package

  • EMBASSY-PHYLIP - The EMBOSS version of PHYLIP

    From the EMBASSY-PHYLIP README file: "Programs have now been modified to use the emboss command language and sequence reading so programs expecting a standard phylip sequence file can now use any sequence file format. The interactive programs have not been modified or included here as they would give no advantage form having an emboss interface."

    The names of the EMBASSY-PHYLIP versions of the programs are prefixed with an "e":

    Example:
    Standard PHYLIP: clique
    EMBASSY-PHYLIP: eclique

  • PAUP
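
For the PAML note above about giving codeml the full path to a PAML-provided data file, the following Python sketch rewrites a relative aaRatefile entry in control-file text so that it points into the directory named on this page. The function name absolutize_rate_file and the sample control-file text are illustrative assumptions, not part of PAML. The sketch also assumes that the relative path is meant to be resolved against that directory.

```python
import re
from pathlib import PurePosixPath

# Directory given on this page for the data files provided with PAML.
PAML_DATA_DIR = PurePosixPath("/projects/source/PAML/paml3.13")

# Matches "aaRatefile = <value>" and keeps any trailing text on the line.
_RATEFILE = re.compile(r"^(\s*aaRatefile\s*=\s*)(\S+)(.*)$")


def absolutize_rate_file(ctl_text, data_dir=PAML_DATA_DIR):
    """Return ctl_text with a relative aaRatefile value made absolute.

    Hypothetical helper: lines other than aaRatefile are left untouched,
    and an already absolute path is kept as it is.
    """
    fixed_lines = []
    for line in ctl_text.splitlines():
        match = _RATEFILE.match(line)
        if match and not PurePosixPath(match.group(2)).is_absolute():
            full_path = data_dir / match.group(2)
            line = f"{match.group(1)}{full_path}{match.group(3)}"
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


# Sample control-file text, made up for this example.
sample = "seqfile = my_proteins.aa\naaRatefile = mtREV24.dat"
print(absolutize_rate_file(sample))
```

Run against your own codeml.ctl text, this leaves every other setting alone and only expands the aaRatefile entry.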

 

Statistical Analysis

  • The R Package - R is `GNU S' - A language and environment for statistical computing and graphics. Configured with 53 different packages.

  • Bioconductor - "An open source and open development software project to provide tools for the analysis and comprehension of genomic data (bioinformatics)".

    Although initial efforts focused primarily on DNA microarray data analysis, many of the software tools are general and can be used broadly for the analysis of genomic data, such as SAGE, sequence, or SNP data.

    There are two main types of Bioconductor packages. One set is designed to provide basic infrastructure support that will help other developers produce high quality software for the analysis of genomic data. The other set provides innovative methodology for analyzing genomic data.

    Here is a description of all installed Bioconductor packages.

  • SJava - An Omegahat R Package - A package that provides direct access to calling arbitrary Java methods and creating arbitrary Java objects from within R and also calling R functions from Java. This can be used to get access to network facilities; create graphical interfaces with R functions as callbacks; image manipulation classes; mail package; etc.

  • SOLAR - a suite of algorithms for linkage and quantitative genetic analysis: the Sequential Oligogenic Linkage Analysis Routines. From the Southwest Foundation for Biomedical Research.

Molecular Mechanics

  • X-PLOR v. 3.851 - A System for X-ray Crystallography and NMR.

 

System Software

  • Java 2 Platform Standard Edition (J2SE) SDK v. 1.4.0_02.

  • PARI-GP v. 2.1.4 - a software package for computer-aided number theory. A prerequisite for installation of the Net::SFTP Perl module used by the (latest) incarnation of the SOAP client/server.

  • GNU Compiler Collection (GCC) v. 3.2 - builds 64-bit executables on the V880s

 

Relational Database Systems

  • MySQL

  • Oracle


 