SPLASH

 

 

Introduction

SPLASH (structural pattern localization analysis by sequential histograms) is a deterministic pattern discovery algorithm that finds sparse amino acid or nucleic acid patterns matching identically or similarly in a set of protein or DNA sequences. Sparse patterns of any length, up to the size of the input sequence, can be discovered without significant loss in performance. SPLASH is extremely efficient and embarrassingly parallel by nature. Large databases, such as a complete genome, the full set of PROSITE families, or the non-redundant SWISS-PROT database, can be processed in a few hours on a typical workstation. Alternatively, a protein family or superfamily with low overall homology can be analyzed to discover common functional or structural signatures.

 

Application Download

You can use our caWorkbench2.0 framework to run SPLASH. Alternatively, you can use a standalone executable for your research. The supported platforms are:

Microsoft Windows (via Cygwin)

Linux

Cygwin provides a Linux-like (POSIX) environment for Windows. It is required to run or compile SPLASH under Windows. Cygwin is available for download from the Cygwin website.

Please select a suitable package to download based on your operating system:

splash.exe: executable file for Windows

splash: executable file for Linux

 

Building From Source

Source code for SPLASH is available via this link

Untar the downloaded file by typing:

gunzip splash.tar.gz
tar -xf splash.tar

You should now have two directories, contrib and SPLASH.

Change into contrib/gsoap-2.7 and follow the instructions for building gsoap given in the README file.

In short:

./configure --prefix=$HOME

make

make install exec_prefix=$HOME

Change into SPLASH/src and build SPLASH by typing:

make

Then copy the splash executable to the desired directory.

These instructions are also available in the README located in the SPLASH/src directory.

 

Running SPLASH

The file splash.property must be in the same directory as the splash executable. It contains a single line of the form

soapport=PORT_NUMBER

where PORT_NUMBER is the port for splash to bind to, e.g. 8040.
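As a minimal sketch, the property file can be created from the shell as follows. The port value 8040 is simply the example given above, and the variable name port is introduced here for illustration only; the command assumes it is run in the directory that holds the splash executable.

```shell
# Write a one-line splash.property in the current directory.
# 8040 is the example port from the text; any free port should do.
port=8040
printf 'soapport=%s\n' "$port" > splash.property

# Show the resulting file for a quick check.
cat splash.property
```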

By default, splash runs as a SOAP server (soapserver) that caWorkbench can connect to. To run SPLASH as a standalone program, type:

./splash -P standalone [other options] input

The help for splash is displayed by typing ./splash -h on the command line.

The similarity matrix file "BLOSUM50" should be placed in the share subdirectory of the current directory. Any other matrix file should also be placed in that share subdirectory.
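The setup described above might look like the following sketch. It assumes that BLOSUM50 and the sample file histoall.fa have been downloaded into the current directory and that histoall.fa holds protein sequences; neither assumption is confirmed by this page. The guards only skip steps whose files are missing, so the commands are safe to paste as they are.

```shell
# Create the share subdirectory that SPLASH looks in for matrix files.
mkdir -p share

# Copy the similarity matrix into share/ if it has been downloaded.
if [ -f BLOSUM50 ]; then
    cp BLOSUM50 share/
fi

# Run SPLASH standalone on the sample data, using only flags listed in
# the help text (-P program type, -q token type).
# Assumption: histoall.fa contains protein sequences.
if [ -x ./splash ] && [ -f histoall.fa ]; then
    ./splash -P standalone -q protein histoall.fa
else
    echo "splash executable or histoall.fa not found; skipping the run"
fi
```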

Sample Data

histoall.fa

HistoneH1_aah29046.fa

H1BLASTed.fa

Similarity Matrix

BLOSUM50

 

Documentation And Support

Usage: splash [OPTIONS]... [FILE]

(FILE default is input.fa)

The options are as follows:

-P program_type. Default: soapserver [soapserver|standalone]
-a algo_type. Default: regular [regular|exhaustive|hierarchical]
-q token_type. Default: dna [dna|protein]
-% support_as_percent_of_sequences. Default: 0.80 (Not compatible with j)
-b min_identity_tokens. Default: 2
-i reported patterns must match identically on each token. Default: not set
-j min_support. Default: 80% of sequences in FILE. (Not compatible with %)
-k min_tokens_in_window. Default: 3
-l min_tokens. Default: min_tokens_in_window
-w window. Default: 8
-c cluster size. Default: 10 (Hierarchical)
-d min pattern in cluster. Default: 10 (Hierarchical)
-C decrease_support. Default: 0.05 [0.0 - 1.0] (Exhaustive)
-D min_support. Default: 0.5 [0.0 - 1.0] (Exhaustive)
-m file_name. A similarity matrix file
-o output_type to be supported
-t thread_id number_of_threads. Default: 0 1
-T number_of_processors. Default: 1
-u count sequences. Default: set
-v verbose - print pattern detail. Default: not set
-x max_patterns. Default: 100,000
-z z_core. Set to compute the Zscore. Default: not set
-h display this help and exit
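The -% and -j options express the same support threshold in two ways, as a fraction of the input sequences or as an absolute count, which is why they cannot be combined. The sketch below converts the default 80% into a count for a small FASTA file. The file demo.fa and its contents are invented for illustration, and rounding up is an assumption; the help text does not say how SPLASH itself rounds.

```shell
# Build a tiny demo FASTA file (hypothetical data) with five sequences.
printf '>s1\nACGT\n>s2\nACGA\n>s3\nACGC\n>s4\nTCGT\n>s5\nACTT\n' > demo.fa

# Count sequences by counting FASTA header lines.
n=$(grep -c '^>' demo.fa)

# 80% of n using integer arithmetic, rounded up (assumed rounding).
min_support=$(( (n * 80 + 99) / 100 ))

echo "$n sequences, minimum support $min_support"

# Under these assumptions the two invocations would be equivalent:
echo "./splash -P standalone -% 0.80 demo.fa"
echo "./splash -P standalone -j $min_support demo.fa"
```

The last two lines only print the command lines rather than running them, so the sketch works without the splash executable present.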

For further details see: http://research.ibm.com/splash

Contact us:

The development team encourages comments and questions about SPLASH. You can email us at:

caworkbench@cu-genome.org

 

Relevant Publications

1) SPLASH: structural pattern localization analysis by sequential histograms. http://bioinformatics.oupjournals.org/cgi/content/abstract/16/4/341. Submitted to Bioinformatics.

2) Statistical Significance of Patterns in Biosequences. http://www.research.ibm.com/splash/Papers/Pattern%20Statistics.pdf

 

 