The Paracel GeneMatcher2 system has three main hardware layers: the
GeneMatcher2 hardware accelerator, a user host machine, and
post-processing servers.
The heart of the GeneMatcher2 system is a hardware accelerator containing
a pipeline of 9216 parallel processing cells implemented in ASIC
(Application-Specific Integrated Circuit) technology. It accelerates a
number of dynamic programming algorithms, such as Smith-Waterman and HMMer.
The accelerator also includes 430 GB of local disk storage for sequence
databases and a Sun SPARC server running proprietary pipeline and file
system management software.
Command-line access to the GeneMatcher2 is available from all of the login
hosts. Post-processing, which performs the alignment step after a search,
runs on the Paracel BlastMachine cluster. Output formatting is done on the
user front-end (login) machine.
btk is the command-line interface to the GeneMatcher2 (GM2). It is used to
run GeneMatcher2 searches and to access related utilities. The following
usage information is derived from the btk version 4.1 help output:
btk <category>|<command> [arguments...]

<command> is one of the following:
    search     - Run a search -- "btk search -help" lists searches
    qstat      - Display queue status

<category> is one of the following:
    db         - database commands
    dbset      - database set commands
    host       - host commands
    matrix     - matrix commands
    param      - parameter commands
    ppset      - post processing set commands
    query      - query commands
    queryset   - query set commands
    searchset  - search set commands
    user       - user commands

For additional help on a category of commands, type "btk <category> -help".
For additional help on a command, type "btk <command> -help".
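The help conventions above can be sketched as a small Python helper that builds the argument list for a `btk ... -help` call. This is only an illustration: the function name `btk_help_argv` and the two tuples are hypothetical, and the names in them are copied from the help output above. The sketch does not run `btk`; it only constructs the command.

```python
import shlex

# Commands and categories copied from the btk 4.1 help output above.
BTK_COMMANDS = ("search", "qstat")
BTK_CATEGORIES = ("db", "dbset", "host", "matrix", "param", "ppset",
                  "query", "queryset", "searchset", "user")


def btk_help_argv(name):
    """Return the argument list for `btk <name> -help`.

    `name` must be one of the documented commands or categories;
    anything else raises ValueError.
    """
    if name not in BTK_COMMANDS and name not in BTK_CATEGORIES:
        raise ValueError(f"not a documented btk command or category: {name!r}")
    return ["btk", name, "-help"]


if __name__ == "__main__":
    for name in BTK_COMMANDS + BTK_CATEGORIES:
        # shlex.join renders the list as it would be typed at a shell prompt.
        print(shlex.join(btk_help_argv(name)))
```

On a login host, the returned list could be passed to `subprocess.run` to capture the help text for each category.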
The following searches are accelerated on the GeneMatcher2 hardware and
are run with the command
btk search <searchname> [<arg1> [<arg2> ...]]:
SMITH-WATERMAN SEARCHES:
    swp       - protein queries vs protein databases
    tswn      - protein queries vs DNA (translated .rframe) databases
    swx       - DNA queries (translated) vs protein databases
    swn       - DNA queries vs DNA databases
    tswx      - DNA queries (translated) vs DNA (translated .rframe)
                databases

HMM AND PROSITE GENERALIZED PROFILE (GP) SEARCHES:
    hmm       - HMM/GP queries vs protein databases
    hframe    - HMM/GP queries vs EST/cDNA (translated .rframe) databases
    genewise  - HMM/GP queries vs genomic DNA (as .codon) databases,
                allowing for the possibility of introns

GCG PROFILE SEARCHES:
    profile   - profile queries vs protein databases
    pframe    - profile queries vs DNA (translated .rframe) databases
The following searches are supported, but are accelerated on the Pentium
cluster rather than the GeneMatcher hardware:

PROSITE REGULAR EXPRESSION SEARCHES:
    regexp    - Prosite regular expression queries vs protein databases

To invoke any of these searches, type:
    <searchname> [<arg1> [<arg2> ...]]

For help on a specific type of search, type "<searchname> -help".
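Choosing among the five Smith-Waterman searches listed above comes down to the query type and the database type. The Python sketch below encodes that choice as a lookup table. The labels `"protein"`, `"dna"`, and `"dna-translated"`, and the names `SW_SEARCHES` and `sw_search_argv`, are hypothetical and introduced only for this example. It assumes any further arguments are appended in the `btk search <searchname> [<arg1> [<arg2> ...]]` form shown earlier, and it only builds the command, it does not run it.

```python
# (query kind, database kind) -> search name, copied from the
# SMITH-WATERMAN SEARCHES list. "dna-translated" on the database side
# stands for the translated .rframe databases mentioned there.
SW_SEARCHES = {
    ("protein", "protein"): "swp",
    ("protein", "dna-translated"): "tswn",
    ("dna-translated", "protein"): "swx",
    ("dna", "dna"): "swn",
    ("dna-translated", "dna-translated"): "tswx",
}


def sw_search_argv(query_kind, db_kind, *args):
    """Return the argument list for the matching Smith-Waterman search.

    Raises ValueError when the query/database combination is not one
    of the five documented Smith-Waterman searches.
    """
    try:
        name = SW_SEARCHES[(query_kind, db_kind)]
    except KeyError:
        raise ValueError(
            f"no documented Smith-Waterman search for "
            f"{query_kind!r} queries vs {db_kind!r} databases"
        ) from None
    # Extra arguments are passed through untouched; their meaning is
    # not described in this document.
    return ["btk", "search", name, *args]


if __name__ == "__main__":
    print(sw_search_argv("protein", "protein"))
    print(sw_search_argv("dna-translated", "dna-translated"))
```

A table keeps the mapping in one place, so an unsupported combination (for example, untranslated DNA queries against a protein database) fails early instead of producing a command that the GM2 would reject.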
fdf is the command-line interface to the GM2's proprietary file system.
This file system is UNIX-like and resides on the GM2 hardware itself. All
databases loaded onto the GM2 reside on this file system.
FAST DATA FINDER (FDF) FILE MAINTENANCE COMMANDS:

The following commands are invoked by typing only the command name at the
command line. You do not need to type btk first.
    fdf help     - get additional Fast Data Finder help
    fdf ls       - list a GeneMatcher directory
    fdf ls -l    - list a GeneMatcher directory (including the number
                   of sequences and bytes in database)
    fdf df       - show GeneMatcher disk usage
    fdf fdfstat  - show status and configuration of GeneMatcher hardware
    fdf mkdir    - make a directory
    fdf rmdir    - remove a directory
    fdf read     - read single sequence from a GeneMatcher database
    fdf rm       - remove a GeneMatcher database
    fdf mv       - move or rename a GeneMatcher database
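For scripted use, the fdf commands above could be wrapped as in the Python sketch below. The names `FDF_SUBCOMMANDS`, `fdf_argv`, and `run_fdf` are hypothetical. The list above does not describe the arguments each subcommand takes, so the sketch passes extra arguments through unchanged and only checks the subcommand name. `run_fdf` returns `None` on a machine where `fdf` is not installed.

```python
import shutil
import subprocess

# Subcommand names copied from the fdf command list above.
FDF_SUBCOMMANDS = ("help", "ls", "df", "fdfstat", "mkdir",
                   "rmdir", "read", "rm", "mv")


def fdf_argv(subcommand, *args):
    """Return the argument list for `fdf <subcommand> [args...]`."""
    if subcommand not in FDF_SUBCOMMANDS:
        raise ValueError(f"not a documented fdf subcommand: {subcommand!r}")
    return ["fdf", subcommand, *args]


def run_fdf(subcommand, *args):
    """Run fdf and return its standard output as text.

    Returns None when no `fdf` executable is found on PATH, so the
    sketch can be imported and tried away from a GeneMatcher login host.
    """
    argv = fdf_argv(subcommand, *args)
    if shutil.which("fdf") is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(fdf_argv("ls", "-l"))   # long listing, as documented above
    print(fdf_argv("df"))         # disk usage
```

Because `rm`, `rmdir`, and `mv` change databases on the GM2 file system, a script built on this would reasonably confirm with the user before calling `run_fdf` with any of them.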