blest is a version of the sim4 program that is specially tailored for finding near-identity matches between a genomic sequence and a database of ESTs or other expressed sequences. It is often used in conjunction with two other programs: mb (megablast), which speeds up execution by weeding out unsuitable database entries in advance using a much faster algorithm, and summarize, which organizes the matches into putative exons and introns for convenient human readability.
These three programs are supplied as compiled executables for Linux and Solaris (sorry, source code is not available at this time). They are provided "as is", with no warranty of any kind.
Linux 2.2 (glibc) / Intel x86: blest, mb, summarize
Solaris 7 / Sparc Ultra: blest, mb, summarize
Example:
Suppose you have a file (called, say, genseq) that contains a genomic sequence in FASTA format, and another file (e.g., estlib) containing a FASTA library of many expressed sequences, some of which correspond to the genomic sequence. (The TIGR Gene Indices make good libraries for this purpose, since most of the EST redundancy has been removed.)
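Before running anything, it can help to confirm that both input files really are in FASTA format and to see how many sequences the library holds. The following is only a minimal sketch: the function names fasta_records and is_fasta are hypothetical helpers, not part of blest, mb, or summarize, and the check assumes the usual FASTA convention that each record begins with a header line starting with ">".

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with blest/mb/summarize).

# fasta_records FILE: print the number of FASTA records,
# i.e. the number of header lines that start with ">".
fasta_records() {
    grep -c '^>' "$1"
}

# is_fasta FILE: succeed if the first non-empty line is a FASTA header.
is_fasta() {
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case $first in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For instance, "is_fasta genseq && fasta_records estlib" would check the genomic file and then print the number of entries in the library.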
1. (Optional) Pre-filter the library with mb. First, the command
formatdb -i estlib -p F
will produce files called estlib.nhr, estlib.nin, and estlib.nsq, which will be used by mb when you refer to "estlib". Also, the genomic sequence should have its interspersed repeats masked with 'N's to avoid spurious matches (you can use RepeatMasker to accomplish this, or genbank2repeats and mask-seq from our PipTools package). If the masked version is called genseq.masked, then the command
mb -e -i genseq.masked -d estlib > mb.out
will store the selected sequences from estlib in a new FASTA library file called mb.out.
2. Run blest on the genomic sequence and the selected library:
blest genseq mb.out > blest.out
or, if you skipped step 1, use estlib in place of mb.out.
3. Summarize the matches:
summarize blest.out > summarize.out
The first part of summarize.out will look a lot like blest.out, but at the end you will find a summary of the putative exons and introns in sorted order, along with a list of apparent inconsistencies among these conclusions.
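The steps above can be collected into a small wrapper script. This is only a sketch under stated assumptions: the names run, blest_pipeline, and DRY_RUN are hypothetical, the command lines and output file names are copied from the example above, and formatdb, mb, blest, and summarize are assumed to be on your PATH. By default the script only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical wrapper around the example commands; not part of the distribution.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

# run OUTFILE CMD [ARGS...]
# Runs CMD with stdout redirected to OUTFILE ("-" means no redirection),
# or just prints the command line when DRY_RUN=1.
run() {
    out=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        if [ "$out" = "-" ]; then echo "$*"; else echo "$* > $out"; fi
    elif [ "$out" = "-" ]; then
        "$@"
    else
        "$@" > "$out"
    fi
}

# blest_pipeline GENSEQ MASKED ESTLIB
# GENSEQ: genomic FASTA file; MASKED: repeat-masked copy; ESTLIB: FASTA library.
blest_pipeline() {
    genseq=$1 masked=$2 estlib=$3
    run -             formatdb -i "$estlib" -p F &&
    run mb.out        mb -e -i "$masked" -d "$estlib" &&
    run blest.out     blest "$genseq" mb.out &&
    run summarize.out summarize blest.out
}
```

Calling "blest_pipeline genseq genseq.masked estlib" with the default DRY_RUN=1 prints the four commands from the example in order. The steps are chained with && so that a failure in one step stops the later ones from running on incomplete output.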
You can find out about additional command-line options for these programs by running them without any arguments. For further discussion about sim4 and blest, please see Florea et al. 1998.
These programs are copyright (C) 1998-2000 by Liliana Florea, Zheng Zhang, Scott Schwartz, and Webb Miller.