blest is a version of the sim4 program that is specially tailored for finding near-identity matches between a genomic sequence and a database of ESTs or other expressed sequences. It is often used in conjunction with two other programs: mb (megablast), which speeds up execution by weeding out unsuitable database entries in advance using a much faster algorithm, and summarize, which organizes the matches into putative exons and introns for convenient human readability.
These three programs are supplied as compiled executables for Linux and Solaris (sorry, source code is not available at this time). They are provided "as is", with no warranty of any kind.
Linux 2.2 (glibc) / Intel x86:
blest
mb
summarize
Solaris 7 / Sparc Ultra:
blest
mb
summarize
Example:
Suppose you have a file (called, say, genseq) that contains a genomic sequence in FASTA format, and another file (e.g., estlib) containing a FASTA library of many expressed sequences, some of which correspond to the genomic sequence. (The TIGR Gene Indices make good libraries for this purpose, since most of the EST redundancy has been removed.)
1.  formatdb -i estlib -p F
    will produce files called estlib.nhr, estlib.nin, and estlib.nsq, which will be used by mb when you refer to "estlib". Also, the genomic sequence should have its interspersed repeats masked with 'N's to avoid spurious matches (you can use RepeatMasker to accomplish this, or genbank2repeats and mask-seq from our PipTools package). If the masked version is called genseq.masked, then the command
        mb -e -i genseq.masked -d estlib > mb.out
    will store the selected sequences from estlib in a new FASTA library file called mb.out.
2.  blest genseq mb.out > blest.out
    or, if you skipped step 1, use estlib in place of mb.out.
3.  summarize blest.out > summarize.out
    The first part of summarize.out will look a lot like blest.out, but at the end you will find a nice summary of the putative exons and introns in sorted order, along with a list of apparent inconsistencies among these conclusions.
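If you run this example often, the commands can be collected into a small shell script. The following is only a sketch: the commands and flags are copied from the example above, but the function names db_is_formatted and run_blest_pipeline are hypothetical helpers invented here, not part of the distributed programs. It assumes formatdb, mb, blest, and summarize are all on your PATH and that the masked sequence already exists.

```shell
#!/usr/bin/env bash
# Sketch of the example above as one script. The helper names below are
# hypothetical; only the formatdb/mb/blest/summarize command lines come
# from the example itself.

# Succeeds only if the three index files written by formatdb are present.
db_is_formatted() {
    local db=$1 ext
    for ext in nhr nin nsq; do
        [ -f "$db.$ext" ] || return 1
    done
}

# Usage: run_blest_pipeline GENSEQ GENSEQ_MASKED ESTLIB
# Writes mb.out, blest.out, and summarize.out in the current directory.
run_blest_pipeline() {
    local genseq=$1 masked=$2 estlib=$3

    # Format the library only if its index files are missing.
    db_is_formatted "$estlib" || formatdb -i "$estlib" -p F || return 1

    # Prefilter the library against the masked genomic sequence.
    mb -e -i "$masked" -d "$estlib" > mb.out || return 1

    # Align the genomic sequence to the selected entries.
    blest "$genseq" mb.out > blest.out || return 1

    # Organize the matches into putative exons and introns.
    summarize blest.out > summarize.out
}
```

With the file names used in the example, the call would be: run_blest_pipeline genseq genseq.masked estlib. Each step stops the pipeline if the previous command fails, so a partial blest.out is never passed on to summarize.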
You can find out about additional command-line options for these programs by running them without any arguments. For further discussion about sim4 and blest, please see Florea et al. 1998.
These programs are copyright (C) 1998-2000 by Liliana Florea, Zheng Zhang, Scott Schwartz, and Webb Miller.