This page describes the input files supported by Laj, and their
required formats. Except where noted, all information applies to
both the stand-alone and applet modes of Laj. Any of these files
can be compressed with GZIP, if the file name ends with ".gz".
All files must consist solely of plain text characters. (For example, no Word documents.)
These formats are almost identical to those used by the PipMaker server, and you'll find that the utility programs in our PipTools collection can greatly facilitate the preparation of your files.
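The ".gz" naming rule described above could be handled along these lines when preparing or checking files outside of Laj. This is only a sketch: the function name `open_input` is hypothetical, and treating "plain text" as ASCII is an assumption, not something Laj documents.

```python
import gzip


def open_input(path):
    """Open an input file as text, decompressing if the name ends in ".gz"."""
    # Assumption: "plain text" is taken to mean ASCII here, so a binary or
    # word-processor file fails with UnicodeDecodeError when it is read.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii")
    return open(path, "r", encoding="ascii")
```

The returned object can be used in a `with` statement and iterated line by line, whether or not the file was compressed.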
The alignment file is the main data file: it contains the alignments you want to view, in lav format (e.g., output from blastz). The applet form of Laj is intended to display only one such file at a time, but the stand-alone form can display two of them simultaneously, designating them the "primary" and "secondary" files. In that case, both files must be based on the same sequences and should also cover the same region.
You can obtain an alignment file in this format by submitting sequences to our PipMaker server and requesting "raw blastz output" on the Advanced PipMaker form. Note that PipMaker sends back your results as email attachments, in this case using a quoted-printable MIME format. Make sure this gets decoded into true plain text when saving the attachment, or Laj is likely to report errors due to the not-quite-right file format.
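If your mail program saves the attachment without decoding it, the quoted-printable encoding can be undone afterwards. The following is a minimal sketch using Python's standard `quopri` module; the function name is hypothetical, and it assumes the saved data contains only the encoded attachment body (no mail headers).

```python
import quopri


def decode_quoted_printable(raw):
    """Turn quoted-printable bytes into plain text."""
    # "=3D" becomes "=", and a "=" at the end of a line (a soft line
    # break) is removed so the two pieces are joined back together.
    return quopri.decodestring(raw).decode("ascii")
```

For instance, `decode_quoted_printable(b"a=3Db=\nc\n")` yields the text `a=bc` followed by a newline.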
Laj also needs the original sequences that were aligned, in order to display the text view of the alignments in the bottom panel. By default these are "hidden" input files, in the sense that their names are specified within the alignment file rather than being supplied directly on the command line or as applet parameters. Laj will normally look for these files in the same directory as the alignment file. They should be in the same FASTA format expected by PipMaker, which looks like this:
>Sequence name and arbitrary header text on one line
ACGTGCGCGATCGCCTGCTAGGCGTACGTCGCAG
GCGATCGATGTGCTAGATCAGATGACA
... etc.

At the present time, our software supports only the letters A, C, G, T, N, X in the sequence (and their lowercase versions, if you are using Advanced PipMaker's user-controlled masking). For maximum interoperability, the sequence data should consist of short lines limited to about 70 characters, and it is generally best to keep the header line to a reasonable length as well. The first sequence must be contiguous, but the second file can contain multiple unordered contigs (each with its own header line).
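Before running Laj, it can be handy to check a sequence file against the letter restrictions above. The sketch below is not part of Laj or PipTools; `read_fasta` is a hypothetical helper, and skipping blank lines is an assumption.

```python
ALLOWED = set("ACGTNXacgtnx")


def read_fasta(lines):
    """Return a list of (header, sequence) pairs from FASTA-format lines."""
    records = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are harmless and skipped
        if line.startswith(">"):
            records.append([line[1:].strip(), []])
            continue
        if not records:
            raise ValueError(f"line {lineno}: sequence data before any header")
        bad = set(line) - ALLOWED
        if bad:
            raise ValueError(f"line {lineno}: unsupported letters {sorted(bad)}")
        records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]
```

A first sequence file should produce exactly one record, while a second sequence file may produce several (one per contig).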
If you obtained your alignment file from PipMaker, be aware that it will have substituted its own names for your sequence files in the blastz output. In stand-alone mode Laj will ask you for the correct names as needed, but if you want to set it up as an applet you will need to supply them in advance. Previous versions of Laj required editing the alignment file to fix this, but now it supports optional parameters that allow you to override the names in the alignment file.
The exons file lists the locations of genes, exons, and coding regions in the first aligned sequence, in the same format expected by PipMaker. The directionality of a gene (">", "<", or "|"), its start and end positions, and its name should be on one line, followed by an optional line beginning with a "+" character that indicates the first and last nucleotides of the translated region (including the initiation codon, Met, and the stop codon). These are followed by lines specifying the start and end positions of each exon, which must be listed in order of increasing address even if the gene is on the reverse strand ("<"). By default PipMaker and Laj will supply exon numbers, but you can override this by specifying your own name or number for individual exons. Blank lines are ignored, and you can put an optional title line at the top. Thus, the file might begin as follows:
My favorite genomic region

< 100 800 XYZZY
+ 150 750
100 200
600 800

> 1000 2000 Frobozz gene
1000 1200 exon 1
1400 1500 alt. spliced exon
1800 2000 exon 2

... etc.

Several of the PipTools programs (including genbank2exons, genscan2exons, and blest2exons) can help you build an exons file.
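To make the layout of the exons file concrete, here is a rough sketch of how it could be read. It is an illustration only, not Laj's own parser: `parse_exons` is a hypothetical name, fields are assumed to be separated by whitespace, and any line before the first gene line is assumed to be the title.

```python
def parse_exons(lines):
    """Parse exons-file lines into a list of gene dictionaries."""
    genes = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # blank lines are ignored
        if fields[0] in (">", "<", "|"):
            genes.append({
                "direction": fields[0],
                "start": int(fields[1]),
                "end": int(fields[2]),
                "name": " ".join(fields[3:]),
                "translated": None,
                "exons": [],
            })
        elif not genes:
            continue  # assumption: text before the first gene is the title
        elif fields[0] == "+":
            genes[-1]["translated"] = (int(fields[1]), int(fields[2]))
        else:
            start, end = int(fields[0]), int(fields[1])
            exons = genes[-1]["exons"]
            if exons and start < exons[-1][0]:
                raise ValueError(f"exon out of order: {raw.strip()!r}")
            # The optional exon name is whatever follows the two positions.
            exons.append((start, end, " ".join(fields[2:])))
    return genes
```

Each gene keeps its exons in file order, so a reverse-strand gene still has exons sorted by increasing address, as the format requires.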
The repeats file lists interspersed repeats and other features in the first aligned sequence. The first line tells PipMaker that this is a simplified repeats file (as opposed to RepeatMasker output); it is ignored by Laj, which only accepts this simplified format. Each subsequent line specifies the start, end, direction, and type of a particular feature.
%:repeats
1081 1364 Right Alu
1365 1405 Simple
... etc.

The allowed types are: Alu, B1, B2, SINE, LINE1, LINE2, MIR, LTR, DNA, RNA, Simple, CpG60, CpG75, and Other. Of these, all except Simple, CpG60, and CpG75 require a direction (Right or Left).
One of the PipTools programs, called rmask2repeats, translates the first part of the output from RepeatMasker to this format automatically. The input you supply to this program is the same mask file you would submit to PipMaker. Laj does not automatically locate CpG islands, but it can display them in the features panel like PipMaker does, if they are included at the end of your repeats file. You can create these entries easily using another PipTools program, find-cpg.
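The type and direction rules for the simplified repeats format can be checked with a few lines of code. This is a sketch under stated assumptions, not Laj's implementation: `parse_repeats` is a hypothetical name, blank lines are assumed to be skippable, and a direction given for Simple, CpG60, or CpG75 is assumed to be tolerated.

```python
REPEAT_TYPES = {"Alu", "B1", "B2", "SINE", "LINE1", "LINE2", "MIR", "LTR",
                "DNA", "RNA", "Simple", "CpG60", "CpG75", "Other"}
NO_DIRECTION_NEEDED = {"Simple", "CpG60", "CpG75"}


def parse_repeats(lines):
    """Return (start, end, direction, type) tuples from repeats-file lines."""
    features = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("%"):
            continue  # skip blank lines and the "%:repeats" marker line
        if len(fields) == 4:
            start, end, direction, kind = fields
        elif len(fields) == 3:
            (start, end, kind), direction = fields, None
        else:
            raise ValueError(f"unexpected line: {raw.strip()!r}")
        if kind not in REPEAT_TYPES:
            raise ValueError(f"unknown type {kind!r}")
        if direction not in (None, "Right", "Left"):
            raise ValueError(f"bad direction {direction!r}")
        if direction is None and kind not in NO_DIRECTION_NEEDED:
            raise ValueError(f"type {kind!r} requires a direction")
        features.append((int(start), int(end), direction, kind))
    return features
```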
The annotations file contains reference annotations, i.e., links to web sites providing information about particular regions in the first aligned sequence, which are drawn as colored bars. The applet form of Laj actually brings up these sites when the user clicks on the bars, but the stand-alone form does not, since there is no web browser involved. The format first defines various types of hyperlinks and associates a color with each of them, then specifies the type, position, description, and URL for each annotated feature. This is almost identical to the format accepted by PipMaker (see below for an exception), but it is a change from the format originally used by Laj, which is no longer supported.
# annotations for part of the mouse MHC class II region

%define type
%name PubMed
%color Blue

%define type
%name LocusLink
%color Orange

%define annotation
%type PubMed
%range 1 2000
%label Yang et al. 1997. Daxx, a novel Fas-binding protein...
%summary Yang, X., Khosravi-Far, R. Chang, H., and Baltimore, D. (1997).
 Daxx, a novel Fas-binding protein that activates JNK and apoptosis.
 Cell 89(7):1067-76.
%url http://www.ncbi.nlm.nih.gov:80/entrez/
query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9215629&dopt=Abstract

... etc.

Here, for example, the first stanza requests that each feature subsequently identified as a PubMed entry be colored blue. The name must be a single word, perhaps containing underline characters (e.g., Entry_in_GenBank), and the color must come from Laj's color list.
The third stanza associates a PubMed annotation with positions 1-2000 in the first sequence. The label should be kept fairly short, as it will be displayed on Laj's position indicator line when the user points at this annotation. The summary is optional; it is used only by PipMaker and will be ignored by Laj. Note that summaries and URLs (but not labels) can be broken into several lines for convenience; the line breaks are removed when the file is read, but they are not replaced with spaces. Thus a continuation line for a summary typically begins with a space to separate it from the last word of the previous line, while a URL continuation does not.
Also note that stanzas should be separated by blank lines, and lines beginning with a "#" character are comments that will be ignored. The annotations can appear in the file in any order, and several can overlap at the same position with no problem, since Laj will display them in multiple rows if necessary.
The difference between the annotation formats supported by PipMaker and Laj is that PipMaker allows several summary/URL pairs within a single annotation, while Laj expects each field to occur at most once. If Laj encounters extra URLs, it will just use the first one and display a warning message.
The underlay file specifies color underlays, i.e., colored bands to be painted on the percent identity plot. Currently there are two different formats for this information: the regular format accepted by both PipMaker and Laj, and an additional labeled one that is only used by Laj. The regular format looks like this:
# sample underlays for the BTK region

LightYellow Gene
Green Exon
Red Strongly_conserved

35324 72009 Gene
49781 49849 Exon
51403 51484 Exon
50350 50513 Strongly_conserved +
52376 52603 Strongly_conserved
... etc.

The first group of lines describes the intended meaning of the colors, while the second group specifies the location of each band. Colors must come from Laj's color list, but the meaning of each color can be any single word chosen by you. A "+" or "-" character at the end of a location line will paint just the upper or lower half of the band, respectively. This allows you to differentiate between the two strands, or to plot potentially overlapping features like gene predictions and database matches. Note that if two bands overlap, the one that was specified last in the file appears "on top" and obscures the earlier one (except for the special Hatch color). Thus in this example, the green exons and red strongly conserved regions cover up parts of the long yellow band representing the gene. As in the annotations file, lines beginning with a "#" character are comments that will be ignored.
The second format is similar to the first one, but it allows you to specify a label for each color band which will be displayed on Laj's position indicator line when the user points the mouse at that band. The color definition lines are the same as for the regular format, but the location lines look like this:
35324 72009 (Here is one label) Gene
50350 50513 (Here is another one) Strongly_conserved +

An underlay file for Laj can contain a mixture of these two formats (i.e., the label is optional). The parentheses must be present if the label is, and the label itself cannot contain any additional parentheses. (Note that the dummy item formerly required by this format is no longer necessary; it is still supported for your old files, but its use is discouraged.)
One common use of the underlay feature is to mark predictions made by Genscan. To facilitate this, the PipTools collection includes a program called genscan2underlays that translates a Genscan output file to either of these underlay formats automatically. Another of the tools, exons2underlays, creates underlays that correspond to your exons file.
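Both underlay formats (with and without a label) can be recognized by a single pattern. The sketch below is hypothetical and simplified: `parse_underlays` is an invented name, color definitions are assumed to appear before the location lines that use them, and the old "dummy item" variant is not handled.

```python
import re

# start end [(label)] meaning [+ or -]
LOCATION = re.compile(
    r"^(\d+)\s+(\d+)\s+(?:\(([^()]*)\)\s+)?(\S+)(?:\s+([+-]))?$"
)


def parse_underlays(lines):
    """Return (colors, bands) from underlay-file lines."""
    colors, bands = {}, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        match = LOCATION.match(line)
        if match:
            start, end, label, meaning, half = match.groups()
            if meaning not in colors:
                raise ValueError(f"no color defined for {meaning!r}")
            bands.append({"start": int(start), "end": int(end),
                          "label": label, "color": colors[meaning],
                          "half": half})
        else:
            fields = line.split()
            if len(fields) != 2:
                raise ValueError(f"unexpected line: {line!r}")
            color, meaning = fields  # e.g. "LightYellow Gene"
            colors[meaning] = color
    return colors, bands
```

Bands are returned in file order, which matters because later bands are painted over earlier ones.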
The highlight file is analogous to the underlay file, but it specifies colored regions for the text view rather than for the pip. The format is nearly identical to the underlay file, except that instead of specifying "+" or "-", you give a number "1", "2", or "3" to indicate the row of text where the highlight should be painted. If you don't provide a number, all three rows will be highlighted. Just as with underlays, labels can be included which will be displayed when the user points at the highlight, and highlights that are listed later in the file will cover up those that appear earlier. However, the Hatch color is not supported for highlights.
If you do not specify a highlight file, Laj will automatically provide default highlights based on the exons file. These will be placed in the top row (since the exons were specified for the first sequence), and different colors will be used to indicate the forward or reverse strand. If the exons file specifies a gene's translated region, then the 5´ and 3´ UTRs will be shaded using lighter colors. These default highlights make it easy to examine the putative start/stop codons and splice junctions, as well as providing a visual connection between the graphical and text views. But if for some reason you do not want any text highlights, you can suppress them by specifying an empty highlight file.
For Laj, the available colors are:
Black White Gray LightGray DarkGray
Red LightRed DarkRed
Green LightGreen DarkGreen
Blue LightBlue DarkBlue
Yellow LightYellow DarkYellow
Pink LightPink DarkPink
Cyan LightCyan DarkCyan
Purple LightPurple DarkPurple
Orange LightOrange DarkOrange

These names are case-sensitive (i.e., capitalization matters). As of this writing, these are the same colors supported by PipMaker, but be aware that the appearance of the colors may vary between PipMaker and Laj, and from one printer or monitor to the next.
In addition to the regular colors listed above, Laj supports a special "color" for underlays called Hatch, which is drawn as a pattern of diagonal gray lines. Normally if two underlays overlap, the one that was specified last in the file appears "on top" and obscures the earlier one. However, Hatch underlays have the special property that they are always drawn after the other colors, and since the space between the diagonal lines is transparent, they allow the other colors to show through. Currently Hatch is only supported for underlays, not for highlights or hyperlink annotations.
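The case-sensitive color names and the "Hatch is drawn last" rule can be illustrated with a short sketch. The names `check_color` and `paint_order` are hypothetical, and the band dictionaries (with a "color" key) are an assumed representation, not Laj's internal one.

```python
COLORS = set("""
    Black White Gray LightGray DarkGray
    Red LightRed DarkRed Green LightGreen DarkGreen
    Blue LightBlue DarkBlue Yellow LightYellow DarkYellow
    Pink LightPink DarkPink Cyan LightCyan DarkCyan
    Purple LightPurple DarkPurple Orange LightOrange DarkOrange
""".split())


def check_color(name, allow_hatch=False):
    """Return name if it is a valid color; matching is case-sensitive."""
    if name == "Hatch" and allow_hatch:
        return name  # Hatch is only meaningful for underlays
    if name not in COLORS:
        raise ValueError(f"unknown color {name!r}")
    return name


def paint_order(bands):
    """Order bands for drawing: file order, with Hatch bands moved last."""
    # sorted() is stable, so bands with equal keys keep their file order.
    return sorted(bands, key=lambda band: band["color"] == "Hatch")
```

Painting the bands in the returned order puts later bands on top of earlier ones while still letting every Hatch band overlay the solid colors.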