Readings for CSE/BIO 597F
Overview
The DNA sequence of human chromosome 22.
Dunham I, et al.
The complete article can be found by following links from the
Sanger Centre's website for Chr. 22,
and excerpts concerning gene finding are here.
Searching sequence databases
Sequence analysis and database searching. Gregory D. Schuler.
Chapter 7 in the book by Baxevanis & Ouellette.
On-line BLAST tutorial.
Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ.
The complete paper is available on-line by following links from the PubMed
website.
Protein sequence similarity searches using patterns as seeds.
Zhang Z, Schaffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF.
The complete paper is available on-line by following links from the PubMed
website.
Stephen Altschul's three lectures on BLAST statistics: 1, 2, 3.
Gene identification using ESTs
Analysis of EST-driven gene annotation in human genomic sequence.
Bailey LC Jr, Searls DB, Overton GC
A comparison of expressed sequence tags (ESTs) to human genomic sequences.
Wolfsberg TG, Landsman D
A computer program for aligning a cDNA sequence with a genomic DNA
sequence.
Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W.
The complete paper is available here.
A greedy algorithm for aligning DNA sequences.
Zhang Z, Schwartz S, Wagner L, Miller W.
The complete paper is available on-line by following links from the PubMed
website.
Interspersed repeats
The origin of interspersed repeats in the human genome.
Smit AF.
Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences.
Smit AF, Toth G, Riggs AD, Jurka J.
Interspersed repeats and other mementos of transposable elements in mammalian
genomes.
Smit AF.
MIRs are classic, tRNA-derived SINEs that amplified before the mammalian
radiation.
Smit AF, Riggs AD.
Comparison of human and mouse genomic sequences
Long human-mouse sequence alignments reveal novel regulatory elements: a
reason to sequence the mouse genome.
Hardison RC, Oeltjen J, Miller W.
The complete paper is available by following links from the PubMed website.
PipMaker -- a web server for aligning two genomic DNA sequences.
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R,
Hardison R, Miller W.
The complete paper is available here.
Comparative analysis of 1196 orthologous mouse and human full-length mRNA
and protein sequences.
Makalowski W, Zhang J, Boguski MS.
Evolutionary parameters of the transcribed mammalian genome: an analysis of
2,820 orthologous rodent and human sequences.
Makalowski W, Boguski MS.
Comparative analysis of noncoding regions of 77 orthologous mouse and human
gene pairs.
Jareborg N, Birney E, Durbin R.
Substitution matrices for protein comparisons
Scores for sequence searches and alignments.
Henikoff S.
Amino acid substitution matrices from protein blocks.
Henikoff S, Henikoff JG.
Multiple sequence alignment
Practical aspects of multiple sequence alignment. Andreas D. Baxevanis.
Chapter 8 in the book by Baxevanis & Ouellette.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight matrix
choice.
Thompson JD, Higgins DG, Gibson TJ.
Using CLUSTAL for multiple sequence alignments.
Higgins DG, Thompson JD, Gibson TJ.
The CLUSTAL_X windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG.
Motif finding
A comparative analysis of computational motif-detection methods (PDF file).
Hudak J, McClure MA.
A web site for the computational analysis of yeast regulatory sequences.
van Helden J, Andre B, Collado-Vides J.
Computational identification of cis-regulatory elements associated with groups
of functionally related genes in Saccharomyces cerevisiae.
Hughes JD, Estep PW, Tavazoie S, Church GM.
Conservation of DNA regulatory motifs and discovery of new motifs in microbial
genomes.
McGuire AM, Hughes JD, Church GM.
Hidden Markov models and protein families
Hidden Markov models.
Eddy SR.
Profile hidden Markov models.
Eddy SR.
Pfam: a comprehensive database of protein domain families based on seed
alignments.
Sonnhammer EL, Eddy SR, Durbin R.
See also the updates for 1998, 1999 and 2000.
Ab initio gene-finding methods
Computational methods for the identification of genes in vertebrate genomic
sequences.
Claverie JM.
Finding the genes in genomic DNA.
Burge CB, Karlin S.
Predictive methods using nucleotide sequences. Fickett JW.
Chapter 10 in the book by Baxevanis & Ouellette.
Prediction of complete gene structures in human genomic DNA.
Burge C, Karlin S.
Eukaryotic promoter recognition.
Fickett JW, Hatzigeorgiou AG.
Related topics
Noncoding RNA genes.
Eddy SR.
Annotating sequence data using Genotator.
Harris NL.
Frequent alternative splicing of human genes.
Mironov AA, Fickett JW, Gelfand MS.
Interpreting cDNA sequences: some insights from studies on translation.
Kozak M.
Computational methods for the identification of differential and coordinated
gene expression.
Claverie JM.