Readings for BMMB/CSE 597D
Overview
The DNA sequence of human chromosome 22.
Dunham I, et al.
The complete article can be found by following links from the
Sanger Centre's website for Chr. 22.
Excerpts concerning gene finding are also available.
Interspersed repeats
The origin of interspersed repeats in the human genome.
Smit AF.
Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences.
Smit AF, Toth G, Riggs AD, Jurka J.
Interspersed repeats and other mementos of transposable elements in mammalian
genomes.
Smit AF.
MIRs are classic, tRNA-derived SINEs that amplified before the mammalian
radiation.
Smit AF, Riggs AD.
Homology-based gene-finding methods
Sequence analysis and database searching. Gregory D. Schuler.
Chapter 7 in the book by Baxevanis & Ouellette.
Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ.
The complete paper is available on-line.
On-line BLAST tutorial.
Stephen Altschul's three lectures on BLAST statistics: 1, 2, 3.
Analysis of EST-driven gene annotation in human genomic sequence.
Bailey LC Jr, Searls DB, Overton GC.
A comparison of expressed sequence tags (ESTs) to human genomic sequences.
Wolfsberg TG, Landsman D.
A computer program for aligning a cDNA sequence with a genomic DNA
sequence.
Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W.
Comparison of human and mouse genomic sequences
Long human-mouse sequence alignments reveal novel regulatory elements: a
reason to sequence the mouse genome.
Hardison RC, Oeltjen J, Miller W.
PipMaker -- A Web server for aligning two genomic DNA sequences.
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R,
Hardison R, Miller W.
Comparative analysis of 1196 orthologous mouse and human full-length mRNA
and protein sequences.
Makalowski W, Zhang J, Boguski MS.
Evolutionary parameters of the transcribed mammalian genome: an analysis of
2,820 orthologous rodent and human sequences.
Makalowski W, Boguski MS.
Comparative analysis of noncoding regions of 77 orthologous mouse and human
gene pairs.
Jareborg N, Birney E, Durbin R.
Substitution matrices for protein comparisons
Scores for sequence searches and alignments.
Henikoff S.
Amino acid substitution matrices from protein blocks.
Henikoff S, Henikoff JG.
Ab initio gene-finding methods
Computational methods for the identification of genes in vertebrate genomic
sequences.
Claverie JM.
Finding the genes in genomic DNA.
Burge CB, Karlin S.
Predictive methods using nucleotide sequences. Fickett JW.
Chapter 10 in the book by Baxevanis & Ouellette.
Prediction of complete gene structures in human genomic DNA.
Burge C, Karlin S.
Eukaryotic promoter recognition.
Fickett JW, Hatzigeorgiou AG.
Hidden Markov models and protein families
Hidden Markov models.
Eddy SR.
Profile hidden Markov models.
Eddy SR.
Pfam: a comprehensive database of protein domain families based on seed
alignments.
Sonnhammer EL, Eddy SR, Durbin R.
See also the updates for 1998, 1999 and 2000.
Multiple sequence alignment
Practical aspects of multiple sequence alignment. Andreas D. Baxevanis.
Chapter 8 in the book by Baxevanis & Ouellette.
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight matrix
choice.
Thompson JD, Higgins DG, Gibson TJ.
Using CLUSTAL for multiple sequence alignments.
Higgins DG, Thompson JD, Gibson TJ.
The CLUSTAL_X windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG.
Related topics
Noncoding RNA genes.
Eddy SR.
Annotating sequence data using Genotator.
Harris NL.
Frequent alternative splicing of human genes.
Mironov AA, Fickett JW, Gelfand MS.
Interpreting cDNA sequences: some insights from studies on translation.
Kozak M.
Computational methods for the identification of differential and coordinated
gene expression.
Claverie JM.