Readings for Bioinformatics I


  • The human genome sequence, public paper
  • The human genome sequence, Celera paper
    Both of these are large files and will take some time to load.

    Searching sequence databases

  • Sequence analysis and database searching. Gregory D. Schuler. Chapter 7 in the book by Baxevanis & Ouellette.
  • On-line BLAST tutorial.
  • Basic local alignment search tool. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
  • Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. The complete paper is available on-line by following links from the PubMed website.
  • Protein sequence similarity searches using patterns as seeds. Zhang Z, Schaffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF. The complete paper is available on-line by following links from the PubMed website.
  • Stephen Altschul's three lectures on BLAST statistics: 1, 2, 3.

    Gene identification using ESTs

  • Analysis of EST-driven gene annotation in human genomic sequence. Bailey LC Jr, Searls DB, Overton GC.
  • A comparison of expressed sequence tags (ESTs) to human genomic sequences. Wolfsberg TG, Landsman D.
  • A computer program for aligning a cDNA sequence with a genomic DNA sequence. Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W. The complete paper is available here.
  • A greedy algorithm for aligning DNA sequences. Zhang Z, Schwartz S, Wagner L, Miller W. The complete paper is available on-line by following links from the PubMed website.

    Interspersed repeats

  • The origin of interspersed repeats in the human genome. Smit AF.
  • Ancestral, mammalian-wide subfamilies of LINE-1 repetitive sequences. Smit AF, Toth G, Riggs AD, Jurka J.
  • Interspersed repeats and other mementos of transposable elements in mammalian genomes. Smit AF.
  • MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Smit AF, Riggs AD.

    Comparison of human and mouse genomic sequences

  • Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Hardison RC, Oeltjen J, Miller W. The complete paper is available by following links from the PubMed website.
  • PipMaker -- a web server for aligning two genomic DNA sequences. Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. The complete paper is available here.
  • Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences. Makalowski W, Zhang J, Boguski MS.
  • Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Makalowski W, Boguski MS.
  • Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Jareborg N, Birney E, Durbin R.

    Substitution matrices for protein comparisons

  • Scores for sequence searches and alignments. Henikoff S.
  • Amino acid substitution matrices from protein blocks. Henikoff S, Henikoff JG.

    Multiple sequence alignment

  • Practical aspects of multiple sequence alignment. Andreas D. Baxevanis. Chapter 8 in the book by Baxevanis & Ouellette.
  • CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ.
  • Using CLUSTAL for multiple sequence alignments. Higgins DG, Thompson JD, Gibson TJ.
  • The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG.

    Motif finding

  • A comparative analysis of computational motif-detection methods (PDF file). Hudak J, McClure MA.
  • A web site for the computational analysis of yeast regulatory sequences. van Helden J, Andre B, Collado-Vides J.
  • Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Hughes JD, Estep PW, Tavazoie S, Church GM.
  • Conservation of DNA regulatory motifs and discovery of new motifs in microbial genomes. McGuire AM, Hughes JD, Church GM.

    Hidden Markov models and protein families

  • Hidden Markov models. Eddy SR.
  • Profile hidden Markov models. Eddy SR.
  • Pfam: a comprehensive database of protein domain families based on seed alignments. Sonnhammer EL, Eddy SR, Durbin R. See also the updates for 1998, 1999 and 2000.

    Ab initio gene-finding methods

  • Computational methods for the identification of genes in vertebrate genomic sequences. Claverie JM.
  • Finding the genes in genomic DNA. Burge CB, Karlin S.
  • Predictive methods using nucleotide sequences. Fickett JW. Chapter 10 in the book by Baxevanis & Ouellette.
  • Prediction of complete gene structures in human genomic DNA. Burge C, Karlin S.
  • Eukaryotic promoter recognition. Fickett JW, Hatzigeorgiou AG.

    Related topics

  • Noncoding RNA genes. Eddy SR.
  • Annotating sequence data using Genotator. Harris NL.
  • Frequent alternative splicing of human genes. Mironov AA, Fickett JW, Gelfand MS.
  • Interpreting cDNA sequences: some insights from studies on translation. Kozak M.
  • Computational methods for the identification of differential and coordinated gene expression. Claverie JM.