Course Materials for BIO/CSE/STAT 597/8F
Spring 2002

End-of-the-school-year party: Tue. April 30, Rm. 327 Thomas, 2:30-3:45.


  • What is BIO/CSE 597F?
  • Nucleic Acids Research 2002 Database Issue
  • Study guide for exam
  • Schedule for class presentations

    Necessary software: Some of the materials will be presented as PostScript and PDF documents, so you will need to be able to view at least one of these formats. If your system doesn't already have a viewer, you may want to fetch and install Ghostscript, a free PostScript/PDF viewer from Aladdin. The Aladdin web page has instructions for attaching it to Netscape or MS Internet Explorer. Another popular program for reading PDF files is Adobe Acrobat Reader.


  • lecture 2
  • lecture 3 (without preliminary data)
  • supplementary material on SAGE (from NCBI)
  • more supplementary material on SAGE (from NCBI)
  • paper on SAGEmap
  • paper on SAGE data errors
  • SAGEmap Web site at NCBI
  • SAGE "home page" at Johns Hopkins
  • query yeast SAGE data at Stanford


  • introduction to spotted arrays
  • introduction to affy
  • normalization and missing values
  • smoothing and lowess
  • filtering and other transformations
  • homework 1 (due Feb. 6)
  • dimension reduction
  • PCA and plotting
  • more dimension reduction
  • multi-dimensional scaling (Susan Holmes)
  • clustering
  • clustering II
  • clustering III
  • Homework 2: assignment, data and groups
  • Robert Tibshirani, Guenther Walther and Trevor Hastie. "Estimating the number of clusters in a dataset via the Gap statistic". Here.
  • A. Ben-Hur, A. Elisseeff, and I. Guyon. "A Stability Based Method for Discovering Structure in Clustered Data". Here.
  • K. Y. Yeung, C. Fraley, A. Murua, A. E. Raftery and W. L. Ruzzo. "Model-based clustering and data transformations for gene expression data." Here.
  • F. Bartolucci and F. Chiaromonte. "Clustering of expression data from microarrays: a mixture-based approach." Here.
  • Microarray Gene Expression Database Group website
  • working with a response

    Combining expression data and genomic sequence data:

  • introductory lecture
  • Readings on binding-site clusters: Berman et al. and Krivan and Wasserman
  • Detecting binding-site clusters, a research problem: references
  • Regulatory Sequence Analysis Tools: paper, website, details on k-mer matches and details on spaced dyads
  • PROSPECT: paper and website
  • INCLUSive (INtegrated CLustering, Upstream Sequence retrieval and motif Sampler): website
  • another lecture
  • lecture on DNA sequence patterns
  • Is a given sequence pattern associated with co-expression? paper
  • Other websites: Gibbs Motif Sampler, Motif Sampler, MEME.


  • 2D gel databases: website
  • Database of Interacting Proteins: website
  • Biomolecular Interaction Network Database: paper and website
  • MIPS Database: paper and website
  • Expression data and protein-protein interactions: paper
  • Subcellular location of yeast proteome: paper
  • Expression level and subcellular location: paper