Resources

Bioinformatic Resources

Here is an incomplete compilation of various bioinformatic resources accessible on the web. The major categories include:

databases: genomes, sequences, structures, promoters
tools: sequence search, RNA folding, gene finding, DNA motif finding, multiple alignment, cellular processes
miscellaneous links: on-line tutorials, conferences, public institutions, genomic companies, journals

Genomic Databases

E. Coli Genome Database: the most up-to-date, consolidated information about E. coli genes and proteins, drawn from both traditional experimental research and computational analysis.
Saccharomyces Genome Database: is a scientific database of the molecular biology and genetics of the yeast S. cerevisiae.
C. Elegans Genome Database: is a repository of mapping, sequencing, and phenotypic information about the C. elegans nematode.
Drosophila Genome Database: is a database of genetic and molecular data for Drosophila, which includes data on all species from the family Drosophilidae.
Mouse Genome Informatics: this database provides integrated access to data on the genetics, genomics, and biology of the laboratory mouse.
Human-Mouse Homology Map: constructed by integrating orthologs curated by the Mouse Genome Database with putative orthologs identified by sequence homology.
Ensembl.Org: is a joint project between EMBL-EBI and the Sanger Centre to develop a software system which produces and maintains automatic annotation on eukaryotic genomes.

Sequence Databases

Genbank: the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
SwissProt: is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
ExPASy: is the Expert Protein Analysis System proteomics server of the Swiss Institute of Bioinformatics (SIB). It is dedicated to the analysis of protein sequences and structures, as well as 2-D PAGE.
Pfam: is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 5.5 of Pfam (Sept 2000) contains alignments and models for 2478 protein families, based on the Swissprot 38 and SP-TrEMBL 11 protein sequence databases.

Structural Databases

Protein Data Bank (PDB): international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR.
SCOP: created by manual inspection and abetted by a battery of automated methods, SCOP aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
CATH: is a database of structural domains and is a hierarchical classification of protein domain structures based on class, architecture, topology, and homologous superfamily.
Nucleic Acid Database: assembles and distributes structural information about nucleic acids and is integrated with a DNA-Binding Protein Database.
DPInteract: is a database on DNA-protein interactions in E. coli, with putative extensions to other organisms.

Promoter/Gene Regulation Databases

TRANSFAC: compiles data about gene regulatory DNA sequences and the protein factors that bind to and act through them. Programs are developed that help to identify putative promoter or enhancer structures and to suggest their features.
RegulonDB Database: is a database on transcriptional regulation in E. coli.
SCPD: the Saccharomyces cerevisiae promoter database provides information on the promoter regions of 6000 genes and ORFs (Open Reading Frames) along with regulatory elements and transcription factors involved.
Eukaryotic Promoter Database: is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.

Sequence Alignment Tools

BLAST: a set of similarity search programs designed to explore all of the available sequence databases in Genbank, regardless of whether the query is protein or DNA.
FastA: compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
HMMer: uses profile hidden Markov models for sensitive database searching, based on statistical descriptions of a sequence family's consensus.
SAM: is the Sequence Alignment and Modeling system, based on hidden Markov models (HMMs) and Dirichlet mixtures.
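
For orientation, the pairwise comparison that fast search tools approximate can be written as a small dynamic program. The following is a minimal Python sketch of the Smith-Waterman local alignment score, not the algorithm used by any of the programs listed above. The function name and the scoring values (match = 2, mismatch = -1, linear gap = -2) are illustrative assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Scoring values are illustrative assumptions, not defaults of any tool.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the table are kept in memory, which is enough when just the score (and not the alignment itself) is wanted.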

RNA Folding and Stretching

IMB Jena RNA: a veritable compendium of RNA sites on the web.
RNA/DNA Folding: interactive server allowing one to use state-of-the-art RNA and DNA folding algorithms of the Zuker group to fold query sequences.
Vienna RNA Package: consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures, both at zero and finite temperatures.
RNA Puller: This server performs quantitative predictions of force-extension curves of RNA or ssDNA molecules. It takes the secondary structure of the molecule fully into account with the exception of pseudoknots. The single-stranded pieces of the molecule are modeled as an elastic freely jointed chain.
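
The freely jointed chain mentioned above has a closed-form force-extension relation: the mean relative extension is coth(x) - 1/x with x = f b / (k_B T), where f is the force, b the Kuhn segment length, and T the temperature. The following Python sketch evaluates that textbook formula only; it does not reproduce the RNA Puller server, which also accounts for secondary structure. The function name and the example Kuhn length of 1.5 nm are assumptions chosen for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def fjc_relative_extension(force, kuhn_length, temperature=298.0):
    """Mean extension / contour length of a freely jointed chain.

    force in newtons, kuhn_length in metres, temperature in kelvin.
    """
    x = force * kuhn_length / (K_B * temperature)
    if abs(x) < 1e-4:
        # Leading term of the series coth(x) - 1/x = x/3 - x^3/45 + ...
        # avoids catastrophic cancellation near zero force.
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x


if __name__ == "__main__":
    # Hypothetical example: 10 pN applied to a chain with a 1.5 nm Kuhn length.
    print(fjc_relative_extension(10e-12, 1.5e-9))
```

At low force the chain responds linearly (extension proportional to f b / 3 k_B T), and at high force the relative extension approaches 1 from below.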

Tools for Gene Finding

GenScan: provides access to the program GenScan for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.
GENEID: is a program to predict genes (splice sites, start/stop codons, exon assembly) in anonymous genomic sequences.
GeneParser: is a program for the identification of protein coding regions in genomic DNA sequence.
GeneSCAN: uses the Fourier transform of a DNA sequence to find coding regions.
GeneMark: uses a Hidden Markov Model (HMM) approach to find genes. Although originally written for use on bacterial genomes, there now exists a version that works with eukaryotes.
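
As a rough illustration of the Fourier idea mentioned above: protein-coding DNA tends to show a three-base periodicity, which appears as a peak at frequency 1/3 in the spectrum of the base indicator sequences. The Python sketch below computes that single spectral component. It is a simplified illustration under that assumption, not the GeneSCAN program itself; the function name and the normalization by sequence length are choices made here.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the A, C, G, T indicators."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, s in enumerate(seq)
            if s == base
        )
        total += abs(coeff) ** 2
    return total / n  # normalized by length (a choice made for this sketch)


if __name__ == "__main__":
    print(period3_power("ATG" * 30))  # strongly period-3 sequence
    print(period3_power("AT" * 45))   # period-2 sequence of the same length
```

In practice one would slide a window along the genomic sequence and compare this value against the average power over all frequencies.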

Tools for Motif Finding

MEME and MAST: MEME discovers motifs (highly conserved regions) in groups of related DNA or protein sequences. Given such a motif, MAST searches for it in a sequence database.
Meta-MEME: an extension of MEME for building and using motif-based hidden Markov models of DNA and proteins.
Gibbs Motif Sampler: allows one to identify motifs (conserved regions) in both DNA and protein sequences.
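
Once a motif has been found, searching for it typically means scoring every window of a sequence against a position-specific matrix. The Python sketch below builds a log-odds matrix from a handful of aligned sites and reports the best-scoring window. It is a generic illustration, not the scoring scheme of MAST or any other tool listed here; the function names, the uniform background, the pseudocount of 1, and the example sites are all assumptions.

```python
import math


def build_log_odds(sites, alphabet="ACGT", pseudocount=1.0):
    """Log-odds matrix (one dict per column) from equal-length aligned sites."""
    background = 1.0 / len(alphabet)  # uniform background, an assumption
    matrix = []
    for col in range(len(sites[0])):
        counts = {ch: pseudocount for ch in alphabet}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append(
            {ch: math.log2(counts[ch] / total / background) for ch in alphabet}
        )
    return matrix


def best_match(matrix, sequence):
    """Return (start, score) of the highest-scoring window in sequence."""
    width = len(matrix)
    best_pos, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score


if __name__ == "__main__":
    # Hypothetical aligned sites, made up for this example.
    sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    print(best_match(build_log_odds(sites), "GGGCTATAATGCC"))
```

The pseudocount keeps unseen bases from producing a score of minus infinity, at the cost of slightly flattening the matrix.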

Tools for Multiple Alignment and Phylogenetics

CLUSTAL-W: is a general purpose multiple alignment program for DNA or proteins.
PHYLIP: is a package of programs for inferring phylogenies using parsimony, distance matrix, and likelihood methods, with support for bootstrapping and consensus trees.

Metabolism & Genomes

KEGG: the Kyoto Encyclopedia of Genes and Genomes is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.
E-Cell Project: aims to build models for simulating intracellular molecular processes to predict the dynamic behavior of living cells.

Online Courses and Tutorials

Many of these links are to past bioinformatics courses. The websites contain both the lecture notes and, more usefully, well-chosen problem sets.

Weizmann Institute and also -- Bioinformatics & Computational Genomics course information and tools.
Universitaet Bielefeld heuristic methods for fast database searches: Practical exercises using FastA and BLAST.

Annual Conferences/Symposia

Public Institutions

NIGMS: is the National Institute of General Medical Sciences. Research and links are geared towards basic biomedical research that is not targeted to specific diseases, but that increases understanding of life processes and lays the foundation for advances in disease diagnosis, treatment, and prevention.
DOE-OBER: is the DOE Office of Biological and Environmental Research.
NCBI: is the National Center for Biotechnology Information.
Sanger Centre: is a genome research centre founded by the Wellcome Trust and the Medical Research Council. The purpose is to further the knowledge of genomes, particularly through large scale sequencing and analysis.
EMBL: is the European Molecular Biology Laboratory and has links to genomes and computational resources.

Genomics Companies

Celera
Incyte
Millennium
Rosetta
Affymetrix
MaxyGen

Journals

American Naturalist
Bioinformatics
Biophysical Journal
Genome Research
Genomics
Journal of Evolutionary Biology
Journal of Mathematical Biology
Journal of Molecular Biology
Journal of Theoretical Biology
Nature
Nucleic Acids Research
Proc. Nat'l Acad. Sciences
Protein Engineering
Protein Science
Science
