Infobio01 - Abstract and Background

[Back to Program Homepage]

Background

Rapid advances in technology have generated a large amount of biological data in recent years, with a much larger amount expected in the near future. Examples of accessible data include the complete genomes of 40 or so organisms from bacteria to human, the 3D spatial structures of over 10,000 proteins, and the expression profiles of thousands of genes at a time. Other information soon to be available in large quantities includes multiple copies of the genome of an individual species (polymorphism), genomes of closely related species (e.g., mouse/human), and rapidly accumulating data on protein-protein and protein-DNA interactions.

Much work is needed to turn this rich and massive amount of data into useful biological knowledge, eventually making biology a quantitative, predictive science. The major tasks at hand are to (i) identify the molecular players (e.g., genes and proteins) and their functions, (ii) characterize their mutual interactions, and (iii) extract the collective properties of the biological networks they define. Because the current understanding of the physical mechanisms underlying the complicated biological processes essential to information processing in a cell (e.g., DNA transcription, RNA splicing, and protein folding) is rather incomplete, informatic methods have been very helpful in accomplishing some of these objectives, e.g., identifying the loci of genes and the functions of unknown proteins, by exploiting the available molecular and genomic databases. Biological insights together with creative manipulations of these databases have even led to successes in characterizing various protein-DNA and protein-protein interactions in a number of cases. Bioinformatics is rapidly becoming an important discipline for new advances in biology.

There remain plenty of challenges in bioinformatics and computational genomics in this "post-genome" era. They range from technical issues, such as characterizing the statistics of data-mining tools, to conceptual issues of extracting functional modules in complicated molecular networks and the subsequent analysis of their stability, robustness, and evolvability (see below). The major goals of this program are to define the important theoretical issues relevant to post-genome biology, identify the problems that are ripe now or in the near future, and evaluate new theoretical methods and approaches that need to be developed for these biological problems. These goals will be pursued through formal and informal discourse between biologists and scientists from various quantitative disciplines.

Interface with physics

Progress in bioinformatics is intimately related to one's understanding of a number of generic and conceptual statistical issues involving, for example, optimization, partitioning, and pattern recognition. Many of these issues can be cast into the framework of statistical physics. For example, the dynamic programming approach commonly used to predict RNA secondary structures was already exploited by de Gennes over 30 years ago. Also, the sequence alignment method most widely used by biologists has been a flourishing subject of research in statistical physics for the past 20 years, in the contexts of vortex-line pinning in random media and stochastic surface growth. Moreover, several decades of studies in magnetism and percolation problems have led to valuable insights and techniques for "clustering", another central topic of bioinformatics. These examples are not isolated coincidences. Rather, they reflect the common theoretical themes shared between bioinformatics, which is concerned with extracting information from large datasets, and the defining characteristic of modern statistical physics, that of finding order in complex physical phenomena involving many degrees of freedom. It is therefore potentially very profitable to explore the application of the ideas and methods of theoretical physics to the outstanding issues of bioinformatics. Bioinformatics may arguably present the first real opportunity for theoretical physicists to make a direct, tangible impact on issues of importance to biology while drawing on the strength of their knowledge in theoretical physics!
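
To make the dynamic programming idea concrete, here is a minimal sketch in Python. It is a deliberately simplified stand-in: it maximizes the number of nested base pairs (a Nussinov-style recursion) rather than minimizing a free energy, and it is not the formulation used by de Gennes or by any particular prediction tool. The function name `max_base_pairs`, the pairing table `PAIRS`, and the `min_loop` parameter are illustrative assumptions, not part of the program material.

```python
# Allowed base pairs (Watson-Crick plus the G-U wobble pair); an assumption
# made for illustration, real energy models are far more detailed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    best[i][j] holds the optimum for the subsequence seq[i..j] (inclusive).
    min_loop is the minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill the table in order of increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, which splits the problem
            # into an independent left part and an enclosed inner part.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # A short hairpin: three stacked pairs closing a loop of three bases.
    print(max_base_pairs("GGGAAAUCC"))
```

The table has O(n^2) entries and each takes O(n) work, so the sketch runs in O(n^3) time. The same fill-a-table-by-increasing-length structure carries over when the pair count is replaced by an energy function or by a partition function sum.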

From a different perspective, biological systems offer concrete examples of "complex" systems, the study of which has attracted the attention of much of the statistical physics community for the past two decades. Shaped by billions of years of evolution, molecular biology offers an amazing degree of complexity at many different scales. At the smallest scale, for example, there is the folding of individual biopolymers (e.g., proteins and RNA) resulting from the collective interaction of the different monomer units. At a larger scale, one has the interactions of proteins with each other and with DNA and RNA, forming the molecular basis of genetic and protein networks. These molecular networks control many aspects of the biology of a cell as well as of collections of cells. At the largest scale, co-evolution of species generates some of the most interesting examples of nonequilibrium dynamics.

One central issue transcending all of these phenomena concerns the design principle. In protein folding, one wishes to identify what is special in the primary sequence that enables a protein to behave so differently from a random string of amino acids; in molecular networks, one wishes to identify what is special in the topology and/or interaction parameters that makes a network do useful computation. Early research in protein folding has yielded secondary structures such as alpha-helices and beta-sheets, which greatly simplify structural representation (compared to the enumeration of all atomic coordinates). Can we describe the molecular networks analogously by a reduced number of functional modules (instead of merely listing all the known chemical reactions)? Are there meta-rules governing the assembly of these modules, analogous to the existence of effective interaction potentials between amino acids (without reference to the atomic interactions)? Many protein molecules and molecular networks in the cell are involved in information processing; as such, they cannot be too stable in one state, and need to be able to switch rapidly but reliably among different states. How can such systems be robust to unavoidable fluctuations in a vast number of environmental factors? However the robustness of a network may be achieved, the network needs to be evolvable over evolutionary time scales in order to cope with long-term changes in the environment. This dichotomy between the stability of a network over short time scales and its "plasticity" over long time scales is also a central issue in the theory of learning, intensively studied in computational neuroscience.

On a different note, the recent development of novel experimental methods in molecular biology is also making real-time molecular evolution in the laboratory a reality. The possibility of direct "molecular breeding" of DNA, RNA and proteins tailored for specific tasks should provide plenty of raw material to promote advances in the fields of dynamical systems, evolutionary theory, and game theory. On the other hand, theoretical understanding of the evolution process will be needed to guide the design of "breeding principles" used for different applications. There are also close links between molecular evolution and the bioinformatics problems described above. For example, Adleman's "DNA computer" can be viewed as an example of molecular evolution, where the correct answer to an optimization problem is "bred" via the process of DNA-DNA interaction.

Program focus and outline

This program aims to bring physicists together with biologists and computational biologists, and to provide a medium in which physicists can learn from biologists about the outstanding issues of importance to biology and explore, together with biologists and computational biologists, ways to better detect, understand, and manipulate biological information. The main topics of the program can be divided into the following three focus groups:

  • Bioinformatics: sequence comparison; gene finding; protein structure prediction; DNA and protein motif detection; gene expression profile clustering; and phylogenetic tree reconstruction.
  • Molecular interaction and networks:  RNA and protein structure and function; DNA/RNA-protein, protein-protein, protein-ligand interaction; recombination, mutation, and repair; protein, genetic, and immune networks.
  • Evolution: chemical evolution; RNA and protein breeding; viral and bacterial evolution; and co-evolution of interacting species.

The boundaries of these topics are of course quite blurred. As can be seen from the outline below, a number of subtopics are intermixed and overlap, to facilitate cross-fertilization. Indeed, providing a broader perspective on the above topics is another major goal of this program: new insights and inspiration can be gained by discovering common theoretical themes, and methods useful in one area may be transferable to another. Thus, to attain maximal benefit from the program, every participant should be prepared to be both a student and a teacher. The format of the program will be rather relaxed, with at most one informal seminar in the morning to initiate the daily discussions. A detailed week-by-week schedule of topics and speakers will be updated periodically.
 

0. tutorial on basic molecular and evolutionary biology (Jan 16 -- Jan 19)

I. sequence analysis (Jan 22 -- Feb 2)

  • sequence evolution and homology;
  • sequence alignment: methods and applications;
  • null model, fidelity, and statistical significance; large deviation theory;
  • hidden Markov models and their applications.

II. RNA folding (Feb 5 -- Feb 9)

  • RNA secondary structures: energetics, dynamic programming, thermodynamic properties;
  • pseudo-knots and tertiary structures: energetics and computation;
  • natural RNA: phylogenetic method, thermodynamics, and kinetics;
  • ribozymes: structure and function.

III. protein folding (Feb 12 -- March 2)

  • introduction to protein structure classification: folds, domains;
  • experimental studies: folding kinetics, thermodynamic stability;
  • theoretical studies of toy models: phase diagram, energy landscape, and kinetics;
  • ab initio simulation;
  • structure prediction, e.g., profiles/motifs, threading;
  • structural genomics;
  • de novo design.

IV. genomics (March 5 -- March 30)

  • intron/exon, splicing, and gene finding;
  • transcription regulation, termination; DNA motif identification;
  • RNA-protein interaction: alternative splicing, editing, degradation; RNA interference;
  • chromosomal structure and organization;
  • recombination; mutation and repair mechanisms;
  • gene expression: DNA microarray and clustering analysis;
  • comparative genomics.

V. molecular networks (April 2 -- April 20)

  • molecular interaction: protein-protein, receptor-ligand, antibody-antigen;
  • protein networks, e.g., signal transduction, chemotaxis;
  • gene networks: natural and synthetic;
  • immune networks;
  • theory and modeling:
    • extraction of network elements from data
    • identification of functional modules
    • properties of modules/networks: e.g., stability, robustness, evolvability

VI. molecular evolution (April 23 -- May 4)

  • in-vitro evolution: RNA and protein breeding, DNA computing;
  • theoretical studies of RNA evolution: quasi-species, neutral network;
  • chemical evolution: autocatalysis, replicators, the RNA world;
  • the genetic code: structure, constraints, optimization.

VII. molecular phylogeny (May 7 -- May 25)

  • overview of the tree of life;
  • multiple alignment and tree building;
  • tree vs network;
  • comparative genomics and reconstruction of natural evolution:
    • genetic processes: mutation, recombination, gene conversion, and horizontal transfer;
    • variations in rate of evolution: across genes, lineages, and along the genome;
    • effect of selection: Darwinian competition within population;
    • the most recent common ancestor.

VIII. ecology and macroevolution (May 28 -- June 15)

  • viral and bacterial evolution: experiment and theory;
  • sexual reproduction and speciation;
  • interaction in multi-species community: symbiosis, predator/prey, foodweb.
