Schedule


Thursday AM Thursday PM Friday AM Friday PM Saturday Main website


Thursday AM: The nature of biological complexity

This section will introduce a series of concrete biological problems to motivate and and serve as case studies for the rest of the shortcourse. The main case study will be cytoplasmic heat shock regulation in E. Coli, with metabolic regulation, signal transduction, and chemotaxis as additional case studies.

In a scalable approach to systems biology, both data and modeling assertions and questions must be described in a common framework that is biologically natural, yet can be turned over to powerful algorithms for resolution. Is a model consistent with experimental data, which may come from extremely heterogeneous sources? If so, is it robust to additional perturbations that are plausible but untested? Are different models at multiple scales of resolution consistent? What is the most promising experiment to refute a model? Put in natural terms (which are typically nonlinear, nonequilibrium, uncertain, hybrid and so on), such questions are conventionally viewed as computationally intractable, and biologists are forced to translate into unnatural terms in order to use available algorithms. This is undesirable and potentially unnecessary. A crucial insight is that evolution favors high robustness to uncertain environments and components, yet allows severe fragility to novel perturbations, and this "robust yet fragile'" feature must be exploited explicitly in scalable algorithmic approaches.

The new computational framework presented herein need not handle arbitrary problems (which are indeed presumably worst-case intractable) but only that subset of problems which are biologically meaningful. Biological organisms are highly constrained in that they have not just evolved, but necessarily evolved in ways that are robust to uncertainties in their environment and their component parts. These are extremely severe constraints, not present in other sciences but essential in engineering, which emerge primarily at the network level in both biology and engineering. These robustness constraints play a crucial if often hidden and implicit role in the informal processes that biologists use to reason about their experiments and are central in creating a scalable process of biological inference. A theoretical and software infrastructure that does not explicitly exploit the highly structured, evolved, and "robust yet fragile" nature of biological systems is hopelessly doomed to be overwhelmed by their sheer complexity.

It is critical to understand the organizational principles that biological organisms use to coordinate sensing, signal processing, communication, computation, and actuation into vast regulatory networks to sense and respond to their environments. This involves an understanding of both how molecules combine to create networks, but as importantly, how higher level modules interconnect to create systems-level behavior that may involve a bewildering complexity when viewed at the molecular level (like understanding the computer this document is on as a collection of a billion transistors). It is our belief that predictive modeling of complex systems without understanding their organizational principles is hopeless, no matter how otherwise powerful the algorithms and theory. Thus one focus of the shortcourse will be in sketching our growing understanding of these "organizational principles," and how these connect with the theoretical infrastructure presented later.

A side benefit of a deepening understanding of the fundamental nature of complexity in a general sense is new and more rigorous explanations for long-standing problems in physics associated with complex systems, such as the ubiquity and origin of power laws in natural and technological events and networks, the nature of "phase transitions" in computational complexity, mechanisms for dissipation in nonequilibrium systems, "emergent" complexity associated with "order-disorder" transitions, the structure of turbulence in the sheared flows surrounding highly streamlined bodies, and in various problems in quantum information technology. There will be limited time to explore these directions, but some attention will be given them on Friday afternoon if time permits.

What will be given some attention in this section is the highly structured modularity that characterizes both the vertical and horizontal multiscale organization of biological and technological networks. In particular, this modularity is structured by layers of protocols that specify the interfaces between modules, and in turn structure all aspects of the higher level organization of complex networks. Indeed, the parallels between technological networks and their biological counterparts is striking, and goes far beyond mere metaphor. We will discuss preliminary success in proving that such highly organized, self-dissimilar, and scale-rich, hierarchical structures are necessary requirements for robust and efficient networks, and are not accidents of evolution. They are also, ironically, precisely the opposite of what would "emerge" in collections of components that were at the "edge of chaos," self-organized to a critical point, or "scale-free." Such systems are a theoretical possibility, but are provably incompatible to an extreme extent with the robustness and efficiency required of functioning biological or technological networks.

Three great abstractions of twentieth century science and technology are that 1) in complex systems, the materials and devices can be studied largely independently of the systems levels implementing communication, computation, and control functions, and that 2) electrical, mechanical, and chemical materials and devices and 3) communication, computation, and control can all be studied independently of each other. These abstractions are the foundation of the reductionist program to deal with complex systems, and have had great success in decomposing both the forward engineering of technological systems and the reverse engineering of biological ones into manageable pieces. The limitations of this particular decomposition are becoming more acute, however. While it allowed massive parallelization of everything from theory and experimental research to development and manufacturing, it has created a Tower of Babel where experts within one subdiscipline can rarely have meaningful contact with experts from other subdisciplines, and may even be largely unaware of their existence. For example, everyone now takes for granted the ubiquitous connectivity and flexibility of the Internet, and can easily see the wires, chips, and displays that make up the hardware, but it is rare for nonexperts to be aware of the complex layers of protocols and feedback regulation that makes the Internet's flexibility and robustness possible. As another example, the term "information" is used by everyone, but often has not just different but exactly opposite meanings in, say, communications, computing, or controls.

In contrast, biological cells and organisms organize and integrate communication, computation, and feedback control subsystems into highly organized regulatory networks, and builds them right on top of molecules, with highly integrated nanoscale chemical, mechanical, and electrical materials. This challenge has been deferred while molecular biology studied the device level, and physicians treated systems based largely on systems level experiment and experience. Advanced technologies are increasingly coming to look like truly multiscale biology, and both do so in ways that often fundamentally break the standard reductionist tools at all scales. Thus at least the theoretical and computational needs of biology and technology are converging, in parallel with the striking convergence of the networks themselves. Even within engineering, these are being built and deployed largely using trial and error and evolution by natural selection. This is neither scalable nor sustainable as an approach to either engineering or to the reverse engineering of biological complexity. This shortcourse is most fundamentally about addressing this challenge.

Back to top.



Thursday PM: The challenge of computational complexity

As noted above, the network-level questions that arise naturally in biology have traditionally been considered computationally intractable, since they are typically stochastic, nonlinear, nonequilibrium, uncertain, involve multiple scales, and hybrid (mixing continuous and discrete mathematics), limiting approaches to heuristic and brute-force methods, or to extreme simplification. This section will review the basic elements of computational complexity theory in order to explain, in accessible terms, what is meant by computational intractable or hard, and why it is relevant to biology. This will involve a review of the notions of decidability and polynomial time complexity, and the classes P, NP, and coNP, and the notion of NP hard and NP complete. A deeper question that goes beyond the worst-case classification into complexity classes, comes from the observation that problems within a complexity class, say, P, or NP hard, or even problems that are undecidable, appear to have widely varying practical computational difficulty. To understand this, we must add to the combinatoric complexity concepts of P, NP, and so on, the numerical analyst's concept of ill-conditioning.

A key to our approach for treating biological complexity is the idea that high computational complexity of our algorithms when applied to biological problems implies the existence of fragilities in the original problem statements, models, or the data, and these must be resolved either by refined modeling or new experiments. Put simply, "complexity implies fragility" and robust systems can be tractably modeled and understood, with the proper infrastructure. The fragilities that arise in the experiment, modeling and inference iteration can have a myriad of sources. They may be due to true fragilities in the underlying organism, and their identification is often the principle aim of the research. But they may also be due to artifacts of the process; such as inadequately characterized assumptions, modeling approximations, missing network elements, incorrect network dynamics, and so on. It is essential that the computational tools provide feedback to the steps in the process where such artifacts are introduced, so that they can be resolved through refined modeling or additional experiments. The software infrastructure we are developing is geared towards providing this capability.

There is overwhelming evidence for the conjecture that "complexity implies fragility" and many special cases, and we hope to make rigorous this idea in a very general context. This is well-established for linear programming, which is in P, but remains open in general. Essentially, if a large number of iterations is required to find a feasible point or prove one does not exist, the problem must be ill-conditioned (i.e., highly fragile). We will extend this notion of ill-conditioning to the more general and hard problems in NP and coNP, and make concrete connections between the "robust yet fragile" nature of biological complexity, with the "complexity implies fragility" nature of computation.

These new methods also give an alternative and much richer and coherent picture of hard problems within the classes P and NP than is offered by the recently popularized "phase transitions" approach to computational complexity. This approach says that computational complexity increases in random problems in the vicinity of certain kinds of phase transitions, and there is now a rich and growing literature on this physics-based approach. We will show that it appears to be largely subsumed by the complexity/fragility theory built on the numerical analysts more complete notion of ill-conditioned problems. That is, phase transitions have a loose association with ill-conditioning, but is neither necessary nor sufficient, and it is ill-conditioning only that has a rigorous connection with computational complexity.

Back to top.



Friday AM: New mathematics and algorithms for systems biology

A central element of our new mathematical methods involves blending and augmenting tools from dynamical systems, control theory, and operator theory in such a way that natural biological questions can be recast as statements about real semi-algebraic sets, that is, sets of polynomial (nonlinear) equations. We can formulate the stability, robust stability or model invalidation for nonlinear or hybrid systems under a common Lyapunov-type umbrella, which converts problems in uncertain, dynamical systems into problems involving semi-algebraic sets. Proving such statements is generally coNP-hard, but real algebraic geometry, semi-definite programming, and duality theory from optimization provide new methods to systematically exhaust coNP by searching for nested families of short proofs using convex relaxations. These relaxations are equivalent to the existence of sum of squares decompositions and can be formulated as semi-definite programs, which can be solved in polynomial time.

This will be the most mathematically difficult part of the shortcourse, and presumably sounds quite obscure the first time, but we believe the essential ideas can be made accessible to a wide audience. This approach to hard computational problems completely changes what is possible in a way of direct relevance to biology. For example, not only can we search for short proofs systematically, a lack of short proofs implies, by a generalization of duality, the fragilities described above. Again, this is well-established for LPs, and a full theory for the class NP is still open, but the evidence now is overwhelming.

Thus "complexity implies fragility" is, in computational terms more precisely "dual complexity implies primal fragility." This feedback does not imply that P=NP=coNP, which is unlikely (and we will assume they are different), but that the inference problems within coNP lacking short proofs can be traced to biologically meaningful flaws in models or data for which resolution can then be systematically pursued. This will hopefully be less confusing when explained in detail than it certainly appears to sound when said this tersely. The implications of this are potential quite profound, and though very new, this approach has broad applications in biology, networking, hybrid and nonlinear control, and multiscale physics.

The robustness analysis of nonlinear dynamical models has relied mainly on simulation and analysis methods without scalable algorithms. Since searching through uncertain models is NP hard and thus presumably involves an exponentially large set of simulations, there has been no effective alternatives for systematically conducting robustness analysis for such systems. One simulation produces one example of one time history for one set of parameters and initial conditions, although sensitivity analysis can be used to evaluate local effects of variations. Yet if the system is not robust, there may be a simple proof which is to exhibit an example simulation, but finding this simulation may be hard. In contrast, a robust system may have no simple proof of this fact, as there are only simple counterexamples (simulations) to robustness. Thus simulations can only provide counterexamples (in NP) to robustness, never proofs (in coNP) of robustness. This asymmetry between NP and coNP is as important here as the more familiar one between P and NP. Assuming that they are not all the same, what is needed is an effective (and scalable) method for systematically proving robustness of nonlinear dynamical systems. That this is possible without P=NP=coNP is both remarkable, and the foundation for our approach.

The applications in biology are only part of the greater framework developed largely by Parrilo (link to thesis and website) at Caltech, that combines the Positivstellensatz, a theorem from real algebraic geometry, and the Sum of Squares decomposition. This general setting can provide a systematic methodology to obtain improved bounds on the solution of difficult and well-studied problems. The new bounds are provably never worse than those of standard methods, and in many cases they are strictly better. There are already many extremely well-studied applications where the developed machinery has demonstrated extraordinary potential, beating the gold standard algorithms, such as optimization of polynomial functions (Parrilo & Sturmfels - 2001), robustness ($\mu$) analysis of uncertain linear systems, and combinatorial optimization (matrix copositivity and 0-1 problems such as MAX CUT).

As another example, consider the problem of deciding quantum entanglement (Doherty, Parrilo, Spedalieri - 2002). Using the above machinery, criteria that distinguish entangled from separable quantum states can be designed. The simplest of these tests corresponds to the well-known Peres-Horodecki positive partial transpose (PPT) criterion, and higher order relaxations give tests that are strictly stronger. The results were successfully applied to many low-dimensional states from the literature where the PPT test fails. As a by-product of the criteria, an explicit construction of the corresponding entanglement witnesses were provided. The "dual complexity implies primal fragility" aspect also has important interpretations, in that entangled states with long proofs are fragile and thus not of technological interest. This will be one of the topics covered Friday afternoon.

Back to top.



Friday PM: Multiscale physics and stochastic biochemistry

In addition to the above physics applications, th other main topic covered here is the impact of noise and random fluctuations in large cellular networks, which is extremely important in systems biology, where the networks frequently involve molecules that are present in small numbers whose random fluctuations may be amplified significantly. These fluctuations interact with the network, which ultimately determines the significance of their impact on the cell. This makes deterministic models inadequate to capture such fluctuations and a discrete and stochastic model may be required. The gold standard for simulating the stochastic time evolution of chemically reacting systems is Gillespie's Stochastic Simulation Algorithm (SSA) (Gillespie - 1976), (Gillespie - 1977), (Gillespie - 1992). This is an exact procedure, in the sense that it yields a stochastic realization of the Chemical Master Equation (Gillespie - 1992).

In principle, the SSA could be used to simulate all of the chemical species and reactions but the algorithm inherently captures every detail, which is in general not required for the correct behavior to be determined. Although methods have been proposed (Gibson & Bruck - 2000) to speed up the SSA, by itself it remains far too slow for practical simulation of realistic biological systems. The SSA version of our full DAE heat shock model was prohibitively expensive to simulation. Our preliminary results on the stochastics of the heat shock system were based on using SSA on a simplified model, and analytic treatment of a linearized model with Gaussian noise. An important research direction is to substantially expand the systems for which we can do rigorous simulation and analysis.

The Gillespie stochastic simulation algorithm (SSA) is merely the starting point for our development of new multiscale tools (Gillespie - 1976), (Gillespie - 1977). The Poisson Runge-Kutta framework is particularly attractive because it handles the smallest scale in a sequence of approximations for well-stirred mixtures: Poisson scale (moderate numbers of molecules) -Chemical Langevin Equation (SDE) scale (moderately large numbers of molecules) - Deterministic (chemical rate equation scale -large numbers of molecules) - with a numerical method that is of the same form as methods that can be used for dealing with larger numbers of molecules (Rathinam, Petzold, Gillespie - 2003). We have shown how stiffness manifests itself in the simulation of chemical reactions at both the continuous-deterministic level and the discrete-stochastic level, rendering explicit numerical schemes impractical, and we are developing promising new implicit methods. The focus of this shortcourse will be on describing the current state of the art in stochastic multiscale simulation, and recent progress and challenges identified in collaboration with Gillespie and Petzold, who will present their latest work.

Back to top.



Saturday: Discussions

There will be informal discussions among the theoreticians attending or presenting on the details of the current mathematics and algorithms research. All are welcome to attend.

Back to top.



Return to the main website.

Last updated: Mar 13, 2003.