Over the last two decades, formulation of formal (often mathematical) concepts and computational modeling have become more and more familiar in biology, although they have been applied much earlier. Leonardo of Pisa used in his book from 1202 the Fibonacci numbers to describe the growth of a rabbit population. Lotka and Volterra characterized population dynamics of lynx and hares (predators and preys) with their famous equation in the 1920s. Michaels and Menten developed a model to determine the rate for kinetic reactions in 1913. In the mid of the last century, Ludwig von Bertalanffy became famous for his work on biophysics of open systems, thermodynamics of living systems, and the introduction of the notion of “Fließgleichgewicht,” roughly translated with dynamic equilibrium.
Since that time, many different approaches have been introduced and applied for in-depth understanding of biology, to relate independent biological observations to each other (what has the protein content measured in one experiment to do with the mRNA amounts measured independent of another technique?). Concepts from physics and engineering have fruitfully invaded biological research, one example being concepts of control systems for gene regulation. Biology is on its way to a quantitative science like physics, chemistry, or geography. It requires numbers and mathematical predictions to complement biological measurements and to derive useful predictions from them. We can compare it, in some sense, with the weather forecast: Data are combined with models and evaluated by computers to predict tomorrow's weather. The forecast is often not precise, but getting better. However, biology is much more complicated with different species, different cell types making a higher organism, and processes of development and evolution. Nevertheless, many promising results indicate that it is worth trying to provide a mathematical description of biological systems.
Here, we will give a short and cursory introduction to model concepts and how to formulate a first model for the process of interest. More detailed explanations of construction and analysis procedures for models follow in the later chapters.
Understanding a biological system is not a straight, unidirectional process. It requires understanding of its topology or the structure of the system. It involves analyses of its dynamical behavior and the control mechanisms at play. Also, it requires interpreting how its design and function relate one another in the overall context. The major guiding principles for model building are as follows: What question is the model supposed to answer? Is it built to explain a surprising observation? Is it built to relate separate observations with each other and with previous knowledge? Is it built to make predictions, for example, about the effect of specific perturbations? Often, a major function of models is to make assumptions about the underlying process explicit and, hence, testable. Only if we have a formal description of the system at hand, we can prove formal relations or falsify certain assumptions.
A mathematical model of a biological system can describe very different aspects and, hence very different approaches are employed. Let us consider the following three major types of computational models, which are further explained below:
Network-based models describe and analyze properties, states, or dynamics of networks, that is, components and their interactions. Typical and frequently used network-centered modeling frameworks are as follows:
In classical cases, the model has instances for a set of compounds, for example, genes, RNAs, proteins, or small molecules contributing to a process. These compounds can be represented simply by a node or by a set of variables describing their amount or activity or states. The mathematical description suitable for the variables depends on the chosen modeling framework. The topology of the network, that is, the ensemble of all compounds (or nodes of the network) and interactions between them, can be derived from different experiments to identify interactions. These experiments range from classical biochemical methods to more recent high-throughput techniques such as protein–protein interactions by yeast two-hybrid analysis, DNA–protein interactions by chip-on-chip, or gene coexpression profiling (a survey on experimental techniques for systems biology is given in Chapter 14). Since a considerable amount of data is available in publications, an alternative or addition to new experiments is exhaustive literature research, including text mining as well as systematic screening of databases.
A sensible description of the network topology already allows a set of topological analyses: Are compounds densely or loosely connected? Do we have single important hubs or are all nodes equally well connected? What is the shortest path to get from one node to another node? These questions are answered in Chapter 8. They can provide an understanding of how information is transmitted in a network, how robust it is against changes of its topology (e.g., by knockout of a node or cutting of a connection), or how it may have emerged and changed during evolution.
Based on the completed network topology, it is possible to add more detailed information about the nature of the connections or the nodes. This can be kinetic laws for individual reactions or instructions for combining input information arriving at a node from different edges as in Boolean networks. Again, the type of additional information to add depends on the choice of the model framework, which in turn is largely determined by the question that the model is supposed to answer.
Concerning the choice of modeling framework, we have a number of alternatives. The most important ones are the following:
Rule-based models or agent-based models represent a different approach of modeling biological phenomena (and, of course, also phenomena in other areas). In rule-based models, every compound of the system can update its state according to a set of rules. For example, a rule could express that protein X is phosphorylated if its kinase Y is activated and its inhibitor Z is not present. Rule-based modeling lists all potential state changes of the individual compounds, but not all potentially occurring states. That is why it can be computationally less demanding than an ODE system, for example, for signaling systems, where compounds may have many degrees of phosphorylation or involve in various complexes. Cellular automata are regular grids of cells (here cells in the sense of abstract location or unit, not of a biological entity), which can assume a finite number of states. States are updated based on rules that depend on the states of neighboring cells. Cellular automata can create complex phenomena, including the classical Game of Life. More complex are the agent-based models, in which all individual compounds such as proteins or cells are considered as autonomous agents with their own rules. The agents move freely in the containing space and update their states according to their rules and the environmental conditions. They can be used to describe many processes such as (i) signaling pathways in models accounting for spatial and structural properties of the cell or (ii) cell states in models describing differentiation or (iii) the interplay of different cells of the immune system or (iv) the interaction of parasites with the occupied host tissue.
In the light of massive data production by the different omics technologies, statistical models are very important in systems biology. Statistical models establish relations between measured data and provide a guide to extracting underlying structures of the biological system that gave rise to the data. Examples are the detection of linear or multilinear relationships such as linear regression or ANOVA. One may also perform a comparison of the measured data distribution with well-understood distributions such as binomial distribution, Gaussian distribution, Poisson distribution, and others. Clustering of measurement data is used to find groups of data that behave similarly in some aspects (e.g., in their dynamics). It is frequently applied for the output of networks such as gene expression data and may reveal common regulation patterns. Other methods have been developed to categorize data. An example is support vector machines, which are learning algorithms that are supposed to divide a number of objects into classes with maximal distance. These classes may or may not reflect biological mechanisms or structures, but often provide a good indication of underlying principles.
If you want to build a theoretical model for a biological process, you would like to use the following short recipe:
If model predictions and the new experimental data are in agreement, it indicates that the model may have covered correctly important aspects of the described system and can be used to make further predictions. Models are never “right,” but can be appropriate and helpful. If predictions and experimental tests differ, it may be even more interesting. It tells you that important aspects of the biological process have not been understood correctly, not been presented appropriately, or have been completely ignored. The model is falsified (at least in its current state), leading to an interesting journey of finding missing links, alternative explanations, or better parameter sets to explain the observations.
To formulate an ODE model for a dynamic biochemical reaction network, we need the following information:
On the basis of the well-formulated model, we can perform different analysis steps:
To illustrate the steps introduced above, we consider two little models: one for a metabolic network and the other for a small regulatory network.
Metabolism serves the uptake of nutrients to convert them into energy, mostly in the form of ATP, and into building blocks such as amino acids and lipids. All metabolic reactions are catalyzed by enzymes. The metabolic network in Figure 2.1 resembles the first steps in glycolysis, a major catabolic pathway for the uptake and initial phosphorylation of glucose, which is afterward distributed to other catabolic and anabolic pathways to provide building blocks and energy. Let us call extracellular glucose S0 and its concentration S0. Intracellular glucose is S1, singly phosphorylated glucose (glucose-6-phosphate, G6P) is S2, and doubly phosphorylated fructose (fructose-1,6-bisphosphate, F16P2) is S4. G6P is provided for other synthetic pathways producing S3. Phosphorylation is carried out by consuming ATP (A3) and converting it into ADP (A2).
We can describe the dynamics of the network in Figure 2.1a by a set of ordinary differential equations as follows:
When we assign kinetics to the reactions, we can simulate the system. A simple choice of kinetics is mass action kinetics, where the reaction rate is proportional to the concentration of its substrates:
You can find more information on kinetic laws in Chapter 4. We will now use this set of equations to simulate the dynamic behavior of the network. Starting with an ATP concentration of 1, an ADP concentration of 0, and zero concentrations of the internal sugars (S1, S2), we find that ATP is consumed and ADP is produced. S1 is produced in an unlimited fashion through the uptake reaction v1, but S2 is produced only as long as ATP is available, afterward it declines. Since there is unlimited supply of S0, the system has no steady state, that is, no state with .
A steady state with can be obtained if we consider that ATP and ADP are kept constant by other cellular processes, that is, . Then we can consider them as external variables, as shown in Figure 2.1c with the dynamics represented in Figure 2.1d.
If S0, S3, and S4 were internal variables that can change (, , , respectively), then the mass provided by S0 remains within the system and it approaches a state where S0, A3, and S2 decline to 0, while A2 reaches 1, S1 and S4 approach about 0.2, and S3 about 0.6. However, the model will never reach a true steady state (Figure 2.1e and f).
Stem cell research is of increasing importance in biological research and of great interest in health care. Besides many other promises, it provides the hope that in the future many diseases can be cured by administration of healthy cells of a specific tissue to a diseased person that have been grown out of reprogrammed induced pluripotent stem cells (iPS cells) originating from the same person. The three genes (and gene products) considered as the main regulators of stemness of cells are Oct4, Sox2, and Nanog. They activate each other, but they are controlled by epigenetic marking and by growth factors. Cellular differentiation is accompanied by hypermethylation of their promoters and by downregulation of their gene expression. In order to create iPS cells, many experimental procedures have been tested. The addition of viral plasmid containing four factors – Sox2, Oct4, c-Myc, and the microRNA Klf4 – that was introduced by Takahashi and Yamanaka in 2006 [2] was most successful.
Here, we will use a simple model to study some basic properties of that system. Let us first assume that Oct4, Sox2, and Nanog stabilize each other. Their expression is suppressed by the epigenetic marking (the DNA methylation), but the proteins also prevent DNA methylation (see Figure 2.2). This mutual inhibition can be described on different levels of detail (e.g., including the joint stabilization of the proteins or not). We will use the following differential equation system that focuses on the collective effect of stemness markers on epigenetic marking and vice versa, that is, the double negative feedback (resulting in a positive feedback) that each component has on itself:
OSN denotes the common activity of the stemness markers Oct4, Sox2, and Nanog. EM represents the level of epigenetic marking. The activity of OSN increases linearly, but EM inhibits it in a fashion described with a Hill function (see Chapter 4). Its basal degradation is proportional to its current concentration. The respective rules hold for EM with OSN as inhibitor. The behavior is illustrated in Figure 2.2. It shows that the system has three steady states. One steady state is unstable, the other two steady states feature either low levels of epigenetic marking and high expression of the stemness factors, indicating that the cell is a stem cell, or high levels of epigenetic marking and low levels of Oct4, Sox2, and Nanog, indicating differentiation. In isolation, the system will always reach one of these states, depending on the initial conditions, and then remain there. It can only be moved out of this state by external cues. Under natural conditions, stem cells are forced into differentiation by external signaling compounds such as Wnt or growth factors. When trying to reprogram cells away from the differentiated state toward induced pluripotency, the strategy introduced by Takahashi and Yamanaka favors OSN through the expression from a viral vector. This has the effect that these four compounds are expressed and active, for example, in regulating epigenetic marking, but they are not under epigenetic regulation themselves. If their expression is strong and long enough, they push the cells back into conditions featuring pluripotency with high expression also of endogenous stem cell factors.
Cellular reprogramming with viral vectors has provided many opportunities to study the reprogramming process in detail and determine the contribution of individual regulatory mechanisms, such as cell cycle progression. It is less suited for long-term application in human patients, which is an interesting medical aim, since it implies using viral material and since the uncontrolled expression of pluripotency factors can also lead to unintended side effects such as cancerogenesis. Hence, the search for alternative ways for reprogramming is ongoing, for example, by using small molecules.