2.3.2 Constraint-based modelling

Constraint-based models (CBMs) are used in this thesis in Chapters 4 and 5.

They possess a series of characteristics that allow for a network-based analysis of metabolism.

They have been very successful in practical applications over the last decade, and they retain a light mathematical structure that facilitates their integration with other mathematical tools such as network theory (Chapter 5) or multivariate statistics (Chapter 4).

They are usually contrasted with kinetic models of metabolism, which are not used in this thesis.

However, it is useful to describe them at least in general terms and to comment on their main drawbacks, which make the constraint-based approach more attractive for certain applications.

Kinetic models

The simplest kinetic models are often called macroscopic models (Dunn 2003), because they do not consider the internal structure of the cells.

They are black-box models that transform substrates into products and a certain amount of biomass.

Frequently, biomass production is coupled with extracellular species (both substrates and products) through a series of macroreactions.

These large reactions lump together many real metabolic reactions.

The Monod equation, for instance, is a typical kinetic expression for these macroreactions.

These models obviously simplify the cell’s complexity, but they are quite popular in some applications such as bioprocess engineering (Niu et al. 2013).

They are simple and do not require a lot of experimental data to be fitted.

However, when biomass production is not the only main variable needed to explain the biological activity of the cell, their predictions tend to fail.

A common example is the over-expression or repression of a target gene of interest.

This may have a huge impact on the cell but the model is unable to take it into account.
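The Monod expression mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form μ = μmax · S / (KS + S). The function name and the parameter values are hypothetical and are not fitted to any data.

```python
def monod_growth_rate(substrate, mu_max, k_s):
    """Specific growth rate from the Monod expression mu = mu_max * S / (K_S + S)."""
    if substrate < 0 or k_s <= 0:
        raise ValueError("substrate must be >= 0 and k_s must be > 0")
    return mu_max * substrate / (k_s + substrate)


# Hypothetical parameters, for illustration only (not fitted to any data).
MU_MAX = 0.8  # maximum specific growth rate, 1/h
K_S = 0.5     # half-saturation constant, g/L

if __name__ == "__main__":
    for s in (0.0, 0.5, 5.0, 50.0):
        mu = monod_growth_rate(s, MU_MAX, K_S)
        print(f"S = {s:5.1f} g/L -> mu = {mu:.3f} 1/h")
```

The growth rate depends only on the extracellular substrate concentration, which is why a model of this kind cannot respond to an intracellular change such as the over-expression of a gene.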

Structured kinetic models (Nielsen and Villadsen 1992) are the natural evolution of macroscopic models.

Typically, the cell is divided into several intracellular substances that are connected with each other and with the environment.

A series of ordinary differential equations describe the relationships among the compounds, including reaction rates and other kinetic parameters.

In general terms, these structured kinetic models are more realistic, more precise and more flexible than macroscopic models.

However, they have serious drawbacks: they require more information, and information about particular reactions, kinetic mechanisms and kinetic parameters is often lacking.

This final disadvantage (many kinetic parameters are still unknown) is the practical bottleneck in the development of these models.

To avoid this issue, kinetic models are usually restricted to particular pathways (Costa et al. 2014) or to especially well-studied groups of reactions such as the central carbon metabolism (Chassagnole et al. 2002).

In recent years, however, promising attempts at complete cell models have been developed (Karr et al. 2012).

Main features of constraint-based models

Constraint-based models (Figure 2.16) are representations of the cellular metabolism and they all share two fundamental properties:

• The metabolic network of the organism represents the core of the model.

The topology (which reactions consume and produce which compounds), the stoichiometry (the molar relationship among the compounds involved in the reactions) and the directionality (which reactions are irreversible under normal biological conditions) are the basic information that the model contains.

Regarding the directionality, it is important to state that constraining a given flux to positive values does not necessarily mean that the corresponding reaction is irreversible.

It can still be considered reversible, with direct and reverse reactions simultaneously occurring, but with the assumption that the net flux is positive.

• They ignore the intracellular dynamics, assuming a steady state for the internal metabolites (Stephanopoulos, Aristidou, and Nielsen 1998).

Figure 2.16. Basic principles of the stoichiometric modelling approach.

An ODE represents the mass balance of the metabolite pool.

Under the assumption of steady state it simplifies to the general equation.

Typical of constraint-based models are additional constraints that reduce the possible flux space.

The stoichiometric matrix

The metabolic network of an organism can be represented in the form of a stoichiometric matrix (Figure 2.17).

It lists the metabolites and the reactions occurring among them.

It is the main feature of the constraint-based models.

Reactions include intracellular reactions (both substrates and products are metabolites within the cell) and exchange reactions (at least one of the metabolites involved is outside the cell).

Exchange reactions, therefore, represent the uptake of necessary nutrients and the production of byproducts.

All this information is contained in the stoichiometric matrix, which has m metabolites and n reactions and takes the form of an m × n matrix S, in which rows correspond to metabolites and columns to reactions.

The stoichiometric matrix S is formed by the stoichiometric coefficients of the reactions that form a metabolic network.

Each column contains the elements that participate in the corresponding reaction (reaction participation) and their stoichiometric coefficients.

Figure 2.17. A simple example metabolic network.

Nodes represent metabolites, links correspond to fluxes of metabolic reactions v and arrowheads the reaction directionality. Reaction v4 is reversible. Fluxes v4, v5 and v6 exchange mass with the environment.

Besides, the reactions must obey the rules of chemistry, such as mass and charge balance.

Every row describes all the reactions in which the corresponding metabolite participates (usually called metabolite connectivity), and therefore how the reactions are linked (which reactions produce metabolites that are substrates for other reactions).

For large networks, most elements of the stoichiometric matrix are zero.

This sparsity has implications for the computational procedures, especially at genome scale.

For a detailed mathematical description of S see Chapter 9 in Palsson 2015.
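The row and column readings of S described above can be made concrete with a small sketch. The three-metabolite, six-reaction network below is a hypothetical toy example (it is not the network of Figure 2.17), and the helper names are illustrative only.

```python
# Hypothetical toy network, used only for illustration.
# v1: -> A    v2: A -> B    v3: A -> C    v4: B -> C    v5: B ->    v6: C ->
METABOLITES = ["A", "B", "C"]
REACTIONS = ["v1", "v2", "v3", "v4", "v5", "v6"]

# Rows are metabolites, columns are reactions (m = 3, n = 6).
S = [
    [1, -1, -1,  0,  0,  0],  # A
    [0,  1,  0, -1, -1,  0],  # B
    [0,  0,  1,  1,  0, -1],  # C
]


def reaction_participation(S, j):
    """Non-zero coefficients of column j: the metabolites taking part in reaction j."""
    return {METABOLITES[i]: row[j] for i, row in enumerate(S) if row[j] != 0}


def metabolite_connectivity(S, i):
    """Number of reactions in which metabolite i participates (non-zeros in row i)."""
    return sum(1 for coeff in S[i] if coeff != 0)


def sparsity(S):
    """Fraction of entries of S that are zero."""
    total = len(S) * len(S[0])
    zeros = sum(1 for row in S for coeff in row if coeff == 0)
    return zeros / total


if __name__ == "__main__":
    for j, name in enumerate(REACTIONS):
        print(name, reaction_participation(S, j))
    for i, name in enumerate(METABOLITES):
        print(name, "connectivity:", metabolite_connectivity(S, i))
    print("sparsity:", sparsity(S))
```

A nested list is enough at this size. For a genome-scale matrix a sparse storage format would normally be preferred, since most entries are zero.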

Main principles in constraint-based models

Once the stoichiometric matrix has been defined, the mass balances involving intracellular metabolites can be mathematically described by a set of ordinary differential equations (Llaneras and Picó 2008):

dc/dt = S · v − μ · c

where c = (c1, c2, ..., cm) is the vector of intracellular metabolite concentrations, v = (v1, v2, ..., vn) the vector of fluxes and μ the specific growth rate of the cells.

This equation represents the dynamic mass balance, and therefore describes how the concentration of each metabolite ci changes over time.

To solve this equation, information about stoichiometry (S), reaction fluxes (v) and cellular growth (μ) is necessary.

As previously stated, in stoichiometric models the dynamics of intracellular metabolites are ignored under the assumption that there is an intracellular global steady state.

This assumption is supported by the observation that intracellular dynamics are much faster than extracellular ones, so it is reasonable to assume that intracellular metabolites reach a steady state very quickly.

Furthermore, the dilution term μ · c is also disregarded because it is commonly much smaller than the intracellular fluxes affecting the corresponding metabolites.

Under those two assumptions, the total mass balance of the cell can be mathematically represented by Equation 2.6, routinely called the general equation:

S · v = 0 (2.6)

Equation 2.6 defines the space of possible flux distributions v.

Although it significantly reduces the number of feasible metabolic states, it does not predict the one that is actually occurring.

Obviously, different carbon sources or oxygen availability produce metabolic states wildly different from each other.

Equation 2.6 can be written as a system of m linear equations in the n fluxes.

Since n is normally larger than m, the system is underdetermined, with n − m degrees of freedom when the m equations are linearly independent.

Additional information is needed to further reduce the space of feasible solutions to a biologically useful set.
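The underdetermination described above can be checked numerically on a small case. The sketch below uses a hypothetical toy network (not the network of Figure 2.17) and computes the rank of S with exact rational arithmetic from the Python standard library. The function names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical toy network: rows are metabolites A, B, C; columns are v1..v6.
# v1: -> A    v2: A -> B    v3: A -> C    v4: B -> C    v5: B ->    v6: C ->
S = [
    [1, -1, -1,  0,  0,  0],
    [0,  1,  0, -1, -1,  0],
    [0,  0,  1,  1,  0, -1],
]


def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


def degrees_of_freedom(S):
    """Number of reactions minus the number of independent mass balances."""
    return len(S[0]) - matrix_rank(S)


def mass_balance_residual(S, v):
    """Return S . v, which is the zero vector for a steady-state flux distribution."""
    return [sum(coeff * flux for coeff, flux in zip(row, v)) for row in S]


if __name__ == "__main__":
    print("rank:", matrix_rank(S), "degrees of freedom:", degrees_of_freedom(S))
    # Two different flux vectors that both satisfy the steady-state balances.
    for v in ([10, 6, 4, 2, 4, 6], [10, 3, 7, 1, 2, 8]):
        print(v, "->", mass_balance_residual(S, v))
```

Both flux vectors in the usage example balance every metabolite, which illustrates why the stoichiometric constraints alone cannot single out one flux distribution.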

Constraint-based approach

The basic idea behind constraint-based models is that cells are subject to a series of constraints that limit their possible behaviour.

Reaching a complete knowledge of all the constraints that control the cell’s features is a long term goal.

However, it is already possible to enumerate a set of physical, chemical and biological constraints that greatly reduce the possible flux space of a metabolic network.

These constraints, added to the classical stoichiometric constraints of Equation 2.6, start to define a very limited space of feasible metabolic flux distributions.

From that point of view, stoichiometric modelling based on the general equation may be viewed as a particular branch of constraint-based modelling that only considers stoichiometric constraints.

Since the flux distributions define the metabolic phenotypes of the cell, this space contains all the feasible phenotypes (Edwards, Covert, and Palsson 2002).

Typical constraints range from thermodynamics (e.g. irreversibility of fluxes) to enzyme capacities (which impose a maximum flux).

It is very common to have at least some reaction fluxes measured, which constrains the flux space even more.

These constraints are mathematically represented in Figure 2.16 under Additional constraints.

Additional regulatory constraints (Covert, Schilling, and Palsson 2001; Lerman et al. 2012; O’Brien et al. 2013) may be included as well but are disregarded in this thesis.

Figure 2.18. Space of possible steady-state flux distributions. Each panel is bounded by progressively more restrictive constraints.

The general equation, Equation 2.6, provides a set of purely stoichiometric constraints that relate the fluxes to each other and reduce the space of possible flux distributions to a hyperplane (see Figure 2.18), a linear subspace of Rn, where each axis of Rn represents a particular reaction flux in the network.

After that, irreversibility constraints are usually added.

These are encoded by assuming that certain reactions flow in only one direction, so that the flux through those reactions is never negative (as in Equation 2.7).

Ultimately, maximum flux values derived from enzyme or transporter capacities can be defined as well (Equation 2.8).

If these data are available for the flux of every reaction in the network, then the flux space becomes bounded.

In mathematical terms, the space of solutions is a convex polyhedral cone truncated by the capacity bounds, that is, a bounded convex polytope (see the last panel in Figure 2.18). Equations 2.6, 2.7 and 2.8 embody the most common constraints.

They are the only constraints used in this thesis.

These three types of constraints define a space of solutions that the complete flux distribution of the metabolic network of a cell always inhabits.

Therefore, together they form what is popularly called a constraint-based model, which will be one of the basic objects of study in this work.
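The three types of constraints can be illustrated by testing whether a candidate flux vector lies inside the solution space. The sketch below is a minimal example on a hypothetical toy network (not the network of Figure 2.17). The bound values, the function names and the treatment of reversible reactions are assumptions made for illustration, and the exact form of Equations 2.7 and 2.8 may differ.

```python
# Hypothetical toy network: rows are metabolites A, B, C; columns are v1..v6.
# v1: -> A    v2: A -> B    v3: A -> C    v4: B <-> C    v5: B ->    v6: C ->
S = [
    [1, -1, -1,  0,  0,  0],
    [0,  1,  0, -1, -1,  0],
    [0,  0,  1,  1,  0, -1],
]
IRREVERSIBLE = [True, True, True, False, True, True]  # only v4 may run backwards
V_MAX = [10, 8, 8, 5, 10, 10]  # hypothetical capacity bounds


def check_constraints(S, v, irreversible, v_max, tol=1e-9):
    """Evaluate the three constraint types separately for a flux vector v."""
    residual = [sum(c * f for c, f in zip(row, v)) for row in S]
    return {
        # Stoichiometric constraints: S . v = 0.
        "steady_state": all(abs(r) <= tol for r in residual),
        # Irreversibility: irreversible reactions carry no negative flux.
        "irreversibility": all(f >= -tol for f, irr in zip(v, irreversible) if irr),
        # Assumption of this sketch: for a reversible reaction the capacity
        # bound limits the flux magnitude in both directions.
        "capacity": all(abs(f) <= ub + tol for f, ub in zip(v, v_max)),
    }


def is_feasible(S, v, irreversible, v_max):
    """True when v satisfies all three constraint types."""
    return all(check_constraints(S, v, irreversible, v_max).values())


if __name__ == "__main__":
    candidates = [
        [10, 6, 4, 2, 4, 6],    # satisfies everything
        [10, 3, 7, -1, 4, 6],   # v4 runs backwards, which is allowed
        [12, 6, 6, 0, 6, 6],    # balanced, but v1 exceeds its capacity
        [10, 6, 4, 2, 4, 5],    # metabolite C is not balanced
    ]
    for v in candidates:
        print(v, check_constraints(S, v, IRREVERSIBLE, V_MAX))
```

Reporting each constraint type separately makes it easy to see which restriction excludes a given flux distribution.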

The three methodologies used to study these models are flux balance analysis (FBA), flux variability analysis (FVA) and metabolic flux analysis (MFA).

These are explained in detail in Chapter 4.

The biomass reaction

Arguably the most important reaction in many CBMs is known as the biomass reaction (Feist and Palsson 2010).

This reaction represents cellular growth and works by draining precursor metabolites from the network at stoichiometrically fixed relative rates while producing some by-product metabolites.

These precursors are used to produce lipids, proteins, nucleic acids and other macromolecules necessary for the cell’s growth.

Besides generating macromolecules, significant energetic requirements exist for ensuring cellular replication and growth.

These requirements are usually divided into growth-associated and non-growth-associated.

To represent the former, ATP is converted to ADP as a part of the biomass reaction.

Non-growth-associated maintenance is not part of the biomass reaction in most CBMs.

Instead, it is represented as the lower bound of another drain reaction, the ATP maintenance reaction (ATPM), which simulates the energy the cell consumes just to stay alive.

Biomass and ATPM reactions are codified in the model as additional columns in the stoichiometric matrix S.
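How such drain reactions enter the matrix as extra columns can be sketched as follows. All metabolite names, coefficients and the maintenance bound are hypothetical and chosen only for illustration. They are not taken from the E. coli model used in this thesis, and species such as phosphate, water and protons are omitted for brevity.

```python
METABOLITES = ["B", "C", "ATP", "ADP"]

# Hypothetical coefficients, for illustration only.
# Negative values are drained from the network, positive values are produced.
BIOMASS = {"B": -0.5, "C": -1.5, "ATP": -2.0, "ADP": 2.0}  # includes growth-associated ATP
ATPM = {"ATP": -1.0, "ADP": 1.0}                           # non-growth-associated maintenance
ATPM_LOWER_BOUND = 1.0  # hypothetical minimum flux through ATPM


def add_reaction(S, metabolites, stoichiometry):
    """Return a copy of S with one extra column built from a {metabolite: coefficient} map."""
    unknown = set(stoichiometry) - set(metabolites)
    if unknown:
        raise KeyError(f"metabolites not in the model: {sorted(unknown)}")
    return [row + [stoichiometry.get(name, 0.0)] for name, row in zip(metabolites, S)]


def drain_rates(stoichiometry, flux):
    """Net production rate of each metabolite when the reaction carries the given flux."""
    return {name: coeff * flux for name, coeff in stoichiometry.items()}


# Start from an empty matrix (one row per metabolite, no columns yet).
S = [[] for _ in METABOLITES]
S = add_reaction(S, METABOLITES, BIOMASS)
S = add_reaction(S, METABOLITES, ATPM)

if __name__ == "__main__":
    for name, row in zip(METABOLITES, S):
        print(f"{name:>3}: {row}")
    # Doubling the biomass flux doubles every drain: the ratios stay fixed.
    print(drain_rates(BIOMASS, 1.0))
    print(drain_rates(BIOMASS, 2.0))
```

Because the coefficients of the biomass column are fixed, the precursors are always drained in the same proportions, whatever the growth rate. The maintenance requirement, by contrast, is expressed through a bound on the ATPM flux rather than through the column itself.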

Information about the biomass reaction for the specific E. coli model used in this thesis is given in Chapter 4.

Brief history, uses and applications of CBMs

CBMs have their origin as metabolic network reconstructions.

Early reconstructions were small and only included the most basic reactions of the central carbon metabolism of the organisms they represented.

Their current form as CBMs probably has its origins in a series of papers published more than twenty years ago, especially Varma and Palsson 1993a.

Nowadays there are genome-scale CBMs for hundreds of different organisms.

Most of them are organized in the BiGG database (King et al. 2015a).

Popular examples are the RECON 1 reconstruction of Homo sapiens metabolism (Duarte et al. 2007) and the latest iteration of the E. coli (str. K-12 substr. MG1655) genome-scale CBM, called iJO1366 and published in Orth et al. 2011.

As stated before, these models contain information about reaction stoichiometry, reversibility and capacities, but they also include the relationships between genes, proteins and reactions.

Constructing a genome-scale CBM is a long, time-consuming process.

This process is usually divided into four consecutive steps (Orth, Fleming, and Palsson 2010):

• First, the organism’s annotated genome is used to build a draft reconstruction.

• Second, this draft reconstruction is curated through a long process that comprises the analysis of many specific experimental data.

• Third, the reconstruction is translated into a proper mathematical model, usually called constraint-based model (CBM).

At this point model simulations can already be compared with real phenotypic data.

• Finally, in a fourth step high-throughput data such as fluxomics, metabolomics, proteomics or transcriptomics can be used to refine the model even further.

Constraint-based models have had many practical uses.

Some of these applications include:

• Bacterial evolution

• Gene deletion and horizontal gene transfer

• Adaptation to new environments

• Evolution to minimal genomes

• Identification of optimal network states

• Determination of groups of coupled reactions

• States of regulatory networks

• Linking phenotypes and genotypes, mainly through the prediction of cellular growth

• Discovery of unknown biological features

• Applications in metabolic and bioprocess engineering

A comprehensive enumeration of the most popular applications of CBMs in E. coli can be found in Feist and Palsson 2008 (see Figure 2.19).

In general terms, CBMs have achieved such spectacular success because of their ability to make good predictions with small amounts of experimental data.

The turning point in the scientific community occurred in the first years of the present century when CBMs started to be regarded as a useful tool and not an overly simplified model of metabolism.

Around that time the seminal paper Edwards, Ibarra, and Palsson 2001 was published and served as validation of the whole approach.

Figure 2.19. Main applications of constraint-based modelling (from Feist and Palsson 2008).
