5.5.1 Graphs of metabolic networks: constructing graphs that incorporate directionality and context
The objective is to create graph descriptions of metabolic models in which the connections among the reactions have a clear interpretation, and can thus be interrogated with a wide range of tools from network science.
To illustrate the construction of the different graphs described in this chapter, a toy example (Figure 5.2a) of a metabolic network that describes nutrient uptake, biosynthesis of metabolic intermediates, secretion of waste products, and biomass production (Rabinowitz and Vastag 2012) is used, all of which are key components of genome-scale metabolic models.
Let’s consider metabolic networks composed of n metabolites and m reactions:
The numbers ij and ij are the depletion and production coefficients of the
species in each reaction. The evolution of the metabolite concentrations follows
the differential equation
where x(t) is an n-dimensional vector of metabolite concentrations xi(t) and v is an m-dimensional vector of reaction rates.
The n × m matrix S is the stoichiometric matrix with entries sij = _ij −_ij , that is, the net number of xi molecules produced (sij > 0) or consumed (sij < 0) by the j-th reaction.
Due to chemical or thermodynamic constraints, some reactions in v are known to be irreversible (i.e., vj _ 0 for certain reactions Rj ).
This information can be summarised in an m-dimensional reversibility vector r with entries
A unipartite graph with m reaction nodes can be described by its m×m adjacency matrix with entries that represent the connection between nodes i and j.
In contrast, a bipartite graph has two types of nodes and connections only occur between nodes of different type, resulting in an n × m adjacency matrix.
One of the simplest ways (Palsson 2006) to construct a graph from a metabolic model is
This graph connects metabolites to reactions based on whether metabolites participate on a given reaction, either as reactants or products. Figure 5.2B shows the bipartite graph associated to the toy model.
The most common approach to construct a unipartite graph of reactions from bS is via the m×m adjacency matrix
In the graph A, two reaction nodes are connected if they share metabolites as reactants or products (Figure 5.2b), while self loops represent the total number of metabolites that participate in a reaction.
Though widely studied (Palsson 2006; Wagner and Fell 2001), the graph A suffers from three key limitations: (i) It does not distinguish between forward and backward flow of metabolites between reactions (its adjacency matrix is symmetric), and hence it cannot incorporate information on the reversibility of reactions.
(ii) It does not distinguish between connections caused by important metabolites, such as carbon sources or metabolic intermediates, and connections caused by pool metabolites that participate in numerous reactions, such as water, ions or enzymatic cofactors.
As a consequence, A has a large number of connections that obscure its structure and the connectivity between its reaction nodes.
(iii) It is a rigid description that does not account for varying cellular contexts such as changes in carbon sources, growth conditions or environmental shocks.
Figure 5.2. Different graph representations for metabolic networks (caption
continues in the next page).
Figure 5.2. (a) Toy model of a metabolic network describing nutrient uptake, biosynthesis of metabolic intermediates, secretion of waste products, and biomass production (Rabinowitz and Vastag 2012).
Reaction R4 is reversible and reaction R8 represents biomass production defined as R8 : x3 + 2x4 + x5.
Associated stoichiometric matrix to the model (S).
(b) Direct construction of graphs from the stoichiometric matrix of the toy model.
The bipartite graph is defined by the boolean matrix bS.
By splitting reactions into their forward and backward rates, the production (S+2m) and consumption (S−2m) bipartite graphs are built.
(c) The traditional undirected reaction network (Palsson 2006) can be constructed from bS.
The proposed Probabilistic Flux Reaction Graph (Dp) is an edge re-weighted version of D (see text) developed to equalize the importance of each metabolite.
Finally, the Flux-Balance Graphs (Mv) balance the weight of each metabolite according to the FBA reaction flux distributions.
Annex 7.4 includes all the matrices to create these networks.
(d) Three examples of Flux-Balance Graphs (Mv) for the toy example.
The graphs were constructed with equation (5.15) applied to the stoichiometric matrix shown in Figure 5.
2a and with three solutions from FBA under different constraints.
In contexts 2 and 3, the lower flux bounds for reactions 6 and 7 were perturbed, causing changes in the flux distribution that translate into different weighting of the graph edges.
The dashed boxes represent the flux bounds employed in each case, and solid lines describe the optimal fluxes predicted by FBA in units of mmol gDW•h .
In all cases, the optimal flux for reaction R4 is positive and thus the corresponding backward reaction node R4 is disconnected.
To address the limitations of the graph A, the construction of two types of graphs that can be applied to any stoichiometric model is proposed: • A directed graph in the absence of context, yet capturing flows, that penalises the connections caused by pool metabolites and that has a concrete probabilistic interpretation.
The proposed graph that meets those conditions is the Probabilistic Flux Reaction Graph (PRG or Dp).
• A family of metabolic graphs that are context-dependent and that re-balance the edge weights according to flux distributions predicted by Flux Balance Analysis (FBA).
The proposed family of graphs are the Flux-Balance Graphs (FBG or Mv), which depend on a vector v of fluxes obtained through FBA.
In both graphs, directionality of reactions is included by redefining the connections between reaction nodes, namely: two reactions are connected if one produces a metabolite that is consumed by the other.
This definition leads to graphs that naturally account for the reversibility of reactions and allow for a seamless integration of different biological contexts modelled through FBA.
Inspired by techniques for the analysis of kinetic models for biochemical reaction networks (Chellaboina et al.
2009), one first decomposes the reaction vector into its forward and backward rates as v = v+ − v−, where v+ and v− are both nonnegative.
For irreversible reactions, the backward rates are v− = 0, and thus the following relation holds:
where r is the reversibility vector defined in (5.3). The metabolic model is then
rewritten as
where Im is the m × m identity matrix. The matrix S2m and vector v2m are augmented versions of original stoichiometric matrix S and reaction vector v.
As result, the augmented model in (5.5) has n metabolites and 2m reactions.
In the following sections we detail how to construct the proposed graphs over the augmented set of reaction nodes.
Directed Probabilistic Flux Reaction Graphs of metabolism in the absence of context
To construct directed graphs first the auxiliary production and consumption stoichiometric matrices are extracted from S2m:
where abs (S2m) is a matrix whose entries are the absolute values of S2m.
Note that the matrices satisfy S2m = S+2m − S−2m.
The entries s+ ij of the matrix S+2m are the number of molecules of metabolite xi produced by reaction Rj , whereas the entries s− ij of S−2m are the number of molecules of metabolite xi consumed by reaction Rj .
Similarly, as in the bipartite graph bS, the boolean versions of the matrices S+2m and S−2m define two bipartite graphs, a production and a consumption graph, shown in Figure 5.2b for the toy metabolic model.
From these two matrices the adjacency matrix of a directed graph is constructed:
where the tilde denotes the boolean versions of the respective matrices.
The entries dij of D represent the total number of metabolites produced by reaction Ri that are consumed by reaction Rj .
The adjacency matrix of D has size 2m×2m and defines a directed graph on the space of reactions split into their forward and backward components.
In contrast to the undirected graph A, the adjacency matrix of D is not symmetric and thus captures the existence of irreversible reactions in the original metabolic model.
Note that D and A are not directly comparable as they are defined on different sets of nodes.
We can obtain a directed graph on the same set of nodes as A with the m × m adjacency matrix
where C = _ Im Im _T .
The directed graph defined by Adir contains the connections in A, but excludes spurious edges caused by the non-existent backward rates in irreversible reactions.
If the metabolic model contains only reversible reactions, i.e.
, when the reversibility vector r contains only ones, from equations (5.5) and (5.8) Adir = A is recovered.
Although an improvement on A, the graph D still is limited by the fact that uninformative connections created by pool metabolites have the same weight as connections from other more informative metabolites.
To ameliorate their effect on the structure of a metabolic network, a common approach is to remove such highly-connected metabolites from the network description (Ma and Zeng 2003b; Samal and Martin 2011; Silva et al. 2008).
However, removing metabolites can change the network structure drastically, and without a clear definition of which pool metabolites should be pruned, it is typically done with heuristics in an ad-hoc manner that depends on the particular network and scientific question at hand.
As an alternative, instead of advocating the ad hoc removal of pool metabolites, their connections are weighted according to their relative importance (Croes et al. 2006).
More specifically, a probabilistic formulation is proposed to quantify the contribution of metabolite xk to the weight of the connection from reaction Ri to Rj .
The probability that a molecule of xk chosen at random is produced by Ri and consumed by Rj is
P(a molecule of xk is produced by Ri and consumed by Rj) = s+ki
where w+k = P2mh=1 s+kh and w−k = P2mh=1 s−kh are the total number of molecules of xk produced and consumed by all reactions. We collect this information in the vectors
where 12m is an 2m-dimensional vector of ones. For example, if xk is only produced by Ri and only consumed by Rj then s+ki = w+k , s−kj = w−k , and the probability that any molecule of xk flows from Ri to Rj is 1.
We propose constructing a weighted, directed Probabilistic Flux Reaction Graph (PRG) with adjacency matrix
where W†+and W†− are the pseudo-inverses of diag (w+) and diag (w−), respectively.
The weight of the connection from Ri to Rj is
which is the probability that any metabolite molecule chosen at random is produced by Ri and consumed by Rj . The construction of Dp is analogous to D in (5.7),but it has the important difference that the connections have a clear probabilistic interpretation and accurately quantify the importance of the relation between any pair of reactions. It is clear from equations (5.9) and (5.10) that entry-wise 1-norm of the graph’s adjacency matrix is
which is a consequence of the probabilistic formulation of this graph.
The graph encoded in Dp describes the probabilistic producer-consumer relationships of the reactions; the graphs that describe the two other types of relationship between reactions, competition and synergy can be constructed as well:
These two graphs are undirected, their edge-weights describe the probability that
any two reactions consume (Dc) or produce (Ds) a metabolite picked uniformly
at random.
Flux-Balance Graphs
To incorporate the effect of different cellular contexts on the graph of a metabolic network, the edges of the graph are re-weighted using the metabolic flux distributions predicted by Flux Balance Analysis (FBA).
FBA computes a vector of fluxes v_ that optimize a cellular objective, usually maximisation of biomass or growth, assuming that cellular metabolism is in quasi steady state with respect to the remaining cellular processes, i.e., ˙x = Sv = 0.
A key characteristic of FBA is that specific environmental conditions can be included as upper and lower bounds on the reaction fluxes, which in turn act as constraints for the optimisation problem.
These constraints describe, for example, the availability of nutrients, oxygen, and toxins.
The key elements of FBA were summarized in Section 4.5.2.
The weight of an edge between reactions nodes Ri and Rj is defined as the total flow of metabolites produced by Ri that are consumed by Rj .
Under such definition, the entries of the adjacency matrix of the Flux-Balance Graph (FBG), which is called Mv, are
The main assumption behind this definition is that the amount of metabolite produced by one reaction is distributed among the reactions that consume it in proportion to their flux.
For example, if in the FBA solution the total flux of metabolite xk is 10 mmol gDW•h ,
with reaction Ri producing xk at a rate 1.5 mmol gDW•h and reaction Rj consuming xk at a rate 3.0 mmol gDW•h , then the mass flow of xk from Ri to Rj is 1.5 mmol gDW•h × (3.0/10) = 0.45 mmol gDW•h .
The definition in equation (5.13)adds together the mass flows of all metabolites produced by Ri and consumed by Rj , and thus mij represents the total flow between the two reactions.
Self loops describe the metabolic flux of autocatalytic reactions, i.e., those in which some products are also reactants.
The edge weights mij can be computed directly from the stoichiometric matrices S+2m and S−2m defined in equation (5.6), and the FBA solution v_. The augmented flux vector is built
which splits the forward and backward fluxes from the FBA solution in a similar way as v2m in equation (5.5).
If the i-th and j-th entries of v_2m are denoted as v_2mi and v_2mj , respectively, we have that reaction Ri produces the metabolite xk at a rate s+k iv_2mi, while reaction Rj and consumes xk at a rate s− kjv_2mj .
Substituting in equation (5.13) we get
The adjacency matrix of the graph with entries mij is then
where V_ = diag (v_2m), J†v is the pseudo-inverse of diag (Jv), and Jv is the vector of production and consumption fluxes
The equality is a consequence of the steady state condition ˙x = S2mv_2m =S+2m − S−2m
_v_2m = 0.
A fundamental feature of the FBGs Mv is that the connections have a precise physical interpretation. The weights of the connections correspond to the total flow of metabolites between reactions in units of mmol gDW•h .
This feature allows to directly link the connectivity of the graph to the mass flow of metabolites through the network.
Since there is a graph Mv specific to each FBA solution v_, this approach is a versatile framework to produce metabolic reaction graphs for different environmental conditions.
Figure 5.2d shows three metabolic graphs for the example in Figure 5.2a.
In each case FBA solutions are computed under a fixed uptake flux and constrained the remaining fluxes to account for different cellular contexts.
In context 1 the fluxes are constrained to be strictly positive and no larger than the nutrient uptake flux.
In contexts 2 and 3, additional lower bounds are imposed for the fluxes through reactions R6 and R7, respectively.
The results illustrate how changes in the optimal flux distributions translate into different graph connectivities and edge weights.
Context 2 leads to a graph with a similar connectivity to context 1, but with a noticeable redistribution of edge weights in the graph, while context 3 displays an extra connection between reactions R4 and R7, which is absent in the other two cases.