2.1.2 Molecular Biology

This section provides a brief overview of molecular biology.

Since biological agents at the molecular level (molecules, interactions, pathways, networks) are the main object of study of systems biology, a summary of at least the main features of molecular biology is necessary.

Among these are the concept of life and biology as a discipline, the structure and composition of cells, the central dogma of molecular biology, biological hierarchy, evolution and complexity.

For a much more detailed introduction see Alberts et al. 2014.

Biology and cells

Life is probably the most complex phenomenon known.

Biology is the science that studies life and living matter in all its forms and phenomena.

It tries to explain the origin of life, growth processes, reproduction events, the diversity of organisms, the structure of living beings and how they relate to their environment, adapt and evolve.

This vast field is usually divided into many disciplines such as physiology, morphology, cytology, ecology, biochemistry, molecular biology, genetics and many more.

Since the scope of systems biology currently reaches only the molecular level, I focus here on cell and molecular biology.

The cell is the structural and functional unit of all living organisms.

It can replicate in an independent way and constitutes the irreducible building block of life.

It has the ability to grow, differentiate and reproduce.

Molecular interactions within the cell determine its structure and functions.

These interactions take place through several kinds of chemical bonds and forces: ionic, covalent and hydrogen bonds, nonpolar (hydrophobic) interactions and van der Waals forces.

The four main classes of molecules held together by these interactions are carbohydrates, lipids, nucleic acids and proteins.

The basic function of carbohydrates is energy storage.

The building blocks of all carbohydrates are monosaccharides.

Each monosaccharide consists of a chain of three to seven carbon atoms.

Glucose is perhaps the most important and abundant carbohydrate and is involved in many cellular processes.

It is metabolized during glycolysis to pyruvate, yielding ATP and reducing equivalents in the form of NADH (NADPH is produced mainly in the pentose phosphate pathway).

Lipids, on the other hand, are a very heterogeneous group.

They are formed largely by nonpolar groups and are therefore highly hydrophobic.

Because of this, they form hydrophobic compartments that are essential for biochemical reactions that must occur in the absence of water.

Depending on the kind of lipid, their functions range from energy storage (fats and oils) to serving as membrane constituents and hormones.

Two main nucleic acids exist in cells: deoxyribonucleic acid (DNA), in charge of storing the essential hereditary information (the genes), and ribonucleic acid (RNA), which participates in a much larger number of processes.

The main biological function of RNA, however, is to translate the information contained in the DNA into proteins.

Both nucleic acids are polymers formed by nucleotides, each containing a nitrogenous base, a pentose and one or more phosphate groups.

The final class of molecular components in the cell is proteins.

They perform many indispensable functions in the cell.

They are the principal cellular actuators.

They build up the cytoskeletal framework, form the extracellular matrix, participate in signal transduction and, above all, function as catalytic enzymes that allow many chemical reactions to happen at rates that sustain the cell’s life.

Proteins are made up of one or more polypeptides.

Each polypeptide consists of a linear chain of amino acids linked by covalent bonds called peptide bonds.

The structure of a protein, which defines its functions, is determined by the sequence of amino acids along the chain.

Cell structure is well characterized for both prokaryotic and eukaryotic cells.

The interior of any cell is surrounded by a semipermeable membrane that separates it from the environment.

Eukaryotic cells are divided into different compartments, mainly two: the nucleus, where the genetic information is stored, and the cytoplasm, which contains numerous structures called organelles that carry out different tasks.

The endoplasmic reticulum, mitochondria, Golgi apparatus, transport vesicles and peroxisomes are the most important.

Plant cells contain chloroplasts as well.

Prokaryotic cells are in general simpler: they lack a nucleus and membrane-bound subcellular compartments, have only a rudimentary cytoskeleton and mostly form single-cell organisms.

Central dogma and molecular agents

At the core of molecular biology there is a framework for understanding the transfer of genetic information.

In very simple terms, it states that in most organisms DNA is copied into RNA (transcription) and RNA is used to produce proteins (translation).

These proteins are later modified and then carry out most of the cellular functions.

This flow of information is called the central dogma of molecular biology (Crick 1970).
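
The two steps of this flow can be illustrated with a minimal sketch. It assumes that the input is the coding strand of the DNA, so that transcription reduces to replacing T with U, and it uses a deliberately partial codon table; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Toy illustration of the central dogma: DNA -> RNA -> protein.
# Assumption: the codon table is partial (the full genetic code has 64 codons).
CODON_TABLE = {
    "AUG": "M",                                      # methionine, usual start codon
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAA": "K", "AAG": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")
protein = translate(mrna)
print(mrna, protein)
```

The sketch ignores everything that makes the real process complex (promoters, splicing, reading-frame selection, post-translational modification); it only shows the direction of the information flow.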

Figure 2.3. Basic diagram of the central dogma of molecular biology. Red arrows represent processes occurring in particular organisms.

This framework (Figure 2.3) is at the center of the very complex process that the cell uses to regulate itself.

Shifting internal and external conditions cause the cell to express different proteins in order to adapt, survive and grow.

A series of interconnected networks of interactions and reactions is established among the agents involved (genes, RNAs, proteins and metabolites).

Figure 2.4 shows some typical interactions among those agents.

Genes g1, g2 and g3 encode proteins p1, p2 and p3.

Protein p1 inhibits the transcription of g2, and thereby the production of p2.

Proteins p2 and p3 catalyse the conversion of metabolites m1 into m2 and m2 into m3 respectively.

Metabolite m3 inhibits the catalytic action of p3, a very common feature usually called feedback inhibition by a downstream product.

Metabolite m4 represents an external substance that activates the transcription of g3, and thereby the production of p3.

Regulatory, protein-protein, signalling and metabolic networks, although usually studied independently, form a single system in the cell.

In this particular example, g1 − p1 − g2 and g3 − m4 form two small regulatory networks, and m1 − p2 − m2 − p3 − m3 represents a short metabolic pathway.
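
The interactions just listed can be encoded as a small directed graph, which makes it possible to ask which agents are influenced by a given gene or metabolite. The following is only a sketch: the edge labels and the reaction nodes r1 (m1 into m2) and r2 (m2 into m3) are names introduced here for illustration and do not appear in the figure.

```python
from collections import deque

# Interactions described for Figure 2.4 as (source, target, kind).
# Assumption: reactions are modelled as explicit nodes r1 and r2.
EDGES = [
    ("g1", "p1", "encodes"),
    ("g2", "p2", "encodes"),
    ("g3", "p3", "encodes"),
    ("p1", "g2", "inhibits"),      # p1 represses g2
    ("m4", "g3", "activates"),     # external substance induces g3
    ("p2", "r1", "catalyses"),
    ("m1", "r1", "substrate_of"),
    ("r1", "m2", "produces"),
    ("p3", "r2", "catalyses"),
    ("m2", "r2", "substrate_of"),
    ("r2", "m3", "produces"),
    ("m3", "p3", "inhibits"),      # feedback inhibition by downstream product
]

# Adjacency list: node -> list of nodes it directly influences.
GRAPH = {}
for source, target, _kind in EDGES:
    GRAPH.setdefault(source, []).append(target)


def downstream(start):
    """Return every node reachable from `start` along directed edges."""
    seen = set()
    queue = deque(GRAPH.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(GRAPH.get(node, []))
    return seen


print(sorted(downstream("g1")))
print("m3" in downstream("m3"))  # True when m3 lies on a feedback loop
```

In this encoding g1 reaches the whole metabolic pathway through p1 and g2, and m3 can reach itself through p3 and r2, which is the graph signature of the feedback inhibition described above.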

Figure 2.4. Interactions among genes, proteins and metabolites.

Biological and system hierarchy

As shown in Figure 2.4, particular agents (genes, mRNAs, proteins) in the cell interact with each other to form more complex structures that can achieve particular cellular goals.

This interacting process defines a system and biological hierarchy (Palsson 2015) that starts with particular elements containing the genetic information and ends up with a measurable, defined cellular physiology and behaviour (Figure 2.5).

It goes from the genotype to the phenotype.

Genes produce proteins through transcription and translation; these proteins catalyse metabolic reactions; these reactions link to more reactions forming massive networks that produce a physical shift in the cell.

Ideally, systems biology should be able to develop models that precisely represent that hierarchy from genotype to phenotype.

However, sometimes only some levels are represented in a particular model (for instance, metabolic constraint-based models that ignore expression data).

On other occasions, all levels are represented but the scale is small (for example, in some kinetic, pathway-level metabolic models).

Although some very valuable attempts have recently been made towards a whole-cell computational model (Karr et al. 2012; Lee et al. 2008), the predictive and explanatory capabilities of these models are still very limited.

Figure 2.5. Biological hierarchy from genotype to phenotype.

Dual causation: physics and evolution

The concept of dual causation (Mayr 1998) captures the two kinds of causality to which biological systems are subject.

Biological systems, like any other, obey the laws of physics.

These can take the form of thermodynamics, diffusion processes, mass conservation, chemical kinetics and many others.

However, these laws are not enough to explain the particular phenotypes that are present in nature.

The second axis of causation originates from the evolutionary process driven by the mechanism of natural selection.

Evolution (Darwin 1859), as we understand it now, is an iterative process that elegantly explains the vast diversity (different phenotypes) that occurs in nature.

First, a genotype defines a particular phenotype.

Then natural selection adjusts the survival and reproduction chances of an individual exhibiting that phenotype.

If mating is successful, processes such as mutation and recombination produce a slightly different genotype and the process starts again.

Therefore, only a very small subgroup of possible phenotypes (possible in physical terms) will occur due to this evolutionary process.

Systems biologists often think about this iterative selection mechanism as an optimization process.

Environmental conditions together with physical and chemical constraints bound the allowable space of possible phenotypes, and natural selection finds a temporarily optimal solution to the problem.

In the usual terminology of optimization, genetic variability provides exploration and natural selection provides exploitation.
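
This view of evolution as an iterative optimization can be sketched as a toy evolutionary algorithm. Everything specific in the sketch is an assumption made for illustration: genotypes are bit strings, the hypothetical fitness function simply counts 1-bits, variation comes from mutation only (recombination is omitted), and the parameter values are arbitrary.

```python
import random


def fitness(genotype):
    """Hypothetical fitness: number of 1-bits in the genotype."""
    return sum(genotype)


def mutate(genotype, rate, rng):
    """Flip each bit independently with probability `rate` (exploration)."""
    return [bit ^ (rng.random() < rate) for bit in genotype]


def evolve(generations=50, population_size=20, length=16, rate=0.05, seed=0):
    """Run the loop genotype -> selection -> variation; return best fitness per generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)]
        for _ in range(population_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        # Variation: every individual produces one mutated offspring.
        offspring = [mutate(parent, rate, rng) for parent in population]
        # Selection (exploitation): keep the fittest individuals among
        # parents and offspring, so the best fitness never decreases.
        candidates = population + offspring
        population = sorted(candidates, key=fitness, reverse=True)[:population_size]
        best_per_generation.append(fitness(population[0]))
    return best_per_generation


history = evolve()
print(history[0], history[-1])
```

Because parents compete with their offspring, the best fitness in the population can only stay the same or improve from one generation to the next; natural populations offer no such guarantee, so the sketch should be read as an analogy rather than as a model of evolution.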

Although this idea underlies many different models in systems biology, it is in constraint-based modelling of metabolic networks that it is represented in its purest form.

How to express this optimization objective mathematically (the search for the fittest individual) is one of the key aspects to keep in mind when working with this kind of model.
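
As an illustration, one widely used way of writing such a problem in constraint-based modelling is the linear programme of flux balance analysis. The notation below is an assumption introduced here, not taken from the text above: S is the stoichiometric matrix of the network, v the vector of reaction fluxes, c a vector of weights defining the objective (for example a biomass-producing reaction taken as a proxy for fitness), and v^min and v^max are lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

In this sketch the equality constraint expresses mass conservation at steady state and the bounds encode thermodynamic and capacity limits, so the physical axis of causation appears in the constraints and the evolutionary axis in the choice of c.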
