## Mid-Term Report Summary - RULE (Rule-Based Modelling)

RULE focuses on the rule-based language Kappa and its simulation environment, in which one can map and investigate complex signalling pathways in molecular biology. Kappa’s level of description, intermediate between the detailed picture of molecular dynamics and traditional chemical reaction networks, exploits the modularity of biological agents by decomposing proteins into their different domains, promoters into their operators, etc. This allows Kappa to give concise and executable descriptions of systems that no former technique could describe. Importantly, modularity in the description is exploited in the stochastic simulation engine to obtain a per-event cost which is constant in the size of the system (and logarithmic in the number of rules). Model reduction methods can add further efficiency. Kappa has a mature implementation (KaSim, due primarily to Krivine and Feret, available at https://github.com/Kappa-Dev/KaSim) which has been used successfully in large and detailed modelling projects and is now being optimised (in the course of the DARPA-funded ExeK project).
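The gain in concision from describing agents by their domains can be illustrated with a small counting sketch. It assumes a single hypothetical protein with `n_sites` independent phosphorylation sites; the function names are invented for this illustration and are not part of Kappa or KaSim. A flat reaction network needs one species per combination of site states, whereas a site-level description needs one rule per site and direction.

```python
from itertools import product


def flat_species(n_sites):
    """Every combination of site states is a distinct species."""
    # 'u' = unphosphorylated, 'p' = phosphorylated
    return list(product("up", repeat=n_sites))


def flat_reactions(n_sites):
    """One reaction per (species, site) pair: flip that site in that species."""
    reactions = []
    for state in flat_species(n_sites):
        for i in range(n_sites):
            flipped = "p" if state[i] == "u" else "u"
            reactions.append((state, state[:i] + (flipped,) + state[i + 1:]))
    return reactions


def site_rules(n_sites):
    """One rule per site and direction; the other sites are left unmentioned."""
    return [(i, a, b)
            for i in range(n_sites)
            for a, b in (("u", "p"), ("p", "u"))]
```

For `n_sites = 10` this gives 1024 species and 10240 flat reactions against 20 site-level rules, under the stated assumption that the sites do not influence one another.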

The first objective of RULE is to extend the Kappa language (and, accordingly, its theory) to enable the combination of rule-based modelling with other important aspects of intra-cellular phenomenology. We started by addressing the non-homogeneity of molecular networks in cells, which in some cases have a strong spatial organisation. To deal with this, we have developed ‘Spatial Kappa’, a model and an engine for discrete spatial diffusion. Spatial Kappa allows one to define compartments, their discrete geometries, and interfaces, as well as transport rules. Plain Kappa is then just a special case. Constraints on the co-diffusion of the various components of transmembrane complexes can be captured using a clean construct called a ‘channel’, which can be used to match across multiple boundaries. At present, Spatial Kappa only allows static geometries, and we are trying to find the right formalism for evolving the spatial structure alongside the various entities that diffuse and interact within it. We have realised another, parallel extension, ‘geeK’ (for geometrically enhanced Kappa), to target local geometric constraints in molecular assembly processes. Rules are sometimes too generous and build up complexes that are not realisable in 3D space. GeeK addresses this problem by equipping Kappa agents with geometric attributes which rules can test and modify. Allosteric behaviours that are commonplace in signalling can be captured neatly, and one can constrain the formation of complexes by steric considerations. The geeK extension comes at a controlled cost in terms of simulation. The implementation is built on top of a Scala re-implementation of Kappa using lightweight modular staging: LMS-Kappa (https://github.com/sstucki/lms-kappa).
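As a rough illustration of discrete spatial diffusion driven by a transport rule, the sketch below moves identical particles along a one-dimensional chain of voxels using a standard Gillespie-style loop. It is not Spatial Kappa syntax, nor its engine: the function name `simulate_diffusion`, the chain geometry, and the single parameter `hop_rate` are assumptions made for this example.

```python
import random


def simulate_diffusion(counts, hop_rate, t_end, seed=0):
    """Stochastic hopping of identical particles on a 1D chain of voxels.

    counts   -- initial particle count per voxel (hypothetical geometry)
    hop_rate -- rate at which one particle crosses to a neighbouring voxel
    t_end    -- simulated time at which to stop
    """
    rng = random.Random(seed)
    counts = list(counts)
    n = len(counts)
    t = 0.0
    while True:
        # One candidate move per (occupied voxel, existing neighbour) pair.
        moves = []
        for i in range(n):
            for j in (i - 1, i + 1):
                if 0 <= j < n and counts[i] > 0:
                    moves.append((i, j, hop_rate * counts[i]))
        total = sum(w for _, _, w in moves)
        if total == 0.0:
            break  # nothing can move
        t += rng.expovariate(total)  # waiting time to the next hop
        if t > t_end:
            break
        # Pick one move with probability proportional to its propensity.
        x = rng.uniform(0.0, total)
        for i, j, w in moves:
            x -= w
            if x <= 0.0:
                break
        counts[i] -= 1
        counts[j] += 1
    return counts
```

Calling `simulate_diffusion([100, 0, 0, 0], hop_rate=1.0, t_end=5.0)` spreads the particles along the chain while conserving their total number. This naive loop rebuilds the list of moves at every event, so it does not have the constant per-event cost described for the Kappa engine.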

With geometric and spatial constraints in place, we also address a less tangible type of constraint, namely that of energy. In (mass action) Petri nets, thermodynamic constraints are well understood: we have shown that whether a system is dissipative is decidable, and that the form of the underlying free energy is fixed. In Kappa, however, thermodynamic consistency (what physicists call being a closed system) is undecidable, as one can show by encoding the Post correspondence problem. Hence, thermodynamic constraints should rather be enforced by construction. To this end, we introduce an energy-based variant of Kappa. The qualitative semantics is unchanged, but the quantitative part (the kinetics) is now derived from specific energy data. These data come as local energy patterns, and their counts in a given graph determine its global energy. Using the earlier theory of rule-based refinements and ‘growth policies’, we show that any rule set can be (finitely) refined into a larger rule set which incurs constant energy differences (and hence can be assigned coherent kinetics) and defines the same underlying transition graph as the original rules. This transformation can be understood as a static compilation of a Metropolis-Hastings simulation. Energy stipulations alone do not fix the kinetics, but there is a general linear kinetic model (particular cases of which have been considered in the biochemical literature) which leads to sparse parametrisations of models. Each rule requires the additional specification of a linear transformation on the space of energy patterns. Unlike the preceding extensions, the energy-based one has yet to be implemented. The compiler, the specialised simulation engine, and the parametrisation are all in the works and will benefit from the extensive refactoring of the KaSim engine which is underway.
Sparsity in parametrisation is an important practical target, but there are several other ways in which energy-based modelling can interact with the rest of the programme. First, the idea of energy can be combined with the geeK development to obtain better parametrisations of geometric models. Second, one needs good learning techniques to find good parametrisations given interaction energies (which are often available) and kinetic data. Perhaps, more generally, novel approaches to parameter fitting which directly exploit the process of model refinement can be found. Third, our construction uses energy models that rely only on matching local patterns. Can one use global energy contributions, which are commonplace in physical models? Such extensions take us outside of first-order logic notions of graph rewriting, and this increase in descriptive complexity leads to logical questions which tie in well with the second objective of RULE. In fact, energy ideas seep into all aspects of the programme: into the more mathematical questions which come next, where they can be used to tame forward equations, and into the growth models of the third objective, where one can use them to define entropy production rates and develop a physical (yet rigorous) view of growing cells as thermodynamic engines.
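The way local energy patterns determine a global energy, and the reason a rule is refined until each refinement has a constant energy difference, can be sketched as follows. Everything named here is hypothetical: the three patterns, their costs (in units of kT), and the `split` parameter, which is one simple way of choosing rates whose ratio satisfies detailed balance. It is not claimed to be the general linear kinetic model mentioned above.

```python
import math

# Hypothetical energy patterns and their costs, in units of kT.
PATTERN_COSTS = {"A-B": -2.0, "B-C": -1.0, "A-B-C": -0.5}


def energy(pattern_counts, costs=PATTERN_COSTS):
    """Global energy: sum over patterns of cost times occurrence count."""
    return sum(costs[p] * n for p, n in pattern_counts.items())


def balanced_rates(delta_e, base_rate=1.0, split=0.5):
    """Forward and backward rates whose ratio is exp(-delta_e)."""
    forward = base_rate * math.exp(-split * delta_e)
    backward = base_rate * math.exp((1.0 - split) * delta_e)
    return forward, backward


# The generic rule "B binds C" has a context-dependent energy difference,
# so it is split into two refinements, each with a constant difference.
# Each entry maps a refinement to its pattern counts (before, after).
REFINEMENTS = {
    "B binds C, B free of A": (
        {"A-B": 0, "B-C": 0, "A-B-C": 0},
        {"A-B": 0, "B-C": 1, "A-B-C": 0},
    ),
    "B binds C, B bound to A": (
        {"A-B": 1, "B-C": 0, "A-B-C": 0},
        {"A-B": 1, "B-C": 1, "A-B-C": 1},
    ),
}

rates = {}
for name, (before, after) in REFINEMENTS.items():
    delta_e = energy(after) - energy(before)
    rates[name] = (delta_e,) + balanced_rates(delta_e)
```

In this toy setting the unrefined rule cannot be given a single consistent rate pair, because binding C changes the energy by a different amount depending on whether B is already bound to A. After refinement, each case has its own constant difference and its own pair of rates.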

The second objective of RULE is to refine the Kappa analytic tools, and to exploit the algebraic approaches to graph-rewriting to do so axiomatically. The idea is to deal with several variants of graphs (directed, coloured, hyper, with symmetries, etc.) at once. Our first contribution here is a (higher-order) formalism for manipulating computation traces in an efficient and conceptually clear way. This approach permits the classification of the various classes of trace compression used in causal analysis. And, as it relies on category-theoretical formulations of graph-rewriting (specifically single push-out), one can ‘port’ the construction to abstract graph-rewriting. Using the same category-theoretic tooling, we show that extensions of graph-rewriting by nested application conditions can be handled axiomatically. This can be useful in concrete modelling situations. A second contribution addresses the generation of forward equations for graph observables. These were first understood in the project as (automated) model reduction. Such transformations generate concise dynamics which help with large-scale models, and can be combined with approximation schemes to further lower the complexity of the dynamics. We show how to generate forward equations, i.e. systems of ODEs which track the mean values of local observables, for a restricted grammar of polymers. This case study extends the strictly finite framework of our earlier work. Next, we prove that one can derive forward equations for any Kappa rule set and any local pattern. The well-posedness of the resulting (Kolmogorov forward) equation is no longer guaranteed, however. Indeed, there are known examples where the underlying Markov chain is explosive, and one should not expect solutions to always exist (or, if they do, to reveal interesting things about the process).
Finally, and recently, we have obtained results for plain directed graphs which take a pleasing mathematical form, as theorems of ‘jump closure’, showing that the algebra of graph observables is closed under the infinitesimal generators associated with local rules. Model reduction is now seen to be just a change of basis for the infinitesimal generator of the dynamics. For an axiomatic reprise of jump closure, the common tools of algebraic graph rewriting (e.g. Lack-Sobocinski adhesive categories and variants) do not seem to work well. Instead, we are investigating another algebraic idea, which parallels the construction of the classical Heisenberg-Weyl algebra and its canonical representation on Fock space. Rewrite rules are then seen as specific irreducible elements of a larger combinatorial Hopf algebra of diagrams. Rules themselves form an algebra, and rewriting is now seen as a (canonical) representation of this algebra on the vector space generated by finite graphs. Double and single push-out variants appear as different evaluation maps from diagrams to rules, and new forms appear as well. Jump closure and the accompanying ODEs become evident. This approach based on diagrams, reminiscent of proof-nets in linear logic, offers a crisp and concise construction of forward equations for all moments of graph observables and might show the way to a good perturbative theory. Other points of active interest are: the search for ergodicity theorems showing that when the dynamics is controlled by energy patterns (as in the energy-based approach above), ODEs for moments of local graph observables always exist; and the formulation of approximations (mean field, pair approximation, etc.) as perturbative theories using deformed versions of the rule algebra, as used in mathematical physics.
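To make the notions of forward equation and jump closure concrete, here is a small worked example under stated assumptions. The symbols are introduced for this illustration only: $q(x,y)$ is the jump rate from graph $x$ to graph $y$, $\mathcal{L}$ is the infinitesimal generator, and the toy rule set consists of a single rule that deletes any one edge at rate $k$, with $N(x)$ the number of edges of $x$. From a graph with $N(x)$ edges there are $N(x)$ possible jumps, each lowering $N$ by one.

```latex
\begin{align*}
(\mathcal{L}F)(x) &= \sum_{y} q(x,y)\,\bigl(F(y) - F(x)\bigr),
&
\frac{d}{dt}\,\mathbb{E}[F(X_t)] &= \mathbb{E}\bigl[(\mathcal{L}F)(X_t)\bigr],
\\[4pt]
(\mathcal{L}N)(x) &= k\,N(x)\,\bigl((N(x) - 1) - N(x)\bigr) = -k\,N(x),
&
\frac{d}{dt}\,\mathbb{E}[N] &= -k\,\mathbb{E}[N],
\\[4pt]
(\mathcal{L}N^2)(x) &= k\,N(x)\,\bigl((N(x) - 1)^2 - N(x)^2\bigr) = k\,N(x) - 2k\,N(x)^2,
&
\frac{d}{dt}\,\mathbb{E}[N^2] &= k\,\mathbb{E}[N] - 2k\,\mathbb{E}[N^2].
\end{align*}
```

In this toy case, applying the generator to $N$ or $N^2$ yields a finite linear combination of the same observables, so the ODEs for the first two moments close on themselves. This chain only ever loses edges, so it cannot explode; the well-posedness issues mentioned above arise for rule sets whose rates can grow without bound.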

The third main theme of RULE is at an earlier stage of gestation and focuses on whole-cell modelling and the development of the MOI modular infrastructure for the assembly of such large models. There are important motivations for building such a tool. One is that, for the various work streams of RULE to jointly result in a flexible, expressive, scalable and foundationally clean framework for building realistic models of complex biological and spatially embedded signalling systems, one needs modules. For decades to come, it will remain unrealistic to imagine global models at the rule-based resolution of detail. Modular approaches also permit parallel development and easier reuse. We have designed and implemented a new platform for this type of integrated ‘whole-cell’ modelling: the module integration simulator, MOI (https://github.com/edinburgh-rbm/mois, written in Scala). State updates are time-stamped by simulation time and passed around by processes as needed. A key idea to enforce modularity, despite the coupling induced by resources in contention, is to use independent mechanisms to allocate such resources prior to a run of the various modules. There are numerous questions to address in refining this first attempt. A longer-term goal is to integrate these modular approaches with vertical models of so-called digital organisms.
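The two ideas described above, time-stamped state updates and allocation of contended resources before the modules run, can be sketched in a few lines. This is not MOI's API (MOI is written in Scala). The `Update` record, the proportional `allocate` policy, and the two toy modules `growth` and `repair` are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Update:
    time: float    # simulation time stamp of the update
    source: str    # name of the module that produced it
    changes: dict  # variable name -> increment


def allocate(total, demands):
    """Split a contended resource in proportion to each module's demand."""
    asked = sum(demands.values())
    if asked <= total:
        return dict(demands)
    return {name: total * d / asked for name, d in demands.items()}


def growth(state, budget):
    # Hypothetical module: converts its energy budget into biomass.
    return {"energy": -budget, "biomass": 0.5 * budget}


def repair(state, budget):
    # Hypothetical module: spends energy to fix DNA lesions.
    fixed = min(state["lesions"], budget)
    return {"energy": -fixed, "lesions": -fixed}


def run(state, modules, demands, dt, steps):
    """Advance all modules in lock-step, logging time-stamped updates."""
    state = dict(state)
    log = []
    t = 0.0
    for _ in range(steps):
        # Resources are allocated before any module runs...
        budgets = allocate(state["energy"], demands)
        # ...so each module steps on a private budget, in any order.
        updates = [Update(t + dt, name, step(state, budgets[name]))
                   for name, step in modules.items()]
        for u in updates:
            for var, inc in u.changes.items():
                state[var] += inc
        log.extend(updates)
        t += dt
    return state, log
```

Because every module sees the same state snapshot and only spends what it was allocated, the shared energy pool cannot be overdrawn, whatever the order in which the modules are evaluated. Proportional sharing is only one possible allocation mechanism, chosen here for brevity.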

Growing evidence points to the impact of global competition for intra-cellular resources (energy, transcription and translation, etc.) on the execution of sub-cellular processes. Models in systems and synthetic biology are currently largely oblivious to these global effects. Synthetic gene circuits that implement new functions in a cell should make strategic use of the host environment, and natural ones certainly do. In a recent paper published in PNAS, we have laid down a minimalistic but mechanistic model of growth which recapitulates famous microbial growth laws and, for the first time, maps them to mechanistic attributes. This is the beginning of a simple macro-economic model of a microbial host. We have also developed a model of DNA repair. Within our modular modelling environment we can plug the latter into the energy environment provided by the former. This is better than coupling the two models by hand, which is less principled, less efficient, and poor software engineering. Coupling repair with growth opens a door onto serious questions of evolutionary biology (as repair is mutagenic). In fact, coupling anything with growth and, more generally, building closed-loop models where no process is left as an exogenous input is the future of systems biology. Whole-cell models also open up interesting theoretical questions on the interconnectedness of biological processes. In particular, we wish to adapt the statistical thermodynamics formalism used by physicists in the context of molecular motors to define the efficiency of growth models. Efficiency is assessed by inspecting the use to which the entropy production rate is put. There is a compelling view of cells as growing machines. Being able to define a (most likely context-dependent) notion of efficiency in energy conversion for growth is a fascinating prospect.
