PRIMES: Protein interaction machines in oncogenic EGF receptor signalling

Final Report Summary - PRIMES (PRIMES: Protein interaction machines in oncogenic EGF receptor signalling)

Executive Summary:
The epidermal growth factor receptor (EGFR/ErbB) signalling network is one of the most important and best studied biochemical networks that regulate growth, survival, proliferation, and differentiation in mammalian cells. However, the molecular details of how this network processes signals to generate its diverse functions remain enigmatic. PRIMES used an integrated strategy combining proteomics, structural biology, imaging, and computational and mathematical modelling to elucidate the molecular signal processing functions of the EGFR network and its rewiring in pathogenic conditions, with a focus on colorectal cancer (CRC) and breast cancer. Our central hypothesis was that protein interactions assemble dynamic molecular machines that receive and process information to coordinate cellular responses to external cues and internal perturbations, such as genetic mutations. Thus, we perceive signal transduction networks as protein interaction networks in which stable, static interactions serve as an organising framework while dynamically changing interactions function as signal processors. This hypothesis raises important questions, which we addressed in this project: (i) How do protein interactions contribute to the generation of biological specificity in signalling networks? (ii) How do pathogenetic perturbations affect protein interaction networks? (iii) How can we exploit protein interactions as therapeutic targets? Aberrant EGFR signalling is a major characteristic of several human malignancies, including pancreatic, lung, breast, and colorectal cancer. Ultimately, this study provided insights for the design of efficacious targeted drug therapies to overcome the poor response rates and resistance development seen with currently used drugs.

PRIMES discovery highlights
• Proof of concept that protein-protein interactions are molecular signal processing machines
• Mapping of a high quality protein-protein interaction network (PPIN) downstream of the EGFR and its rewiring by KRAS mutation
• Successfully rewired signalling networks through targeted protein engineering
• Development of the Ligand-Based Structure Similarity (LBSSX) platform for structure–activity relationship exploration within commercial databases, and ligand-based virtual screening
• Development of Phage-CONA, a high-throughput peptide library screening method
• New proximity ligation assay methods with increased sensitivity and efficiency and improved applicability to tissues
• New method for cell line authentication using RNAseq data
• New drug targets for CRC identified
• Four compound families and 7 new lead compounds for CRC identified
• Discovery of mode of action for allosteric inhibitors of MAPKs and bromodomain proteins
• New bromodomain protein inhibitors
• Development of new computational biology resources
o HiQuant: fast analysis pipeline for quantitative proteomics data
o DyNet: visualization and analysis of large multi-state dynamic molecular interaction networks
o CHAT: reconstruction of PPINs and identification of contextual nodes
o InsituNet: converts in situ sequencing data into interactive network-based visualisations
o PRIMESDB: highly curated database with various analysis tools for PPINs
o CerebralWeb: network visualization stratified by subcellular localization
o SAPIN: Structural Analysis of Protein Interaction Networks webserver
o VarQ: predicts the effect of sequence variants on protein (mis)folding and activity

Project Context and Objectives:
Project aims and objectives
Our main objective was to understand the role of protein interactions on a functional level that can be taken from basic concepts via sound validation pathways into preclinical research. We believed that this level of understanding could only be achieved by a combination of skills and methods that allowed us to go far beyond the identification of components into elucidating functional connections, disease relevant changes and concrete mechanisms for therapeutic interference. Therefore, PRIMES used a combination of proteomics, imaging, structural biology, computational and mathematical modelling. Rather than trying to map interactions on a global scale we chose the EGFR/ERBB network as a defined and pathogenetically important protein interaction system that we could elucidate to the depth required for a true functional understanding.

Towards these aims the PRIMES objectives were to
1) systematically map static and dynamically rewired protein interactions in the ERBB network with a focus on aberrations of the network associated with CRC and breast cancer
2) unravel the design principles and emergent network properties conferred by protein interactions
3) explore the role of protein interactions in signal processing by rewiring the interaction network using point mutations in interaction interfaces designed based on structural information and computational modelling
4) determine the importance of changes in protein expression at nodes where competition could take place redirecting the signalling through alternative routes
5) validate the findings in genetic mouse models of CRC and in human tissues
6) identify chemical and peptidomimetic compounds that can target protein interactions directly or through allosteric mechanisms, and to test and validate these compounds in 2D and 3D cell culture models with defined genetic lesions
7) select 3-5 compounds for in vivo testing in transgenic and knockout mouse models of CRC. We evaluated the responses against defined genetic backgrounds as an important step towards precision medicine
8) establish an integrated database, PrimesDB, that contains the protein interaction data, structural data, network models, drug screening and validation data. Importantly, this database was developed with the aim of providing a repository as well as an analysis tool for data interpretation. It is publicly available.

The PRIMES concept and workflow
PRIMES used an iterative “map-perturb-predict-validate” strategy to analyse the function of protein interactions in the EGFR/ERBB signalling network in health and CRC. This concept combined cutting-edge experimental technologies with theoretical analysis tools to go far beyond hairball-type interaction mapping, which gives only a static snapshot of possible interactions. Starting by establishing a static protein-protein interaction network (PPIN) as a framework and reference point, we subsequently analysed the effects of perturbations, such as mutations, drugs, or natural ligands of the EGFR, on the PPIN. In particular, we were interested in deciphering which PPIs are used, why they are used, and what they accomplish in terms of network functions. To functionalise imaging we used advanced techniques such as fluorescence lifetime imaging microscopy (FLIM), which permitted us to assess protein interaction dynamics and protein concentrations in real time. To add molecular mechanistic details we determined the structure of selected interaction partners using the expertise and resources of the Oxford Structural Genomics Consortium (OSGC). To add functional interpretation we employed network and pathway analysis approaches and mathematical modelling to reconstruct the signalling network topologies encoded by PPIs. These tools revealed emergent network properties and made predictions about how the protein interaction network reacts to interference. Interference was applied using drugs, siRNA, overexpression of node proteins and engineering of key node proteins to redirect signal flux. To add (patho)physiological relevance we validated salient findings by assessing the expression and protein interactions of selected proteins in CRC mouse models, organotypic tissue culture models and human tissues, taking advantage of the Human Protein Atlas (http://www.proteinatlas.org/) and the proximity ligation assay (PLA) technique.
To add a therapeutic perspective, we identified chemical compounds that target protein interactions either by direct disruption, through allosteric effects or indirectly by inhibiting modifications that mediate interactions. These compounds were isolated by screening pharmacophore-enriched libraries using high-throughput in silico screens followed by targeted in vitro screens and high-content cell-based screens. Compounds were tested in cell culture and organotypic colonic crypt models, and selected compounds were further evaluated in CRC mouse models. Results from these preclinical validation studies were fed back into the network mapping, informing which parts of the network should be explored in greater detail and with better dynamic resolution. A schematic project overview is presented in Fig. A.

Project Results:
Mapping and reconstructing the EGFR PPIN
The transformation of normal cells into cancer cells and the maintenance of oncogenic states and phenotypes are tightly associated with a panoply of mutations, epigenetic variations, mutant protein abundance, altered cellular signaling responses and aberrant intra- and inter-cellular interactions. Disruption of normal biological systems is crucial for cancer transformation, sustaining uncontrolled proliferative growth and an oncogenic phenotypic steady state. Hence, as these genetic alterations translate into alterations in the protein interaction network and at the proteomic and metabolomic levels, a holistic view of cancer development and progression is greatly needed.

In the complex signaling networks involved in cancer development, it is a challenge to anticipate the influence of oncogenic perturbations and to target them effectively. Protein interactions of the EGFR pathway are key to integrating stimuli and directing the information flow into a cell-specific decision-making response. Network-attacking mutations, such as KRAS mutations, hijack the EGFR signaling network by perturbing the protein interaction landscape, effectively creating new cancer-specific attractors. Beyond uncovering changes in protein interactions, we applied a meta-analysis approach to understand how complex cancer-associated deregulations coordinately shape malignant states and phenotypes. Therefore, we used an integrative approach in which we investigated KRAS G13D-induced differences in protein interactions downstream of the EGFR and characterised the phenotypic outcome at the proteomic, metabolomic, and lipidomic levels.

PPIN mapping by mass spectrometry based interaction proteomics
In order to attain a representative coverage of the EGFR PPIN we used a highly curated map of the EGFR signalling network to choose 96 bait proteins. The baits were expressed as Flag-tagged proteins, with transfection carefully titrated to achieve a similar and modest level of overexpression in both cell lines studied. Baits were immunoprecipitated and associated proteins were identified by high resolution Orbitrap mass spectrometry (MS). Using a quantitative proteomics approach based on stable isotope labeling by amino acids in cell culture (SILAC) and affinity purification in combination with tandem mass spectrometry, we investigated changes in protein-protein interactions in the oncogenic KRAS (G13D) cell line HCT116 versus the isogenic, non-oncogenic HKE3 cell line, in which the KRAS mutation was targeted by a knockout vector. SILAC labelling enabled accurate quantitation of proteins and of interaction changes between the cell lines. Raw data were quantified with MaxQuant followed by analysis with HiQuant (Bryan et al., 2016), which automates and accelerates post-quantification data processing, such as assay normalization and grouping, quality control and statistical analysis. HiQuant also interfaces with common network software platforms such as Gephi and Cytoscape, facilitating network visualisation and analysis. Following MS acquisition, data analysis and integration using MaxQuant and network building tools, we constructed interaction networks that integrate quantitative changes in protein interactions associated with the KRAS G13D mutation. Importantly, abundance changes identified by the whole proteome approach were integrated to compensate for protein interaction changes that merely reflect changes in abundance. Using network topology analysis, we further identified key driver nodes within the network. In order to validate interaction changes, we selected several bait proteins and carried out western blot analysis to confirm changes detected by the MS analysis.
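
The abundance correction mentioned above can be illustrated with a minimal sketch. It assumes, which the report does not state, that the correction subtracts the log2 whole-proteome SILAC ratio of a prey from its log2 SILAC ratio in the bait pull-down. The function name and the example values are hypothetical.

```python
import math


def abundance_corrected_log2_ratio(ip_ratio, proteome_ratio):
    """Return the log2 interaction change corrected for prey abundance.

    ip_ratio:       SILAC ratio (HCT116 / HKE3) of a prey in the bait pull-down
    proteome_ratio: SILAC ratio (HCT116 / HKE3) of the same prey in the whole proteome

    Subtracting in log space is an assumed correction scheme for illustration,
    not necessarily the one implemented in the PRIMES pipeline.
    """
    if ip_ratio <= 0 or proteome_ratio <= 0:
        raise ValueError("SILAC ratios must be positive")
    return math.log2(ip_ratio) - math.log2(proteome_ratio)


# Hypothetical prey: 4-fold enriched in the pull-down, but also 2-fold more abundant overall.
corrected = abundance_corrected_log2_ratio(4.0, 2.0)
print(corrected)  # 1.0, i.e. a 2-fold interaction change remains after correction
```

A corrected value near zero would suggest that the apparent interaction change is explained by prey abundance alone.
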
Molecular, biochemical and biological characterization of cell lines used
The original PRIMES application proposed using the colorectal cancer cell line HCT116, which contains the KRAS G13D mutation, and its isogenic partner HKE3, in which the G13D mutation had been knocked out. These cell lines were obtained from the Shirasawa laboratory in Japan. Using a construct consisting of a neomycin and thymidine kinase (NT) cassette, Shirasawa and colleagues disrupted the KRAS G13D mutation in the HCT116 cell line by homologous recombination [Shirasawa, S., M. Furuse, N. Yokoyama and T. Sasazuki (1993). "Altered growth of human colon cancer cell lines disrupted at activated Ki-ras." Science 260(5104): 85-88]. The HCT116/HKH2 and especially the HCT116/HKE3 cell line pairs have been widely used in the literature. As the doubling time of HKE3 was more similar to that of HCT116, facilitating the transfection and SILAC labelling protocol, we chose the isogenic HKE3 cell line for comparison with HCT116 in the PRIMES project. During the project we discovered that the HKE3 cell line retains residual expression of the KRAS G13D mRNA and protein at a level 3 times lower than in HCT116. Therefore, we carried out a detailed biochemical and biological characterisation of the HCT116/HKE3 cell line pair. The original Shirasawa paper also observed that the isogenic cell line, HKE3, lost the ability for anchorage-independent growth, which is an excellent indicator of tumorigenicity. Using soft agar assays, we tested the capacity for anchorage-independent growth. We also tested the cells in a colony formation assay, which indicates the capability of cells to survive and proliferate as single cell clones. Further, we assessed proliferation rates. In all these assays HCT116 cells exhibited a more transformed phenotype than HKE3 cells. Using biochemical assays, we could show that HCT116 cells feature ~3 times higher RAS activation than HKE3 cells, and that HKE3 cells preserved a high inducibility of downstream RAS effector pathways.
Taken together, this characterisation showed that the isogenic HCT116 – HKE3 cell line pair is a valid system to compare transforming versus non-transforming dosage of KRAS.

Characteristics of the EGFR PPIN
The reconstructed PPINs, termed EGFRNetHCT116 and EGFRNetHKE3, are single-component networks, i.e. all nodes are connected, with >1,200 nodes and connectivities expected for this network size. Specifically, EGFRNetHCT116 consists of 3,163 interactions among 1,309 proteins, with an average of 4.83 interactors per protein. This is a rather low average node degree, but there is a long tail of “hubs” with degree ≥10 (354 hubs; ~27% of nodes). The best connected hubs (SH2D3C, GRB2, PRKCZ, SH3BGRL, RAB5A, PRKCI and PRKCB) each interact with ≥70 prey proteins. EGFRNetHKE3 features 2,789 interactions between 1,226 proteins with an average node degree of 4.54. EGFRNetHKE3 shares 896 (~73%) nodes and 1,533 (~55%) interactions with EGFRNetHCT116, and closely mirrors EGFRNetHCT116 in structure and general topological properties. It has a diameter (the shortest path between the two most distant nodes) of 6 vs 7 for EGFRNetHCT116 (average path length is 4), and a power-law node degree distribution with similar exponents (p(k) = 178.03 k^(-1.28) for EGFRNetHKE3 vs. p(k) = 208.93 k^(-1.29) for EGFRNetHCT116).
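
As a worked check of the summary statistics quoted above, the following sketch recomputes the average node degree, the hub fraction and the shared fractions from the reported counts, and evaluates the fitted power law. The function names are illustrative only; the numbers are taken from the text.

```python
def average_degree(n_nodes, n_edges):
    """Average degree of an undirected network: every edge contributes to two nodes."""
    return 2 * n_edges / n_nodes


def power_law(k, a, gamma):
    """Fitted degree distribution p(k) = a * k^(-gamma)."""
    return a * k ** (-gamma)


# EGFRNetHCT116: 3,163 interactions among 1,309 proteins.
print(round(average_degree(1309, 3163), 2))  # 4.83

# Fraction of EGFRNetHCT116 nodes that are hubs (degree >= 10).
print(round(354 / 1309, 2))  # 0.27

# Fractions of EGFRNetHKE3 nodes and interactions shared with EGFRNetHCT116.
print(round(896 / 1226, 2), round(1533 / 2789, 2))  # 0.73 0.55

# With a similar exponent, the ratio p(1)/p(10) is close to 10^1.29 in both networks.
ratio_hct116 = power_law(1, 208.93, 1.29) / power_law(10, 208.93, 1.29)
print(ratio_hct116 > 19)  # True
```
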

Many of the major hubs from EGFRNetHCT116 maintain their high degrees in EGFRNetHKE3, e.g. RAB5A (75 in EGFRNetHCT116 vs 97 in EGFRNetHKE3), RAF1 (64 vs 66), and SH2D3C (83 vs 76). These hubs also play a major role in linking nodes to each other via the shortest paths, as evidenced by their high betweenness centrality (bc) of 0.73 in EGFRNetHCT116 relative to other nodes (average bc ~0.10). While GRB2 is a well-known hub for coordinating different aspects of EGFR signalling, some of the other high bc nodes, such as RAB5, RAF1, and SH2D3C, have so far been considered proteins with rather specialised functions in endocytosis, ERK pathway activation and JNK pathway activation, respectively. In addition, several medium-degree proteins (20≤k<50) show high bc (maximum 0.67, average 0.23). Some of these proteins are preys, e.g. AKAP12, CDC37, PRPS1, and SLC25A3. The combination of medium degree and high betweenness indicates that these preys constitute communication links between the different bait-prey complexes. This interpretation is supported by the distribution of clustering coefficients (cc), which reflect how well the interactors of a node are connected among themselves. Preys with low degrees (k<10) and low betweenness (average bc 0.02) tend to display higher clustering (max cc 0.5, average cc 0.16), indicating that they belong to distinct bait-prey complexes in the network. By contrast, medium-degree (20≤k<50) preys show lower clustering (average cc 0.024) as they link together different clusters (average bc 0.31).
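
The relationship between betweenness and clustering described above can be illustrated on a toy network. The sketch below is not the analysis code used in PRIMES; it is a plain-Python illustration using the standard definitions (local clustering coefficient, and Brandes' algorithm for normalised betweenness on an unweighted, undirected graph). The toy network and its node names are hypothetical: two fully connected complexes joined by a single linker node "L".

```python
from collections import deque


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


def betweenness_centrality(adj):
    """Brandes' algorithm (unweighted, undirected), normalised to the range 0..1."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is visited twice, so divide by (n-1)(n-2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}


# Hypothetical toy network: complexes {A, B, C} and {D, E, F} joined via linker L.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "L"},
    "L": {"C", "D"},
    "D": {"L", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
bc = betweenness_centrality(toy)
print(clustering_coefficient(toy, "A"), round(bc["A"], 2))  # 1.0 0.0
print(clustering_coefficient(toy, "L"), round(bc["L"], 2))  # 0.0 0.6
```

The complex member "A" has high clustering and no betweenness, whereas the linker "L" has no clustering and high betweenness, mirroring the pattern reported for low-degree and medium-degree preys.
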

EGFRNetHKE3 shows a similar distribution of node properties: hubs (e.g. RAB5A, GRB2, and SH2D3C) show both high degrees (≥60) and high betweenness (average bc 0.71); medium-degree proteins (e.g. MAPK1, CDC37, and SHC1) show high betweenness (maximum 0.57, average 0.20); and low-degree proteins (e.g. CDK1 and MAPK3) show higher clustering (max cc 0.43, average cc 0.15). In summary, these properties indicate that both PPINs are scale-free and small-world networks, where nodes are well connected and signals originating from one node can reach other nodes via a few steps.

The ERBB family PPIN
Increasing evidence suggests that, in addition to EGFR (ERBB1), the other family members ERBB2/3/4 play a role in CRC and may provide mechanisms of drug escape. As the interactions of integral membrane proteins are difficult to capture using classical co-immunoprecipitation-MS experiments, we applied MaMTH, a mammalian-membrane two-hybrid assay, which detects interactions in situ by reconstitution of a split ubiquitin probe. Using MaMTH we determined the interactome of the four ERBB receptors, identifying 218 proteins. Of these, 92 are membrane proteins, with 52 being integral to the plasma membrane. Of the 376 edges connecting these proteins, 267 are previously undetected interactions. Interestingly, ERBB2 and ERBB3 together cover a majority of interactions, which may be related to the enhanced signalling capacity and biological aggressiveness of ERBB2/3 dimers in various cancers.

Accompanying metabolomic studies
In parallel, we studied metabolomic variability in our cellular model using a targeted metabolomics approach. We quantitatively mapped the metabolites by LC-MS analysis based on the Biocrates metabolic analysis platform, targeting 6 compound classes: amino acids, acyl-carnitines, hexoses, phospholipids, sphingolipids and biogenic amines. Lipidomic differences between the oncogenic and non-oncogenic cellular models were evaluated using data-independent MS/MS acquisition based on SWATH analysis. Moreover, we analyzed phenotypic changes associated with the KRAS mutation using a specific KRAS activity assay and colony formation and proliferation assays.

Conclusion
We have established a comprehensive map of the changes in EGFR pathway interactions in a KRAS mutation-driven oncogenic cellular model. Using systematic data-driven analysis and integration, we compiled interactomic, proteomic, metabolomic and lipidomic data, together with biological assays, to map and understand systematic changes in an oncogenic environment. In-depth data analysis allowed us to build a holistic landscape of oncogenic KRAS and to identify decisive cancer-associated drivers that coordinately shape and maintain the malignant state and phenotype. Ultimately, our workflow design for unbiased data-driven integration and iterative molecular profiling provides a system-level understanding of how molecular alterations coordinately drive tumorigenesis. Importantly, our data are integrated in a web-based visualization and analysis platform, which will be available to the scientific community to provide a better understanding of single-mutation-driven oncogenesis and to enrich already available data sets that can be used for drug and biomarker design.

Development of computational tools for network analysis
An integral part of the PRIMES project has been the development of computational tools and the bioinformatics infrastructure necessary to manage, integrate, analyse and store the complex large-scale proteomics data and associated network data generated. This work has led to the development of a number of novel bioinformatics applications that are available not just to the PRIMES consortium but to the wider community (free of charge).

HiQuant: Rapid post-quantification analysis of large-scale MS-generated proteomics data
Recent advances in Mass Spectrometry (MS)-based proteomics are now facilitating ambitious large-scale investigations of the spatial and temporal dynamics of the proteome. However, the increasing size and complexity of these datasets is currently overwhelming downstream computational methods, especially those that support the post-quantification analysis pipeline. We found these limitations particularly evident in the PRIMES consortium. To address them, we have developed the high-throughput protein quantification analysis tool (HiQuant). HiQuant implements a customizable post-quantification data analysis pipeline including several data processing, quality control, normalization and statistical analysis steps, which can be applied simultaneously to hundreds of assays within an MS-based proteomics experiment. HiQuant also enables the interpretation of results generated from large-scale datasets by supporting interactive heatmap analysis and direct export to Cytoscape and Gephi, two leading network analysis platforms. HiQuant may be run via a user-friendly graphical interface and also supports complete one-touch automation via a command-line mode. We evaluated HiQuant’s performance by analyzing a large-scale, complex interactome mapping dataset and demonstrated a 200-fold improvement in execution time over current methods. HiQuant is publicly available at http://hiquant.primesdb.eu/

Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks
A key goal of the PRIMES project is to integrate PRIMES protein-protein interaction data with publicly available PPI data stored in PRIMESDB to reconstruct the EGFR network and to discover potential nodes in this network for targeting therapeutically. Highly connected network nodes, known as hubs, are attractive targets, as disrupting these nodes will have the maximal impact on the network topology. In this reporting period, we have finalized the development of the Contextual Hub Analysis Tool (CHAT), a Cytoscape App that can reconstruct PPI networks from PSICQUIC-compliant interaction databases (such as PRIMESDB, but not limited to PRIMESDB), and which identifies hub nodes that interact with more "contextual" nodes (e.g. differentially expressed genes or proteins) than statistically expected in networks integrated with user-supplied contextual data (e.g. gene expression data). We term these nodes contextual hubs. In our publication on CHAT (see below), we have shown that such contextual hubs are considerably more relevant than degree-based hubs to the specific experimental context under investigation. As such, these nodes are promising candidates for further functional validation studies and potentially represent important points in the network for drug targeting. CHAT is freely available on the Cytoscape App store at http://apps.cytoscape.org/apps/chat and has already been downloaded over 600 times.
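
The idea of a hub interacting with more contextual nodes "than statistically expected" can be sketched with a one-sided hypergeometric test. Treating the null model as hypergeometric is an assumption made here for illustration, and the function and parameter names are hypothetical; this is not CHAT's code or interface.

```python
from math import comb


def contextual_hub_pvalue(n_candidates, n_contextual, degree, contextual_neighbours):
    """P(X >= contextual_neighbours) under a hypergeometric null.

    n_candidates:          number of nodes a hub could interact with
    n_contextual:          how many of those nodes are contextual
                           (e.g. differentially expressed)
    degree:                number of interaction partners of the hub
    contextual_neighbours: how many of the hub's partners are contextual
    """
    upper = min(degree, n_contextual)
    total = comb(n_candidates, degree)
    tail = sum(
        comb(n_contextual, x) * comb(n_candidates - n_contextual, degree - x)
        for x in range(contextual_neighbours, upper + 1)
    )
    return tail / total


# Hypothetical hub: 10 partners, 5 of them contextual, in a network where
# only 10 of 100 candidate nodes are contextual (1 expected by chance).
p = contextual_hub_pvalue(100, 10, 10, 5)
print(p < 0.01)  # True
```

In a real analysis the p-values of all hubs would additionally need correction for multiple testing.
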

DyNet: visualization and analysis of dynamic molecular interaction networks
The ability to experimentally determine molecular interactions on an almost proteome-wide scale under different conditions is enabling researchers to move from static to dynamic network analysis, uncovering new insights into how interaction networks are physically rewired in response to different stimuli and in disease. The PRIMES project is an excellent example of research that is generating such dynamic PPI networks (i.e. in an oncogenic and a non-oncogenic model system). Dynamic interaction data, however, present a special challenge in network biology. To overcome this challenge, we have developed DyNet, a Cytoscape application that provides a range of functionalities for the visualization, real-time synchronization and analysis of large multi-state dynamic molecular interaction networks, enabling users to quickly identify and analyze the most ‘rewired’ nodes across many network states. DyNet is freely available on the Cytoscape App store at http://apps.cytoscape.org/apps/dynet and has already been downloaded over 1500 times.
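
To make the notion of a "rewired" node concrete, the sketch below ranks nodes by how much their set of interaction partners differs between two network states. It uses one minus the Jaccard similarity of the partner sets, which is a simple stand-in chosen for illustration and is not the rewiring metric implemented in DyNet. The protein names P1 to P4 are hypothetical.

```python
def rewiring_score(neighbours_a, neighbours_b):
    """1 - Jaccard similarity of a node's partners in two states (0 = unchanged)."""
    union = neighbours_a | neighbours_b
    if not union:
        return 0.0
    return 1 - len(neighbours_a & neighbours_b) / len(union)


def most_rewired(state_a, state_b):
    """Rank all nodes by rewiring score, highest first (ties broken by name)."""
    nodes = set(state_a) | set(state_b)
    scores = {
        node: rewiring_score(state_a.get(node, set()), state_b.get(node, set()))
        for node in nodes
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical adjacency maps for a non-oncogenic and an oncogenic state.
non_oncogenic = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}
oncogenic = {"P1": {"P2", "P4"}, "P2": {"P1"}, "P4": {"P1"}}

ranking = most_rewired(non_oncogenic, oncogenic)
print([name for name, _ in ranking])  # ['P3', 'P4', 'P1', 'P2']
```

Here P3 and P4 lose or gain their only interaction, P1 exchanges one of two partners, and P2 is unchanged.
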

InsituNet: network visualisation of spatially aware gene expression data from in situ sequencing
Gene expression studies typically homogenise samples before sequencing, discarding spatial information on where transcripts are expressed. In situ sequencing is a novel method to generate spatially-resolved, in situ RNA localization and expression data. Gene-specific barcodes allow data for up to 40 different transcripts/genes at an almost single-cell resolution to be generated in situ. The resulting images can therefore display the location and intensity of a million or more individual transcripts in a tissue section. Few methods currently exist to analyze and visualize the complex relationships that exist between these transcripts or identify how these transcriptional profiles change in different regions of the tissue or across different tissue sections. Here, we present InsituNet, an innovative new application that converts in situ sequencing data into interactive network-based visualisations, where each transcript is a node in the network and edges represent the spatial co-localization relationships between transcripts. InsituNet identifies co-localizations that occur between transcripts both significantly more, and less, than statistically expected, given the frequency of the transcripts in the tissue. An automated sliding window function allows the generation of networks representing each individual section of the tissue and these networks enable users to quickly and easily identify regions where the transcriptional profiles are altered (e.g. regions associated with pathology). Alternatively, the user can also select (irregularly-shaped) regions of interest in the section for comparison to other regions. One can also compare how the transcriptional network changes across different tissue sections (e.g. healthy vs. disease). Where multiple networks are constructed their layout is spatially synchronised to facilitate comparison. InsituNet has been developed for the popular Cytoscape platform and will be publicly available following publication.
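
The comparison of observed co-localization with what is expected from transcript frequencies can be sketched as follows. This is an illustrative simplification and not InsituNet's implementation: it assumes that two transcripts are co-localized when they lie within a fixed distance, and that the expected count for a gene pair is the total number of co-localized pairs multiplied by the probability that a randomly chosen pair of spots carries those two labels. The significance test is omitted, and the gene names and coordinates are hypothetical.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement
from math import comb, dist


def colocalisation_counts(spots, max_distance):
    """Return {(gene_a, gene_b): (observed, expected)} for all gene pairs.

    spots: list of (gene, x, y) tuples, one per detected transcript.
    """
    if len(spots) < 2:
        return {}
    counts = Counter(gene for gene, _, _ in spots)
    observed = Counter()
    for (g1, x1, y1), (g2, x2, y2) in combinations(spots, 2):
        if dist((x1, y1), (x2, y2)) <= max_distance:
            observed[tuple(sorted((g1, g2)))] += 1
    close_pairs = sum(observed.values())
    all_pairs = comb(len(spots), 2)
    result = {}
    for a, b in combinations_with_replacement(sorted(counts), 2):
        # Number of spot pairs that carry this combination of labels.
        label_pairs = comb(counts[a], 2) if a == b else counts[a] * counts[b]
        expected = close_pairs * label_pairs / all_pairs
        result[(a, b)] = (observed[(a, b)], expected)
    return result


# Hypothetical tissue section: genes A and B always occur side by side.
spots = [("A", 0, 0), ("B", 0, 1), ("A", 10, 10), ("B", 10, 11), ("C", 50, 50)]
pairs = colocalisation_counts(spots, max_distance=2)
print(pairs[("A", "B")])  # (2, 0.8): observed more often than expected
print(pairs[("A", "A")])  # (0, 0.2): observed less often than expected
```

Pairs with observed counts well above or below the expected value would correspond to the edges of the co-localization network.
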

CerebralWeb: a Cytoscape.js plug-in to visualize networks stratified by subcellular localization
CerebralWeb is a light-weight JavaScript plug-in that extends Cytoscape.js to enable fast and interactive visualization of molecular interaction networks stratified based on subcellular localization or other user-supplied annotation. The application is designed to be easily integrated into any website and is configurable to support customized network visualization. CerebralWeb also supports the automatic retrieval of Cerebral-compatible localizations for human, mouse and bovine genes via a web service and enables the automated parsing of Cytoscape-compatible XGMML network files. CerebralWeb currently supports embedded network visualization on the PRIMESDB.eu, InnateDB (www.innatedb.com) and Allergy and Asthma Portal (allergen.innatedb.com) database and analysis resources.

PRIMESDB: a web-accessible database to explore and analyse PRIMES protein interaction data
PRIMESDB is a robust, web accessible, computational platform to enable the standardized collection, dissemination and analysis of both publicly available and PRIMES Project data, particularly high-throughput PPI data with focus on the EGFR network. An additional goal is to integrate PRIMES static and dynamically rewired PPI networks with external, orthogonal data sources for example: protein structures; protein expression data; mutations in cancer; small molecule interactions; known drug targets; pathway and annotation information etc. This resource is being harnessed to support computational, modelling and experimental analysis to identify key modulators of EGFR network output, assess their potential as therapeutic targets and aid modelling of contextual/dynamic networks in particular across the oncogenic and non-oncogenic PRIMES cell models. PRIMESDB is available at http://primesdb.org/.

Validation of EGFR PPIN in human cells and tissues
The complete gene expression landscape of human cells and, in particular, of EGFR network components provides information about biological functions, signaling routes and alterations in response to external stimuli that can increase our knowledge about diseases. Within the PRIMES project we aimed to characterize the transcriptome and proteome of colorectal cancer cells, and developed both methods and reagents, in particular antibodies, for investigating the expression and interaction of EGFR network nodes in cells and tissues. In brief, the efforts resulted in the characterization of cell lines by RNA-sequencing, the development of mass spectrometry- and antibody-based methods for protein analysis, and the integration of RNA and protein data. The analysis focused on colorectal cancer cell lines that are widely used for studying KRAS mutant phenotypes at the molecular level, alterations in metabolic and regulatory pathways, and the development of treatments.

Characterization of the transcriptome
RNA sequencing (RNA-seq) can be used to interrogate gene expression levels from diverse samples, yielding information about the transcriptome and about which biological functions and gene pathways are affected. Briefly, RNA is extracted from the sample, converted to cDNA and sequenced, yielding data on the nucleotide sequence of the reads and their abundance. These data were used to compare gene expression both within and between samples and for downstream analyses, such as finding perturbed pathways, biological processes and gene fusions. We have developed a method to compare RNA sequences and to investigate genetic mutations. The method enables authentication of cell lines; lack of authentication has been identified as a major concern impeding the interpretation of data and its integration with proteomics data across laboratories. Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values, calculated as the average of all individual samples for each tissue, were used to estimate the gene expression level. We have analyzed the authenticity of COLO205, DLD1, HCT15, HCT116, HKE3, HKH2, HT29 and RKO. In particular, HCT116, HKE3 and HKH2, which represent a model for studying colorectal cancer associated with KRAS mutations, were analyzed in greater depth. Interestingly, the EGFR network components are predominantly expressed at medium and moderate levels, confirming the ubiquitous expression and function of the corresponding genes.
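
For reference, the FPKM normalisation mentioned above follows the standard definition: fragments mapped to a gene, divided by the gene length in kilobases and by the total number of mapped fragments in millions. The sketch below applies that definition and averages over samples; the function names and example values are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def mean_fpkm(per_sample_fpkm):
    """Average FPKM of one gene across the individual samples of a group."""
    return sum(per_sample_fpkm) / len(per_sample_fpkm)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript in a library
# of 20 million mapped fragments.
value = fpkm(500, 2000, 20_000_000)
print(value)  # 12.5

# Hypothetical replicate values for the same gene.
print(mean_fpkm([12.5, 11.5, 12.0]))  # 12.0
```
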

Characterization of the proteome
The Human Protein Atlas pipeline was used to produce antibodies as tools for the analysis of 176 proteins that are components of the EGFR network. The antibodies generated were validated for different applications and used to characterize protein profiles at the anatomical level. Protein expression profiles obtained with these antibodies were compared with experimental or theoretical evidence available through other databases (Uniprot, Ensembl, CCSD, literature). In addition, we have developed a protocol to generate protein quantification standards using the antigen collection produced. Protein standards were obtained by expression of proteins in an auxotrophic E. coli strain in the presence of 14C-Arg and 14C-Lys. Expressed proteins were purified, validated and quantified by mass spectrometry for use as standards. We have also developed a protocol to perform absolute quantification of proteins by Multiple Reaction Monitoring mass spectrometry.
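
The principle behind absolute quantification with an isotope-labelled standard can be sketched in a few lines. The sketch assumes the general isotope-dilution approach, in which a known amount of labelled standard is spiked into the sample and the endogenous amount is derived from the ratio of the two signals; the report does not describe the exact calculation used, and the function name and values are hypothetical.

```python
def absolute_amount_fmol(endogenous_peak_area, standard_peak_area, spiked_standard_fmol):
    """Endogenous amount = (endogenous / standard signal) x spiked standard amount."""
    if standard_peak_area <= 0:
        raise ValueError("standard peak area must be positive")
    return endogenous_peak_area / standard_peak_area * spiked_standard_fmol


# Hypothetical peptide: endogenous signal is half that of 10 fmol of spiked standard.
amount = absolute_amount_fmol(2.0e6, 4.0e6, 10.0)
print(amount)  # 5.0
```

In practice several peptides and transitions per protein would be measured and combined, which this sketch does not cover.
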

Integration of transcript and protein profiles
Integration of OMICS data obtained by analysis of protein and transcript profiles provides a comprehensive representation of gene expression. The data obtained from the analysis have been used to create a static network of protein-protein interactions for the EGFR signaling pathway and to map dynamic changes in protein-protein interaction profiles related to mutations in KRAS. The results show ubiquitous expression of EGFR network protein components in comparison to other proteins. Protein and transcript expression data are made available through the Protein Atlas portal (www.proteinatlas.org).

Imaging of protein-protein interactions (PPIs) and post-translational modifications (PTMs)
In order to monitor PPIs and PTMs of proteins in fixed cells and tissue sections, we developed a series of assays based on the in situ proximity ligation assay (PLA) technology. In situ PLA is based on pairs of antibodies bound to DNA oligonucleotides (so-called proximity probes) that target proteins involved in PPIs or different epitopes in PTMs (e.g. one targeting the core protein and one targeting a phosphorylated residue). Proximal binding of such probes enables the creation of a circular DNA molecule formed between the two antibody probes, which can be amplified using rolling circle amplification (RCA). The single-stranded RCA product from a single recognition event contains several hundred repetitive motifs that can be visualized by hybridization of fluorophore-conjugated oligonucleotides. The distance requirement for the formation of the DNA circles depends on the size of the affinity reagents and on the length and polarity of the oligonucleotides used, and ranges from a few nm up to several tens of nm. These assays, which target nodes in the signaling network, were used to visualize signaling pathway activity in individual cells. The work also included further development of the in situ PLA method to facilitate multiplexed analysis and to improve analysis of tissue sections.

Multiplexed analysis of signaling network activity
The ability to visualize signaling network activity at different nodes at the single cell level provides a tool to determine causality in the network, to observe whether network topology is rewired and to screen for compounds that perturb or rewire signaling flow. We therefore developed a method that is capable of analyzing the start- and end-points of a signaling network, such as receptor phosphorylation and expression of downstream target genes, by combining in situ PLA with padlock probing. Padlock probes are linear DNA molecules that, upon hybridization to their target DNA sequence, can be ligated into a circular molecule and amplified by RCA. For detection of mRNA we first performed cDNA synthesis in situ, followed by addition and ligation of padlock probes. After all components for in situ PLA were added, the circular ligation products of both padlock probing and in situ PLA were amplified and detected together. Using this approach, we could determine end-points of a signaling pathway and monitor the effect of drugs perturbing nodes in the pathway. A variant of in situ PLA was developed to visualize multiple PPIs or PTMs in parallel, which we used to monitor and quantify homo- and heterodimers of the ErbB family members. The method encodes specific tag sequences in the proximity probes. Detection oligonucleotides labeled with unique fluorophores target these tags in the RCA products, so that the fluorescence identifies which proximity probes generated each RCA product and hence which interaction was recorded.

Improving signal strength and efficiency
During the course of the project we encountered a few hurdles in using in situ PLA on tissue sections. The problems identified during these analyses included fluorescent probes binding non-specifically to structures in the tissues, difficulties in identifying weak signals due to autofluorescence of the tissue sections, and poor efficiency in detecting low-abundance events. To overcome these obstacles, we developed two new variants of the in situ PLA method. To improve the signal strength, integrity and signal-to-noise ratio of PLA, which is important for analysis in tissue sections, a design that compacts the in situ PLA signals was developed. By adding an oligonucleotide consisting of a tandem repeat reverse complementary to the RCA product during the amplification step, the proximal and distal parts of the RCA product were brought closely together. This reduces the diameter of the signal to a fifth while retaining the fluorescence. In addition, we could show that this also improved the integrity of the signals by preventing the product from splitting up into several fragments.

Signal generation in in situ PLA depends on how efficiently DNA circles are formed and how well they are amplified. To increase the efficiency of DNA circle formation, we designed an oligonucleotide system in which the circularizing oligonucleotide is integrated into the oligonucleotides of the proximity probes. After binding of the proximity probes, these can be unfolded by enzymatic cleavage to liberate the circularization oligonucleotide, which can then hybridize to the corresponding proximity probe linked to the other antibody. Only one ligation event is required to form a complete DNA circle, which dramatically improves the efficiency. This new design can be performed in a multiplexed fashion, is compatible with the compaction system described above, and produces bright, well-defined signals that improve analysis of PPIs and PTMs in tissue sections even at low abundance.

Enzyme-independent signal amplification
To improve detection efficiency, we investigated whether other amplification methods could be used instead of RCA and developed a method that combines proximity interrogation with hybridization chain reactions as a means of signal amplification. In this proximity-dependent initiation of hybridization chain reactions, we equipped the proximity probes with DNA hairpins. The addition of an activator oligonucleotide that can hybridize to one of the hairpins liberates a sequence motif that invades the other hairpin, but only if the two probes are bound in close proximity. The now bridged proximity probes reveal an initiator motif that primes a hybridization chain reaction of fluorophore-labeled hairpins, creating a long fluorophore-labeled double-stranded DNA molecule. The key feature of this novel method is that it is based solely on hybridization and invasion by oligonucleotides, hence no enzymes are needed to yield a signal. This methodology substantially reduces the cost and provides advantages for automation of staining and for high-throughput analysis.

Assays for monitoring signaling activity
Several in situ PLA assays have been developed to monitor signaling activity in the ErbB network, such as receptor phosphorylation and interaction with Grb2. To monitor downstream events, assays targeting phosphorylation of AKT, ERK and STAT3 have been developed. In addition, assays against other pathways, e.g. WNT, TGFβ, HIPPO and PTPs, were developed. Attempts have been made to combine several of these assays in a multiplexed format, and work has been initiated on image analysis and data analysis to establish a pipeline that supports a systems biology approach for evaluating cellular communication in tissues.

Probing EGFR signalling dynamics
Signalling by the EGFR converts diverse external stimuli into specific cellular responses, important in embryonic development, tissue homeostasis and wound healing. However, EGFR overexpression and hyper-activation through genetic alterations have been linked to malignant transformation. The EGFR molecule, despite its intrinsic structural safeguards, can still attain an active conformation in the absence of ligand due to thermal fluctuations, necessitating low but continuous protein tyrosine phosphatase (PTP) activity to suppress phosphorylation and activation. Phosphorylation of the conserved regulatory tyrosine Y845 in the activation loop of the EGFR kinase domain leads to an acceleration of its phosphorylation, potentiating EGFR kinase activity in an autocatalytic fashion. Such an autocatalytic activation system that is coupled to PTP activity, by for example a double negative feedback, offers robustness against biological noise and converts external stimuli into threshold-activated responses. This, however, can also lead to amplified self-activation of the receptor in the absence of a cognate ligand, requiring high PTP activity at the plasma membrane (PM) to suppress it. In this scenario, however, the phosphatase activity would also terminate any signalling upon ligand stimulus, thereby rendering the cell unresponsive.

We used quantitative fluorescence microscopy measurements to show that the balance between kinase and phosphatase activity must be spatially regulated to: i) suppress spontaneous receptor activation, ii) allow for robust EGFR activation and internalization upon ligand binding, and iii) regulate the duration of the phosphorylation signal. This is possible because spontaneous and ligand-induced EGFR activation give rise to distinct molecular states that are recognized and processed differently by the endocytic machinery. In particular, a switch in EGFR trafficking occurs upon ligand binding due to the phosphorylation of Y1045 in the EGFR, where the E3 ligase c-Cbl binds, committing the receptor to unidirectional vesicular trafficking toward lysosomes. In contrast, the spontaneously activated EGFR continuously recycles and is brought back to the plasma membrane. In both cases, the endocytosed receptor is brought into the vicinity of the perinuclear phosphatases PTP1B and TCPTP, which are associated with the cytoplasmic membrane leaflet of the endoplasmic reticulum (ER). These phosphatases are direct interaction partners of EGFR, as determined using catalytically impaired trapping mutants. Since the catalytic efficiency of these phosphatases is high (~2 orders of magnitude higher than that of EGFR), they dephosphorylate the spontaneously activated receptor, which is then recycled back to the membrane, thereby suppressing spontaneous autocatalytic activation. Similarly, the ligand-bound dimeric receptors on their way to the lysosome are dephosphorylated in the perinuclear area in order to produce a finite response to growth factors. We also identified all the classical protein tyrosine phosphatases that affect EGFR phosphorylation and signalling response. This allowed us to build a complete model of how the spatial organisation of phosphatases controls the EGFR growth factor response.
Mathematical modelling revealed that the ubiquitin-mediated switch in EGFR trafficking is a uniquely suited solution to suppress spontaneous activation while maintaining responsiveness to EGF. In particular, separation of the kinase and phosphatase activity in space allows suppression of spontaneous EGFR activation and regulation of a finite signalling response upon growth factor binding by dephosphorylation and inactivation of EGFR in the perinuclear area. We also demonstrated that the identified mechanisms, which couple vesicular membrane dynamics to spatially distributed RTK-PTP interactions and determine signal duration, safeguarding and the system’s responsiveness, also apply to other RTKs, e.g. the EphA2 receptor.
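The interplay between autocatalytic EGFR phosphorylation and PTP activity can be illustrated with a deliberately simple, well-mixed toy model. It is not the spatial model developed in the project and contains no trafficking or compartments; the rate law, the parameter names (k_basal, k_auto, k_ptp) and the values used are assumptions chosen only to show how the steady-state phosphorylated fraction collapses once phosphatase activity outweighs autocatalysis.

```python
import math


def net_phosphorylation_rate(p, k_basal, k_auto, k_ptp):
    """dp/dt for the phosphorylated receptor fraction p (0..1) in the toy model.

    Phosphorylation: basal plus autocatalytic term acting on unphosphorylated receptor.
    Dephosphorylation: first-order PTP activity acting on phosphorylated receptor.
    """
    return (k_basal + k_auto * p) * (1.0 - p) - k_ptp * p


def steady_state_fraction(k_basal, k_auto, k_ptp):
    """Positive root of dp/dt = 0, i.e. of k_auto*p^2 + b*p - k_basal = 0."""
    if k_auto <= 0:
        raise ValueError("k_auto must be positive in this sketch")
    b = k_basal + k_ptp - k_auto
    return (-b + math.sqrt(b * b + 4.0 * k_auto * k_basal)) / (2.0 * k_auto)


if __name__ == "__main__":
    # Sweep PTP activity at fixed (hypothetical) kinase parameters.
    for k_ptp in (0.1, 0.5, 1.0, 2.0, 10.0):
        p = steady_state_fraction(k_basal=0.01, k_auto=1.0, k_ptp=k_ptp)
        print(f"k_ptp={k_ptp:5.1f}  steady-state phosphorylated fraction={p:.4f}")
```

In this toy setting a uniformly high phosphatase activity suppresses spontaneous activation but would equally damp a ligand-induced response, which is the dilemma that the spatial separation of kinase and phosphatase activity described above is proposed to resolve.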

This work has shown that growth factor signalling is not dictated by the intrinsic features of the growth factor receptor alone, but depends on the spatially regulated interdependence between phosphatase activity and the intrinsic kinase activity of the receptors. This has implications for how oncogenic signalling could be pharmacologically modulated: by affecting the system's dynamic properties at the level of phosphatase activity and receptor trafficking, it may be possible to selectively affect cancer cells.

The role of ubiquitination in oncogenic transformation by the EGFR network
Ubiquitination is among the most prevalent post-translational modifications of cellular proteins. There are different types of protein ubiquitination, ranging from the addition of one (monoubiquitination) or several ubiquitin moieties to the formation of long ubiquitin chains (polyubiquitination). Additionally, various types of ubiquitin chains can be assembled depending on the amino acid of ubiquitin used for the conjugation (K6, K11, K27, K29, K33, K48, K63 and linear M1 chains). Owing to this diversity of ubiquitin modifications, it is now well established that ubiquitination is involved in many pathways including protein degradation, endocytosis, DNA repair, autophagy and NF-κB signalling.

In the PRIMES project, we investigated the impact of oncogenic KRAS mutation on ubiquitination events in the EGFR network and their correlation with EGFR endocytosis. As a first step, we developed molecules that selectively recognise various ubiquitin chains. These ubiquitin sensors were validated for their selectivity in recognising the appropriate ubiquitin chains in vitro and in vivo. Secondly, the sensors specific for linear- and K63-linked ubiquitin chains were used to monitor EGFR ubiquitination in the HKE3 cell line over time. We validated our results using antibodies specific for linear-, K48- and K63-ubiquitin chains. Interestingly, although we did not detect any co-localisation of the linear-ubiquitin chain specific antibody and EGFR, we observed an accumulation of the antibody in the close vicinity of the EGFR. We then went on to compare the ubiquitination events in response to EGF treatment between HKE3 and HCT116 cells to determine the impact of oncogenic KRAS. There was no major difference between the two cell lines regarding K48 and K63 ubiquitination, but linear ubiquitination was increased in HCT116 cells even in the absence of EGF treatment. Consistent with this observation, it has recently been shown that the E3 ligase complex LUBAC, which is responsible for the formation of linear ubiquitin chains, is highly expressed or hyperactivated in various cancers such as highly metastatic osteosarcoma, diffuse large B cell lymphoma, lung adenocarcinoma and breast cancer. Moreover, the silencing or inhibition of LUBAC decreased not only the size but also the invasiveness and metastasis of lung cancer, and sensitized cisplatin-resistant ovarian and breast cancer cells to cisplatin.

Linear ubiquitination is also involved in the activation of the pro-survival NF-κB pathway. More specifically, we showed that the catalytic part of the E3 ligase and the deubiquitinase specific for linear chains (HOIP and OTULIN, respectively) exist as an editing pair and are recruited together to the TNFα receptor, where they modulate the NF-κB pathway. We further investigated the role of linear chains and OTULIN in HCT116 cell growth and survival and have evidence that KRAS mutation and linear ubiquitin chains can improve HCT116 cell survival, most likely through NF-κB activation.
Because linear ubiquitin chains might constitute a target for the modulation of oncogenic pathways, we also dedicated efforts to developing strategies to target and inhibit signalling triggered by linear ubiquitination. Using the phage display technique, we generated ubiquitin variants that bind to linear ubiquitin binding domains and therefore prevent linear ubiquitin dependent signalling. In vivo, these ubiquitin variants interfere with the activation of the NF-κB pathway and the cellular growth of HCT116 cells.

Altogether, we have made progress in understanding the regulation and implications of linear ubiquitin chains in oncogenic conditions, and have also developed potential tools to interfere with the functions of linear ubiquitination in vivo.

Pathway rewiring through protein engineering
Understanding how cancer mutations affect cellular interaction networks (‘rewiring’) is essential in order to develop treatments that steer the network back to a more physiological / non-cancerous state. In PRIMES we have shown that not only the qualitative effect of a mutation (classical edgetic perturbation model) is important to consider, but that also quantitative parameters (changes in affinities and kinetic constants) and the quantitative context (protein abundance of partner proteins and protein interaction competition of partner proteins) play critical roles. Thus, we have contributed new tools and concepts that go beyond the classical qualitative edgetic network rewiring and include quantitative parameters as described below.

1) The concept of protein interaction competition to steer the signal flux
We tested the hypothesis that differences in the abundance of proteins change signalling outputs because these proteins compete for binding to hub proteins at critical network branch points. Focusing on ErbB signalling, we created a protein interaction network that included information about protein domains and analysed the role of competing protein interactions. By leveraging three-dimensional protein structures to infer steric interactions among binding partners for a common binding domain or linear motif (node), and by including information about protein abundance and interaction affinities, we identified a large number of mutually exclusive (XOR) protein interactions. Computational modelling of changes in protein abundance with different patterns of partner proteins and XOR nodes (XOR motifs) revealed that each motif conferred a different response. In particular, a competitive effect was seen if the abundance of a common upstream protein is lower than the sum of the abundances of all its binding partners. We experimentally investigated one such predicted competitive XOR motif, which consisted of the hub protein Ras and its binding partners RIN1 and CRAF. Consistent with the computational prediction, overexpression of RIN1 in cultured cells decreased the phosphorylation of CRAF and its downstream targets. Thus, our analyses provide evidence that variation in the abundance of proteins that compete for binding to XOR nodes could contribute to context-specific signalling plasticity. To automatically distinguish proteins that can bind at the same time from those that use similar binding sites, we have developed the SAPIN (Structural Analysis of Protein Interaction Networks) webserver. This tool enables a full analysis of the protein sequence to identify the parts potentially involved in an interaction, a mapping of the available structural data involving these parts, and the identification of compatible and mutually exclusive interactions at the network level.
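The competitive effect at an XOR node can be illustrated with a small equilibrium calculation. The sketch below assumes simple 1:1 binding of each partner to a single site on the hub and solves mass conservation for the free hub concentration by bisection. It is not the computational model used in the project; the function name, the concentrations and the dissociation constants are hypothetical and in arbitrary units.

```python
def bound_complexes(hub_total, partners, iterations=200):
    """Equilibrium complexes for partners competing for one site on a hub.

    partners: dict name -> (total_concentration, kd), with kd > 0
    Returns (free_hub, {name: complex_concentration}).
    """
    def excess(free_hub):
        # Hub accounted for (free + bound) minus the hub actually present.
        bound = sum(total * free_hub / (kd + free_hub) for total, kd in partners.values())
        return free_hub + bound - hub_total

    lo, hi = 0.0, hub_total  # excess(lo) <= 0 <= excess(hi), and excess is monotonic
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    free_hub = (lo + hi) / 2.0
    complexes = {
        name: total * free_hub / (kd + free_hub) for name, (total, kd) in partners.items()
    }
    return free_hub, complexes


if __name__ == "__main__":
    # Hypothetical numbers: limiting hub (Ras) with rising RIN1 abundance.
    for rin1 in (0.0, 1.0, 5.0):
        _, c = bound_complexes(1.0, {"CRAF": (1.0, 0.1), "RIN1": (rin1, 0.1)})
        print(f"RIN1 total={rin1:3.1f}  Ras-CRAF complex={c['CRAF']:.3f}")
```

With the hub limiting, raising the abundance of one partner lowers the amount of the other partner's complex; with the hub in large excess the two complexes become nearly independent, in line with the condition for competition stated above.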

2) The role of expression changes of paralog pairs
Focusing on the calcium-induced differentiation of primary human keratinocytes as a model system for a major cellular reorganization process, we analysed the expression of genes whose products are involved in protein complexes. Clustering analyses revealed only moderate co-expression of functionally related proteins during differentiation. However, when we looked at protein complexes, we found that the majority (55%) are composed of both non-dynamic and dynamic gene products ('di-chromatic'), 19% are composed only of non-dynamic and 26% only of dynamic gene products. Considering three-dimensional protein structures to predict steric interactions, we found that proteins encoded by dynamic genes frequently interact with a common non-dynamic protein in a mutually exclusive fashion. This suggests that during differentiation, complex assemblies may also change through variation in the abundance of proteins that compete for binding to common proteins, as found in some cases for paralogous proteins. Because of the anti-correlated expression of some paralog pairs during keratinocyte differentiation, we next analysed whether paralogs (and, more generally, proteins belonging to the same family) are subject to a certain level of expression regulation in tissues. Interestingly, we found that Ras-like GTPases and regulatory proteins are expressed at balanced levels in physiologically normal cell types and tissues. This balance is lost in transformed cell lines or in a pathophysiological cancer context and may represent a universal hallmark of cancer. We have been validating these findings using knockout experiments in mouse intestine APC-/- spheres combined with RNA sequencing.
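The three complex categories used above can be expressed as a simple classification rule. The sketch assumes that each gene has already been labelled as dynamic or non-dynamic by some upstream expression analysis; the function names, the input layout and the toy gene identifiers are hypothetical and do not reflect the project's data.

```python
from collections import Counter


def classify_complex(members, dynamic_genes):
    """Label a complex by the expression behaviour of its member genes."""
    if not members:
        raise ValueError("a complex needs at least one member")
    flags = {gene in dynamic_genes for gene in members}
    if flags == {True}:
        return "dynamic"
    if flags == {False}:
        return "non-dynamic"
    return "di-chromatic"  # mixture of dynamic and non-dynamic gene products


def composition(complexes, dynamic_genes):
    """Fraction of complexes falling into each category."""
    counts = Counter(classify_complex(m, dynamic_genes) for m in complexes.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Toy complexes with placeholder gene names.
    toy = {"c1": ["A", "B"], "c2": ["B", "C"], "c3": ["C", "D"], "c4": ["A"]}
    print(composition(toy, dynamic_genes={"A", "B"}))
```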

3) Different disease mutations in the same protein can affect the network in a quantitative way and we can predict these changes using the protein design algorithm FoldX
The Ras/MAPK syndromes ('RASopathies') are a class of developmental disorders caused by germline mutations in 15 genes encoding proteins of the Ras/mitogen-activated protein kinase (MAPK) pathway. It is intriguing that mutations in the same 15 genes are also frequently identified in different types of human cancers. We have shed light on 956 RASopathy and cancer missense mutations by combining protein network data with mutational analyses based on 3D structures. Using the protein design algorithm FoldX and mathematical network modelling, we show that quantitative rather than qualitative network differences determine the phenotypic outcome of RASopathy compared to cancer mutations. Furthermore, our quantitative predictions can explain why some cancer mutations (‘drivers’) occur at significantly higher rates than, presumably, functionally alternative mutations. For example, V600E in the BRAF hydrophobic activation segment (AS) pocket accounts for >95% of all kinase mutations. We used experimental and in silico structure-energy statistical analyses to elucidate why the V600E mutation, but no other mutation at this or any other position in BRAF's hydrophobic pocket, is predominant. We found that oncogene mutation frequencies depend on the equilibrium between the destabilization of the hydrophobic pocket, the overall folding energy, the activation of the kinase and the number of bases that must change to produce the corresponding amino acid substitution. Using a random forest classifier, we quantitatively dissected the parameters contributing to BRAF AS cancer frequencies. These findings can be applied to genome-wide association studies and prediction models. We have now developed VarQ, a web server that automatically predicts the effect of sequence variants on protein misfolding, activity, involvement in protein-protein interactions, and/or drug binding (http://varq.qb.fcen.uba.ar/). VarQ maps each variant to a critical set of representative structural configurations, diagnoses its effect, and calculates simple thresholds to evaluate its relevant structural impacts.
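One of the parameters named above, the number of bases that must change to produce a given amino acid substitution, is easy to compute from the standard genetic code. The sketch below is a generic illustration and is not part of FoldX, VarQ or the project's classifier; the function names are hypothetical, and the use of GTG as the valine codon at BRAF position 600 follows the commonly reported GTG to GAG change and should be treated as an assumption here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered with T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def min_base_changes(codon, target_amino_acid):
    """Fewest single-base substitutions turning codon into any codon for the target."""
    targets = [c for c, aa in CODON_TABLE.items() if aa == target_amino_acid]
    if not targets:
        raise ValueError(f"unknown amino acid: {target_amino_acid}")
    return min(hamming(codon, t) for t in targets)


if __name__ == "__main__":
    # Substitutions reachable from a GTG (valine) codon, grouped by required base changes.
    for aa in "EMKW":
        print(f"V -> {aa}: {min_base_changes('GTG', aa)} base change(s)")
```

Substitutions that need two or three simultaneous base changes are expected to arise far less often than single-base substitutions, which is why this count is a natural covariate alongside the structural and energetic terms.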

4) Steering signalling flux at the Ras node using designed branch pruning mutations
We used rational in silico structure-based design to engineer Ras mutations that selectively prevent binding to a few downstream effector interaction partners while keeping the interactions with other effectors unchanged (‘branch pruning mutations’). Interactions were validated in vitro using pull-down experiments (qualitative binding) and microscale thermophoresis experiments (quantitative binding). A computational network model was generated that predicts the change in Ras-effector complex concentration for Ras mutations. In general, the model predicts that the branch pruning mutations affect complex formation of the originally targeted interactions. However, for two effectors, Nore1 and PI3K, the model predicts that interactions increase through loss of competing interactions. Eight KRAS mutations were selected for controlled expression in HCT116 and HKH2 cell lines using our TEMTAC plasmid tool. Comparing the proliferation of mouse small intestine APC-/- spheres carrying KRAS branch pruning mutations with cell viability in the HKH2 cell line shows similar behaviour: KRAS E37R selectively binds Nore1A, and the lower viability may be explained by the induction of apoptosis. The I36 mutation binds only Raf, which may explain an increase in proliferation. We are currently working to tune the levels better with doxycycline. A collaborative manuscript is in preparation.

Learning function from structures
We also made extensive use of structural information in order to (i) understand the exact molecular mechanisms by which signals propagate within the EGFR/ErbB network and (ii) determine how this information can be utilized for the development of inhibitors that control signalling flux by rationally targeting PPIs, as well as for suggesting site-directed mutants for designed pathway rewiring. The main targets for structural studies were SARAH domains (MST1/2), members of the RASSF tumor suppressor family, and domains that recognize specific peptide motifs, such as acetyl-lysine dependent bromodomains and phosphodependent PTP domains. This work also contributed initial chemical fragment hits for the further development of protein interaction inhibitors, as well as potent and selective allosteric inhibitors targeting key enzymes in the EGFR/ErbB network such as ERK1/2.

Allosteric inhibitors and regulation of MAPK
Mitogen-activated protein kinases (MAPKs) are key mediators of signalling flux downstream of EGFR/ErbB. In the framework of PRIMES we have been interested in understanding the plasticity and regulation of MAPKs and how these properties can be exploited for the development of highly specific inhibitors. In this project we determined the binding mode of the highly specific ERK1/2 inhibitor SCH772984 and found that this inhibitor targets an induced allosteric pocket located between the catalytic domain P-loop and helix alphaC. This novel binding mode might also be explored in other kinases for the development of highly specific inhibitors. In addition, we found that this inhibitor binding mode is associated with slow off-rates, a property that is becoming increasingly important in drug optimization. We also discovered that certain metals such as copper increase signalling flux through activation of the kinase MEK, a property that can be exploited for the treatment of cancer using clinically approved metal chelators.

Targeting the acetyl-lysine binding site of bromodomains
Bromodomains are acetyl-lysine dependent protein interaction domains that play key roles in the recruitment of transcription factors and chromatin modifying enzymes to specific locations on chromatin. Some bromodomain-containing proteins are key regulators of transcription induced by the EGFR/MAPK signalling network. We explored whether bromodomains present in the EGFR/MAPK signalling network can be targeted by small molecules and generated a number of crystal structures of these domains in complex with chemical fragments that can be used as starting points for the development of more potent inhibitors. Indeed, based on these structural models, highly potent inhibitors targeting CBP, ATAD2 and PCAF have been developed.

Structural models of SARAH, RAS binding domains and RASSF
A number of protein interaction domains play key roles in the regulation of the EGFR/ErbB network. Among them are the domains of the RASSF proteins, which constitute a family of tumor suppressors with at least 10 members. RASSF proteins are intramolecularly inhibited by binding of the C1 domain to the RBD (RAS binding domain), a signaling block that is released by binding of active RAS to the RBD. This activation mechanism enables the RBD and SARAH domains of RASSF to form homo-dimers, followed by recruitment of MST kinases, which also harbor SARAH domains, resulting in kinase activation and thereby stimulation of the Hippo signaling pathway. Our structural biology effort led to comprehensive coverage of these protein interaction families with high resolution structures, including for instance both SARAH domains of MST kinases, experimental structures of RASSF1 and RASSF3 (Ras-binding domain structure) and RASSF7 (helical domain structure), as well as the structure of SHC1 with and without the N-terminal region. Interestingly, this structure revealed an autoinhibitory interaction blocking the phosphotyrosine binding site. Site-directed mutants of these interfaces have been designed within PRIMES based on these structures, with the goal of validating the structural data and their role in modulating signalling flux within the EGFR/ErbB network.

Validation and application of network derived knowledge in genetically engineered mouse models of CRC
We were confronted with the problem of oncogenic KRAS mediated resistance to treatment in cancer. To investigate this problem effectively, we generated a number of related cancer models that differ only in the presence or absence of KRAS mutation and that recapitulate the progression of human colorectal cancer. These models in turn allowed us to investigate the direct consequences of KRAS mutation in a robust way. Our approach was to use these model systems to generate large-scale gene expression, proteomic and metabolic profiles and to compare these data in a completely unbiased way. This then led to the identification of a number of potential targets that may allow us to develop therapeutic approaches that are effective in KRAS mutant cancers. These studies identified four main vulnerabilities that may be exploited as new targets:

1) Rac1 signalling
The small GTPase Rac1 was identified as a potential critical effector of KRAS mutation, and indeed we found that disruption of Rac1 had a dramatic negative impact on proliferation and survival of KRAS mutant cancer cells. Inhibition of Rac1 is able to rescue the Apc loss and mutant KRAS phenotype, and blocks the tumorigenic capacity of hyperproliferative colonic crypts observed in mice with Apc-/- / mtKRAS intestinal epithelia. These data strongly indicate an essential role for Rac1 in intestinal adenoma formation and suggest that it may be a potential druggable target. The Rac1 inhibitor EHT1864 has been shown to be effective against mouse tumour cells in vitro, however, experiments to assess its activity in vivo in our model systems did not yield positive results. Within PRIMES, we have subsequently sought to improve the efficacy of this compound. Results are expected in a few months after the end of PRIMES.

2) TGFβ signalling
In the Apc-/- / mtKRAS model, many tumours developed in the villus compartment and in the top part of the colonic crypts, which indicates an increase in cellular dedifferentiation in this model. Analysis of pSMAD3 and p21 (mediators of TGFβ signalling) by immunohistochemistry revealed activation of the TGFβ pathway in the dedifferentiated tumours that developed in these mice. Importantly, we were then able to demonstrate that loss of the TGFβ type 1 receptor (Tgfbr1fl/fl) significantly accelerated tumorigenesis in the Apc-/- / mtKRAS model. Mechanistically, this inactivation of TGFβ signalling is associated with increased levels of activated pERK, suggesting that proliferation in these tumours depends mainly on ERK signalling and that these tumours might be sensitive to MEK inhibition. Indeed, treatment with Selumetinib, a potent MEK inhibitor, preferentially inhibited tumours lacking Tgfbr1 expression. Thus, MEK inhibitors could be effective treatments for CRC with diminished TGFβ signalling and high levels of pERK activation.

3) Glutamine and glucose metabolism
Gene expression analysis of Apc-/- and Apc-/- / mtKRAS intestine revealed altered expression of enzymes involved in glutamine and glucose metabolism, such as GPT1, GPT2, GLS and HK1. Accordingly, we observed that the proliferation of Apc-/- / mtKRAS organoids (which recapitulate the in vivo model) was particularly sensitive to glutamine and glucose withdrawal from the crypt culture medium. Intriguingly, Apc-/- / mtKRAS organoids also exhibited an increase in alanine production and secretion into the medium. This is important because alanine is a non-essential amino acid that is generated via the transaminase reaction of glutamate with pyruvate catalysed by GPT1/2. We found that Apc-/- / mtKRAS cells show a specific up-regulation of GPT2 at the mRNA and protein level, whereas WT, Apc-/-, and mtKRAS alone express only GPT1. Interestingly, the same switch between GPT1 and GPT2 has been detected in human colorectal cancer. As a result, we generated a mouse model in which we deleted Gpt2 in the Apc-/- / mtKRAS background to assess its requirement for KRAS driven intestinal tumorigenesis. While Gpt2 deletion alone did not affect survival, a diet lacking alanine was able to extend the overall survival of these mice, suggesting that targeting metabolism in KRAS mutant cancer could be a valid therapeutic approach.

Furthermore, we have also identified a novel target, SLC7A5, that is specifically upregulated in Apc-/- / mtKRAS mice and human CRC. SLC7A5 is a solute carrier amino acid transporter, which is required for the export of glutamine leading to the import of leucine. Genetic ablation of SLC7A5 stops the increased proliferation seen in the Apc-/- / mtKRAS mice and dramatically suppresses intestinal tumourigenesis. SLC7A5 can be inhibited by a small molecule, 2-aminobicyclo-(2,2,1)-heptane-2-carboxylic acid, which will be interesting to test for effects in CRC.

4) Protein translation machinery
Ingenuity pathway analysis (IPA) of SILAC based mass-spectrometry data comparing Apc-/- to Apc-/- / mtKRAS organoids identified “eIF2 signalling” and “regulation of eIF4 and p70S6K signalling” as the most upregulated pathways following KRAS mutation. eIF4E is one of the key components of the protein translation machinery. It binds the 5’ cap of mRNA and helps to recruit the ribosome. We found an increased rate of protein synthesis, as indicated by 35S-methionine incorporation, following KRAS mutation. In light of our previous data, it is striking that MAPK signalling has also been reported to affect the translation machinery through activation of p38, Mnk1 and Mnk2 and in turn phosphorylation and activation of eIF4E. Expressing a mutant form of eIF4E in the Apc-/- / mtKRAS background, we observed a re-sensitisation to rapamycin treatment. These data suggest that targeting the regulation of the translation machinery can represent an alternative route to increase the sensitivity to current chemotherapy. Moreover, we are currently testing therapeutic inhibition of this pathway with compounds targeting p38 (PH797804) and Mnk1/2 (Merestinib), either as single agents or in combination with rapamycin. Preliminary data suggest that these treatments result in a reduction of eIF4E phosphorylation and an impact on proliferation. Thus, targeting the machinery which drives protein production is another promising therapeutic avenue.

As a whole, this project has driven multiple strands of research into the development of new treatment approaches in cancer, and has identified at least four exciting areas of research that yielded new drug targets, which may prove to be of benefit to CRC patients.

New drugs and new drug targets in the EGFR network
Despite its importance in oncogenic transformation, we currently lack capacities to target this network effectively. Large efforts were dedicated to develop kinase inhibitors either of the EGFR or downstream kinases. Unfortunately, the clinical efficacy is limited and basically abolished if the cancer has a RAS mutation. To address this unmet clinical need PRIMES focussed on the discovery of new targets and new classes of small molecules.

Small molecule annotation of the EGFR network
In order to facilitate the discovery of new drugs and new drug targets in the EGFR network, we catalogued all known small molecule interactors with pathway members. This dataset and associated tools were fundamental in later tasks such as the design of synergistic drug combinations.

Explaining and overcoming drug resistance in the EGFR network based on mathematical models
Treatment of cancer patients with ATP-competitive inhibitors of BRAF/CRAF kinases surprisingly increases total kinase activity, especially in wild-type BRAF cells, subverting the desired clinical outcome. Similar inhibition resistance is observed for numerous kinases involving homo/heterodimerization in their activation cycles. We could show that drug resistance resulting from kinase dimerization can be explained using thermodynamic principles. The allosteric regulation by inhibitors is described by thermodynamic factors that quantify inhibitor induced changes in kinase dimerization and the difference in the drug affinity for a free monomer versus a dimer harbouring one drug molecule. The analysis extends to kinase homo- and heterodimers, allows for their symmetric and asymmetric conformations, and predicts how thermodynamic factors influence dose-response dependencies. Importantly, we showed how two inhibitors, ineffective on their own, when combined can abolish drug resistance at lower doses than either inhibitor applied alone. Thus, the mechanistic models suggest ways to overcome resistance to kinase inhibitors.
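The role of the two thermodynamic factors can be illustrated with a deliberately simplified equilibrium sketch. It is not the PRIMES model: the species, the parameter names (kd_dimer, ki, g, f), all numerical values, the assumption that free inhibitor equals the dose, and the rule that only drug-free protomers within a dimer are active are illustrative assumptions. Here g < 1 means that a drug-bound monomer dimerizes more readily, and f > 1 means that a dimer already harbouring one drug molecule binds the second one more weakly.

```python
import math

def kinase_activity(inhibitor, total_kinase=1.0, kd_dimer=100.0,
                    ki=1.0, g=0.01, f=100.0):
    """Concentration of drug-free (assumed active) protomers held in dimers.

    Toy equilibrium model; every parameter value is an illustrative assumption.
    Free inhibitor is taken to equal the dose (drug in large excess).
    """
    i = inhibitor
    # Equilibrium relations, with m = free monomer concentration:
    #   MI  = m * i / ki                 drug-bound monomer
    #   D   = m**2 / kd_dimer            drug-free dimer
    #   DI  = m * MI / (g * kd_dimer)    dimer with one drug molecule
    #   DII = DI * i / (f * ki)          dimer with two drug molecules
    dimer_weight = (1.0 + i / (g * ki) + i**2 / (g * f * ki**2)) / kd_dimer
    a = 2.0 * dimer_weight   # each dimer contains two protomers
    b = 1.0 + i / ki         # free plus drug-bound monomers
    # Conservation of total kinase gives a*m**2 + b*m - total_kinase = 0.
    m = (-b + math.sqrt(b * b + 4.0 * a * total_kinase)) / (2.0 * a)
    drug_free_dimer = m**2 / kd_dimer
    one_drug_dimer = m**2 * i / (g * ki * kd_dimer)
    # Assumption: a protomer is active only if it is drug-free and in a dimer.
    return 2.0 * drug_free_dimer + one_drug_dimer

if __name__ == "__main__":
    for dose in (0.0, 0.1, 1.0, 10.0, 100.0, 1000.0):
        print(f"dose={dose:8.1f}  activity={kinase_activity(dose):.5f}")
```

With these illustrative values the activity first rises above the drug-free baseline and only falls at much higher doses, which is the qualitative shape of the paradoxical activation described above. Setting g = f = 1 removes the allosteric effects and the inhibitor then lowers activity at this dose.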

Synergistic EGFR network drug combinations
Finding drug combinations which produce synergistic therapeutic benefits is a difficult task, which is usually relegated to the realms of expert knowledge or serendipity. Systems biology and modelling approaches offer an attractive alternative to rationally design drug combinations. The CTB-Drug Combinations chemoinformatics platform, developed in the PRIMES project, enables the discovery of highly therapeutic synergistic drug combinations. Using proteomic and network analysis, the platform assembles combinations of two and three drugs, development compounds and experimental compounds which are predicted to maximally perturb the network according to a variety of scoring functions. These scores come from analysis of the differences in protein-protein interaction strengths when comparing the oncogenic HCT116 cell line to the non-oncogenic HKE3 cell line. We generated and assessed priority lists of approved drugs and experimental compound combinations. Evaluation of the combinations using an IncuCyte instrument produced a number of novel (unseen in literature) synergistic drug combinations which are currently undergoing further analysis to determine mode of action. We are also assessing the IP situation around these combinations and aim to patent and license, if possible before moving to publication in 2017.
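The idea of ranking two- and three-compound combinations by predicted network perturbation can be sketched as follows. This is not the CTB-Drug Combinations platform: the interaction differences, the compound-to-target annotation and the scoring function (summed absolute difference over interactions touching any target) are hypothetical stand-ins chosen only to make the enumeration concrete.

```python
from itertools import combinations

# Hypothetical differences in interaction strength (oncogenic minus control line).
edge_delta = {
    ("EGFR", "GRB2"): 0.2,
    ("KRAS", "RAF1"): 1.5,
    ("RAF1", "MEK1"): 0.9,
    ("PI3K", "AKT1"): 0.7,
}

# Hypothetical compound-to-target annotation.
drug_targets = {
    "drug_A": {"EGFR"},
    "drug_B": {"RAF1"},
    "drug_C": {"AKT1"},
    "drug_D": {"MEK1"},
}

def perturbation_score(combo):
    """Sum |delta| over interactions that involve at least one target of the combo."""
    targets = set().union(*(drug_targets[d] for d in combo))
    return sum(abs(delta) for (u, v), delta in edge_delta.items()
               if u in targets or v in targets)

def rank_combinations(sizes=(2, 3)):
    """Enumerate all combinations of the given sizes, highest score first."""
    combos = [c for k in sizes for c in combinations(sorted(drug_targets), k)]
    return sorted(combos, key=lambda c: (-perturbation_score(c), c))

if __name__ == "__main__":
    for combo in rank_combinations()[:3]:
        print(combo, round(perturbation_score(combo), 2))
```

Counting each perturbed interaction once, rather than once per compound, means that a combination hitting the same interaction twice gains nothing from the redundancy; other scoring functions would rank differently.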

Phage-CONA: high throughput peptide library screening
Large, randomly generated libraries of linear and cyclic peptides as well as biosimilars can be displayed on the surface of M13 phages, which are then incubated and screened against targets of interest. In PRIMES, we further developed the phage display technique by the introduction of an in-vivo “switching” step. Any primary hits are processed into fluorescent protein (FP) conjugates by cutting out the M13 surface protein on which they were presented for panning. Hit – FP conjugates are then quantitatively tested for their binding affinity to the target protein by our CONA (Confocal Scanning) process. This method drastically reduces the steps needed in the confirmation and validation of phage display hits. Only clones corresponding to peptides with the highest affinity are sequenced to determine their primary structure. By this means, even if they are underrepresented, the most valuable peptide ligands can be rapidly identified for accurate affinity measurements by single molecule spectroscopy, for crystallography and as starting point for library generation and on-bead and solution based screening. The added value of this technique lies in the fact that for complex protein-protein interaction targets in pathway networks specific high affinity tools can be generated that will allow resetting of disease critical nodes.

LBSSX, a platform of integrated molecular similarity tools
The Ligand-Based Structure Similarity 10 (LBSSX) platform is a collection of highly integrated molecular similarity techniques facilitating structure–activity relationship (SAR) exploration within commercial databases, drug repurposing and all other tenets of ligand-based virtual screening. We have implemented a scheme whereby software is vetted for security and then run on a compound collection, producing a set of molecular descriptors for each molecule. These descriptors are then used for ligand based virtual screening without the screener knowing the structure. This allows access to compound archives without compromising IP. Through LBSSX we could associate each compound archive member with its closest known similars in the literature and thereby link compounds to newly suggested targets. The validity of this approach is shown by its ability to detect known targets. We have recently expanded the approach, mapping Vichem’s compound collection to known actives, to known targets, and now to similar targets.
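Screening on descriptors alone can be illustrated with a minimal sketch. It assumes that each compound is represented by a set of "on" descriptor bits and that similarity is measured with the Tanimoto coefficient; the source does not state which descriptors or similarity measures LBSSX uses, and the compound names and bit sets below are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' descriptor bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def nearest_reference(query_bits, references):
    """Return (name, similarity) of the reference compound most similar to the query."""
    scored = ((name, tanimoto(query_bits, bits)) for name, bits in references.items())
    return max(scored, key=lambda item: item[1])

# Hypothetical descriptor bit sets; only these, not structures, are exchanged.
references = {
    "known_active_1": frozenset({1, 4, 7, 9}),
    "known_active_2": frozenset({2, 3, 5}),
}
archive_compound = frozenset({1, 4, 9, 12})

if __name__ == "__main__":
    print(nearest_reference(archive_compound, references))
```

The archive owner only needs to share the bit sets, so the closest literature compound, and through it a candidate target, can be proposed without the structure being disclosed.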

Assembly and screening of allosteric and protein-protein interaction targeted hit finding library
For PRIMES, Vichem assembled and prepared an allosteric and protein-protein interaction targeted library containing more than 2000 compounds around 65 pharmacophores. The most frequent cores are 2-Amino-4-hetero-aryl-pyrimidin, Quinoline, Quinazoline, Pyrido[2,3-d]pyrimidin-7-one, Quinoxaline, Pyrazolo[3,4-d]pyrimidine, Pyrazolo[3,4-d]pyrimidin-4-one, 3-Pyrazyl-formamide, Nicotinonitrile, Oxindole. Partners UEDIN and BICR successfully identified hits from this library against several targets: 1) UEDIN identified 122 primary hits against Survivin, which is an important target for colorectal cancer therapies, 2) UEDIN identified 47 and 41 primary hits against the EGFR network proteins Rac1 and Rac1b respectively, 3) UEDIN identified 2 primary hits against the Ube2c-Uba1 system (Ube2c is an oncogenic ubiquitin-conjugating enzyme), 4) BICR identified 39 primary hits in a comparative screen of differential sensitivity using the HCT116/HKE3 isogenic cell line pair.

Small molecule inhibitor for Survivin
Survivin is a member of the IAP (Inhibitor of Apoptosis) family that is overexpressed in CRC and, according to our studies, a promising therapeutic target. The CTB Label Free Affinity Platform (LFAP) has been used to screen a Vichem PPI targeted library against Survivin. A series of primary hits were identified, but their affinity was too low to justify starting medicinal chemistry. A small-molecule binder was, however, identified using the CTB LBSSX platform. Ligand-based molecular similarity techniques prioritised a Vichem compound archive member against Survivin. Upon notification, Vichem identified their compound archive member as in fact being a known Survivin active. Vichem delivered material to UEDIN consisting of the original, known active (which was confirmed as binding in ITC experiments) and a number of derivatives and precursors. A moiety containing an unexplored substitution pattern was found to bind using the LFAP platform. Vichem are sending fluorinated analogues and precursors for follow up NMR studies.

Survivin peptidomimetic inhibition (CTB-MorPH)
Peptides represent the most specific molecular entities as a function of their size. They would be ideal drug candidates if they were more proteolytically stable and showed higher permeability. To overcome these problems UEDIN has developed the CTB-Morph software platform and applied it to Survivin. Starting with a known binding natural peptide, we have iteratively ‘morphed’ this peptide into a stabilised peptidomimetic with double the affinity of the original peptide and strongly increased proteolytic stability, as reflected by an increase in half-life in mouse plasma from less than one hour to around 20 hours. Further work is focusing on assessment of cell-permeability and toxicity. A manuscript is currently in preparation and due to be published in 2017.

FGFR kinase inhibitors
Based on the evaluation of data in PRIMESDB and the COSMIC database we identified FGFR kinases as potentially important targets in HCT116 and other CRC models. Benzo[b]thiophene compounds proved to be effective in in-vitro biochemical and cellular assays. Vichem has filed Hungarian and PCT patent applications on these compounds.

Synergistic compound combinations
Based on driver gene analysis we selected several compound combinations against the HCT116 model cell line. The combinations contain already launched drugs and patentable structures. More than 50 % of these combinations were confirmed as synergistic by screening.

Hit to lead optimization
Based on the follow-up synthesis and screening of analogues of primary hits on the different targets, four compound families and their targets were selected for further optimization. Seven compounds with the most promising in vitro activities and different scaffolds (nicotinonitrile, imidazo[1,2-a]pyrazine, benzo[b]thiophene, pyrido[2,3-b]pyrazine) were selected, and some early absorption, distribution, metabolism, and excretion (ADME) parameters were determined either in vitro or in silico. With proper formulation, the optimized compounds can be further tested in in-vivo cancer models.

Conclusions
The project outcomes fully confirmed the central hypothesis of PRIMES that PPINs serve as signal processing modules that can be fruitfully exploited to provide new drug targets and relevant chemical compounds for the therapy of CRC.

Potential Impact:
Scientific impact

Impact Summary
PRIMES has generated a number of scientific innovations that will have lasting impact on science and the scientific community. The most important achievements in this respect are briefly listed below:

1) Extending the conceptual knowledge base of the role of protein interactions. Our strategy of analysing protein complexes as molecular signal processing machines transcends the traditional perception of the role of protein interactions, and will open new approaches towards a mechanistic understanding of signal processing by biochemical networks. The comprehensive experimentation required to scrutinize this concept can be realized when focussing on a limited network, such as the EGFR/ERBB network. However, as its pathogenetic relevance to cancer is proven it is an ideal testbed for expanding our conceptual horizon of emergent network properties arising from protein interactions, e.g. competing protein interactions, or dynamically changing interactions.

2) The exploration of new drug targets and chemical space. We have nearly exhausted the chemical space and the drug targets associated with it. PRIMES will investigate protein interactions as targets and screen pharmacophore rich allosteric chemical compound libraries and peptidomimetic libraries against protein interactions. These approaches will open new chemical space and new targets to explore.

3) The translational application to overcome drug resistance in cancer. The clinical responses achieved with signal transduction inhibitor drugs can be remarkable. However, with the possible exception of imatinib (Glivec), resistance usually develops within a few months. Most of these compounds are kinase inhibitors or antibodies against ERBBs. PRIMES proposes to target PPIs as an orthogonal principle that can help avoiding or overcoming resistance to currently used signal transduction inhibitors.

Detailed impacts
Colorectal cancer (CRC) is one of the most common cancers worldwide. A key strategy in the treatment of CRC is the inhibition of Epidermal Growth Factor Receptor (EGFR), and consequently its downstream signalling pathway, using monoclonal antibodies (e.g. Cetuximab and Panitumumab). Unfortunately, activating mutations in the downstream GTPase, KRAS, lead to constitutive activation of the pathway and resistance to these drugs in 30-50% of CRC patients.

Tackling oncogenic KRAS signalling
To understand and to overcome the inefficacy of current drug treatments and prevent the development of resistance, we have generated a large protein interaction network which is used to understand pathway re-routing in KRAS mutant cell lines. Together with metabolomic and proteomic profiles, this network provides a valuable resource which will ultimately enable us to understand the re-routing processes and to interfere with them. In addition to understanding resistance, dissection of networks of oncogenic and non-oncogenic states will bring us one step closer to understanding the transformation of normal cells to cancer cells.

To identify new therapeutic targets downstream of KRAS we require a better understanding of how signals flow from the EGF receptors to downstream transcription factors (TFs) via the complex network that exists between them. In the PRIMES project we have developed a bioinformatics eco-system to facilitate these analyses and have identified significant re-wiring of the EGFR network that occurs in KRAS mutant cells. This work provides a new framework for understanding how networks are re-wired in cancer and will in future lead to the identification of new “choke-points” in these networks for therapeutic intervention.

Analysis of gene expression at the molecular level, both transcript and protein, reveals molecular changes specific to the disease state. We have investigated cell lines as a model system for analysing KRAS mutations in cancer. Using RNA-sequencing of an isogenic cell line model system, we show that the interconnectedness of KRAS leads to an overall down-regulation of a significant number of genes in the cell. Our data and analyses add to the body of knowledge surrounding KRAS, highlighting the complexity of the gene and its many interactions. The therapeutic challenge posed by KRAS mutation in cancer is incredibly important. In a large number of disease settings, despite KRAS mutation being the key oncogenic driver, very few effective pharmacological therapies exist as yet. Indeed, despite more than 40% of human colorectal cancers exhibiting mutation in KRAS, there are no effective targeted therapies. Therefore, it stands to reason that identifying and targeting processes which are critical to the survival of KRAS mutant cancer cells will provide substantial patient benefit, both in terms of survival and tolerability of therapy.

Structure and function guided drug design
The structural models as well as the chemical starting points provided offer an ideal basis for future development of novel medicines targeting these binding sites. The inhibitors that we developed targeting bromodomains are now all available through chemical vendors. We hope that this will lead to a broad uptake of these reagents by the scientific community resulting in the validation of novel drug targets. In addition, we made all coordinates of crystal structures available through the Protein Data Bank (PDB). We expect these data, too, to be used widely by the scientific community to unravel the function of protein-protein interaction domains.

Advances in PPIN mapping reconstruction and analysis
The technological improvements of methods to analyze PPIs and PTMs will provide researchers with better tools to analyze signaling network activity and to understand how the genetic aberrations in cancer lead to altered and rewired signaling. This will in time have an impact on diagnostics and selection of therapy for cancers, and may also be useful for the development of new treatments. PRIMES pursued the development of novel methods for analysis of PPINs and PTMs by proteomics but also by imaging in single cells. These aims were spurred by the need of the scientific community to obtain more complex data, but we also strive to make the methods as inexpensive as possible so that they are available to researchers in countries with fewer economic resources. Accompanying these experimental advances we developed a range of computational tools described in the section on scientific progress.

Building a foundation for systems medicine
The concepts and tools developed within PRIMES directly feed into multiscale systems medicine and personalized treatment approaches. The integration of qualitative and quantitative effects of disease mutations is an important step towards the classification and ultimately the stratification of patient mutations. When combined with patient-specific protein abundances obtained from tumour biopsies this will enable a personalized treatment approach in the future. The other important implications concern cancer diagnostics and the importance of considering the number of base substitutions needed to change an amino acid when distinguishing cancer driver from passenger mutations.
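The quantity mentioned last, the number of base substitutions needed to change one amino acid into another, can be computed directly from the standard genetic code. The sketch below is only an illustration of that quantity; the function name is hypothetical, and how PRIMES used the number to separate driver from passenger mutations is not described here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def min_base_substitutions(aa_from, aa_to):
    """Smallest number of single-base changes turning a codon for aa_from
    into a codon for aa_to (one-letter amino acid codes)."""
    from_codons = [c for c, aa in CODON_TABLE.items() if aa == aa_from]
    to_codons = [c for c, aa in CODON_TABLE.items() if aa == aa_to]
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in from_codons for c2 in to_codons)

if __name__ == "__main__":
    print(min_base_substitutions("G", "D"))  # glycine to aspartate
    print(min_base_substitutions("F", "K"))  # phenylalanine to lysine
```

A glycine to aspartate change, as in the common KRAS G12D mutation, needs a single base substitution, whereas phenylalanine to lysine needs three.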
Impact

Socio-economic impact and the wider societal implications of the project, main dissemination activities and exploitation of results

Proof of concept for targeting PPINs
PPIs have traditionally been considered undruggable and perhaps not even important. Using EGFR signalling as a paradigm, PRIMES results have clearly shown that (i) PPIs are important as they serve to organise the cell's computing units, by which cells perceive and interpret signals; (ii) PPIs can be targeted effectively provided that both network context and structural information are adequately considered. PRIMES has generated both experimental and computational tools to accomplish this task. PRIMES has also generated hit compounds and tool compounds against new targets in CRC that will be of great use to the community to explore new context and network based therapeutic strategies.

Nucleating a community of researchers and clinicians interested in personalised medicine approaches
PRIMES comprised basic, translational and clinical research groups working closely with statisticians and with computational and mathematical modellers. This successful experience has promoted a new way of interdisciplinary thinking that is able to drive personalised medicine approaches in the future. These small but promising efforts were amplified by
• various dissemination activities, for instance video interviews with PRIMES researchers, which are posted on YouTube.
• training activities especially for younger researchers and clinicians which included workshops and staff exchanges.

Main dissemination activities and exploitation of results
PRIMES had a variety of measures in place to ensure that our project results and progress were effectively disseminated to all stakeholder groups:

Peer reviewed publications: PRIMES produced over 80 publications including high visibility papers in top journals such as Nature, Nature Cell Biology, Science Signaling and Cell Reports.

Conference contributions: PRIMES researchers have participated in conferences and scientific meetings relevant to the field, both to disseminate information on the progress of the project and to seek the opinions of other experts. The consortium has taken every opportunity to present the findings through oral presentations and poster presentations. Examples of international conferences which members of the PRIMES consortium have attended include International Conference on Systems Biology of Human Disease, Keystone Symposia, Kinases: Next-Generation Insights and Approaches, EMBO Conference “Cellular signalling and cancer therapy” and European Society of Human Genetics.

Press releases and dissemination via public media: The PRIMES award was highlighted as part of a showcase of UCD success in the area of Health in EU FP7 Research in a brochure commissioned by Enterprise Ireland which was posted on both the UCD and Enterprise Ireland website http://www.ucd.ie/t4cms/UCD_FP7_Success_Stories_v3.pdf.

The PRIMES consortium members highlighted the PRIMES project and their research areas in short video clips for the public, which were disseminated on the website and YouTube. Examples of these clips are listed below:

https://www.youtube.com/watch?v=ueBmEzvoajU – Ola Soderberg Uppsala University

https://www.youtube.com/watch?v=Hj-jbQV79uU – Ivan Dikic Goethe University Frankfurt

https://www.youtube.com/watch?v=6iSqptSJFIw - Veronique Schaeffer Goethe University Frankfurt

https://www.youtube.com/watch?v=QkKWuQRkcxc – Matthias Wilm University College Dublin

https://www.youtube.com/watch?v=PsnyphQiWJ4 – Manfred Auer University of Edinburgh

https://www.youtube.com/watch?v=CjrZWDLXSs8 – David Huels The Beatson Institute for Cancer Research

https://www.youtube.com/watch?v=JY2dMfuf3iI – David Lynn South Australian Health and Medical Research Institute

Social media: PRIMES Twitter (@PRIMESFP7). The PRIMES Twitter account was set up to raise awareness of the PRIMES project and to highlight events in which PRIMES was involved. PRIMES information was also subsequently disseminated through the Systems Biology Ireland Twitter account.

Website: (www.ucd.ie/sbi/primes). The PRIMES project website was set up to provide information relating to the main aspects of the project and to promote its activities. It disseminates the goals, activities and results of the project to target groups, supports the exchange of project information and data, and provides links to the PRIMES videos.

Information Fliers: Fliers were prepared reflecting the project’s scope, approach and objectives, together with useful contact information. The purpose of these fliers was to provide non-electronic material for partners to distribute at meetings, workshops and conferences in order to publicise the project and to make all interested parties and the general public aware of the project.

Logos: The website and all promotional material, including fliers, slide decks, poster templates and the PRIMES pop-up stand, carried the specially designed PRIMES logo. The coordinating group UCD moved into a new state of the art facility on the UCD campus in October 2013 and held a building launch in December 2013. PRIMES information was disseminated during the event in the form of promotional material such as a PRIMES pop-up stand, and the PRIMES logo and website address were included in the event brochure. This is one of a number of examples where PRIMES promotional material was displayed.

Internal Dissemination – Newsletter: Internal Newsletters were circulated to the PRIMES partners containing updates on upcoming meetings and training events, recent PRIMES publications and also relevant publications of interest to the PRIMES researchers and any research highlights in the field.

List of Websites:
www.ucd.ie/sbi/primes
