Final Report Summary - SYBOSS (Systems Biology of Stem Cells and Reprogramming)
Executive Summary:
Stem cells offer great potential for innovation in medicine: through the development of patient-specific therapeutic applications, as non-animal models for understanding disease mechanisms, and as platforms for drug testing. To harness this potential, we need to understand these remarkable cells. Stem cells have the capacity to self-renew and also to differentiate into more restricted cell types. Recent methodological advances in systems biology, developed in part by the SyBoSS partners, have made it possible to characterize stem cells with unprecedented precision, thereby permitting the construction of highly accurate models. The SyBoSS project gathered systematic data to build a systems biology understanding of selected stem cells through three routes: the collection of ‘top-down’ datasets that report total cellular profiles and responses to mutagenesis or environmental perturbations; the application of genome-wide loss-of-function screens for unbiased functional identification; and ‘bottom-up’ data collection from selected genes using conditional mutagenesis, protein tagging and ChIP-sequencing. We focused on pluripotent embryonic stem cells (ESCs) and their transition to multipotent epiblast stem cells (EpiSCs) and then on to the tripotential neural stem cells (NSCs). SyBoSS collected data to understand the process of self-renewal in these three stem cell states as well as the transitions between these states. Understanding the regulatory framework of any living cell, whether a bacterium or a unicellular or multicellular eukaryote, remains a substantial challenge. Stem cells, which can invoke precise programs to shift from one cell state to several others, are even more complex. However, understanding the remarkable properties of stem cells is a prerequisite for fully exploiting their potential.
Using the advantages of the ESC-EpiSC-NSC transition as our model venue, SyBoSS has laid the foundations for the understanding of stem cell potency in unprecedented detail with particular focus on the regulatory networks that secure self-renewal and promote the transition of one state into another.
Project Context and Objectives:
A major challenge in current stem cell biology is to elucidate how gene regulatory circuitry is modified to execute differentiation. Mouse embryonic stem cells provide a tractable system for addressing this problem because they can be stably propagated as homogeneous populations and released into differentiation in defined conditions. To identify genes that regulate the transition from the naïve self-renewing ES cell to a differentiation-committed state, SyBoSS was based on three broad platforms:
(i) the establishment of reference datasets for the 3 stem cell states; specifically two variations of embryonic stem cells (ESCs, 2i + LIF and serum + LIF), epiblast stem cells (EpiSCs) and neural stem cells (NSCs). The first publication reporting reference datasets has already become a citation classic (Marks et al, Cell 2012). We applied next generation sequencing to examine the transcriptome of ES cells cultured in ground state conditions (known as 2i + LIF) compared with conventional relatively heterogeneous cultures in serum + LIF. We found that ground state ES cells exhibit lower expression of lineage-affiliated genes, reduced prevalence at promoters of the repressive histone modification H3K27me3, and fewer bivalent domains, which are thought to mark genes poised for either up- or downregulation. Nonetheless, serum- and 2i-grown ES cells have similar differentiation potential. Precocious transcription of developmental genes in 2i is restrained by RNA polymerase II promoter-proximal pausing. These findings suggest that transcriptional potentiation and a permissive chromatin context characterize the ground state and that exit from it may not require a metastable intermediate or multilineage priming.
As intended, SyBoSS acquired total transcriptome, small RNA, proteome and phosphoproteome datasets from ESCs, EpiSCs and NSCs. The datasets will be made publicly available in 2016 as we finalize their integration into an accessible resource and a publication that adds value regarding the ESC to EpiSC to NSC transition.
(ii) genome-scale screening approaches to identify functional players in an unbiased manner. The original goal of performing three genome-wide RNAi screens for unbiased discovery of factors involved in various aspects of pluripotency has been exceeded. Five genome-wide and two medium-scale screens have been completed. Three of the genome-wide screens used the esiRNA method developed by partner Buchholz to screen ESCs and EpiSCs equipped with fluorescent reporters to evaluate (a) the regulation of Oct4 in EpiSCs, which permitted the comparison with Oct4 regulation in ESCs; (b) the negative regulation of the meiosis-specific gene SMC1b; and (c) the roadblock to reprogramming in EpiSCs. Beyond these three screens, we completed (d) a genome-scale siRNA screen in ESCs to identify negative regulators of the exit from pluripotency and (e) a saturation screen using piggyBac transposon mutagenesis in haploid ESCs. In addition, (f) an esiRNA screen against 512 lncRNAs to discover lncRNAs involved in ESC self-renewal and (g) an esiRNA screen against 540 selected chromatin regulators using a reporter for X-chromosome reactivation in EpiSCs were completed. All these screens have been extremely rewarding and have led, or are leading, to the identification of a variety of factors and processes related to pluripotency. So far four key publications have emerged, with at least two more in preparation.
(iii) the collection of data from several hundred genes selected for their relevance to stem cell self-renewal and transitions, often on the basis of the genome-wide screens. The data were collected using tagging, RNAi or conditional mutagenesis methods developed at least in part by SyBoSS partners. Using GFP (or Venus) as the tag, we developed generic methods for standardized imaging, AP-MS and ChIP-seq (for chromatin proteins), which delivered uniformly high and comparable data quality. Hence we obtained a unique collection of datasets with unrivalled relevance for stem cell properties. The AP-MS data have been organized into a user-friendly format available at syboss.eu and http://www.digtop.de/syboss_login.php.
In addition to these three cornerstones, SyBoSS also had a concentrated focus on
(a) X-chromosome inactivation, which is linked to the exit from self-renewal. Notably, we discovered that the presence of two active X chromosomes delays exit from self-renewal and, more recently, that the inactive X lacks the topologically associated domains (TADs) that characterize autosomes, except around the few genes that escape X-inactivation.
(b) technology development. In addition to the technical advances that were incorporated into the project in the first two years, we recently developed the auxin degron, together with rapid CRISPR/Cas9-assisted targeting methods, for functional analyses. The auxin degron brings two advantages over existing methods for ligand-inducible loss-of-function: it is reversible and considerably faster. Loss-of-function within 30 minutes promises to open many new insights into regulatory function. Another aspect of technology development involved the optimization of proteomic methods for top-down mapping of ubiquitylation sites, with the aim of quantifying ubiquitylation signaling in stem cells.
Value has been added to the data generated and collected by SyBoSS in several ways. The AP-MS and imaging data have been incorporated into a user-friendly database that will be made public as soon as we have ironed out the glitches and incorporated the transcriptome data. A database assembling all publicly available, ESC-relevant ChIP-seq data has been established as the platform into which SyBoSS ChIP-seq data have been incorporated. This represents a substantial resource of genome binding patterns and histone modifications that will be made publicly available concomitant with the forthcoming publication. SyBoSS data analysis included (i) the construction of models for the ESC to EpiSC to NSC transition; (ii) the utilization of ChIP-seq data to predict regulatory circuitry; (iii) integration of SyBoSS protein interaction data into existing protein-protein interaction datasets to establish a high-confidence set of predicted interactions; (iv) development of standards for co-operative regulatory interactions for systems modelling; (v) the analysis of the dynamics of transcriptional networks driving the differentiation of ESCs; (vi) the evaluation of the functional implications of monoallelic expression, particularly with respect to X-chromosome inactivation, escape from inactivation and Klinefelter syndrome.
 
Figure 1. Current summary of the regulatory circuitry sustaining pluripotency in mouse ESCs.
In summary, our studies have challenged the prevailing model that ES cells enter differentiation via degeneration into a metastable population that experiences stochastic lineage priming while co-existing as ES cells. Our data have instead revealed that, in controlled and defined conditions, ES cells undergo a highly orchestrated transition in which the naïve pluripotency network is abruptly dismantled by the concerted action of multiple destabilising mechanisms (Figure 1).
Notably extinction of the naïve gene regulatory network occurs prior to detectable expression of definitive lineage specification genes or lineage priming. These observations are consistent with what is known of pluripotency progression in the embryo.
Based on these findings we have proposed that pluripotency may be parsed into three phases: naïve, formative and primed (Figure 2).
 
Figure 2. Schematic of current knowledge showing stages of pluripotency and exit.
Overall, SyBoSS achieved its work commitments in terms of deliverables. Please see the 5th annual report for more details about the final deliverables. Adoption of the mid-term reviewer’s recommendation to exchange quantity for quality made a significant contribution to the value of the outcome. Beyond the operational success, the more notable success has been the scientific accomplishments. A great deal of progress regarding pluripotency, ESCs and the transition through EpiSCs to NSCs, as well as the properties of stem cell self-renewal and differentiation has been made. As well as being amongst the most valuable, ESCs are now arguably the best understood mammalian cell type. The chemistry within the consortium has been excellent and a variety of unanticipated collaborative projects arose because of the complementary interdisciplinarity of the partners. Ongoing projects and relationships have been forged that will ensure that the SyBoSS legacy lasts well beyond the end of the funding period. As a final note, the cost neutral six month extension made a huge difference to the successful outcome and all SyBoSS partners are extremely grateful for this concession, as well as the entire overall support and opportunity conveyed by the funding.
Project Results:
The SyBoSS project benefitted greatly from the cost neutral six month extension and all partners would like to thank the project officer and colleagues for their enlightened management of our project.
Here we summarize the work towards the remaining deliverables.
D1.2-3 135 EpiSC and 135 NSC cell lines. An Excel file listing these cell lines, including a list of the 261 ESC lines expressing tagged proteins and 133 iPSC lines, was delivered. We achieved 137 NSC lines but fell short with only 91 EpiSC lines. However we also generated 133 iPSC lines by Cas9 targeting.
 
D1.1-3 100 AID biallelically tagged ESCs. An Excel file was delivered listing 89 ESC lines carrying biallelically targeted genes, plus 14 where we only obtained monoallelic targeting (i.e. heterozygous) and a further 28 genes that are still underway at the time of writing.
D2.1-2 Western and/or in situ analysis of 600 cell lines. More than 600 cell lines have been imaged and the SyBoSS imaging database is available through the SyBoSS website, syboss.eu. To access it, select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public).
D2.2-2 Identification of protein interactors generated from 300 AP-MS analyses. The outcome of more than 600 AP-MS analyses, resulting in 115 successful protein-protein interaction datasets, is also available through the SyBoSS website, syboss.eu. To access it, select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public). We have also developed a new Volcano plot tool for user-friendly statistical presentation and quantification of the AP-MS data.
D2.3-4 ChIP-seq and target gene analysis is available at
http://syboss.cbs.dtu.dk; username: guest; password 1Guest!; database is not yet public.
This database will be made public when the accompanying manuscript, which is in preparation, is accepted for publication. The accompanying manuscript will extend and enhance the value of the primary data acquired.
D2.4-3 RNA-seq after esiRNA knock-down of 100 genes from revised D1.1-3 and D2.2-2 and target gene analysis. The list of 117 genes knocked down in both ESCs and EpiSCs has been delivered as an Excel file, including notes about whether the knock-down cells displayed a visible phenotype. Because the experiments were performed in both ESCs and EpiSCs, we exceeded the requirement for this deliverable by more than 2-fold. There are many ways in which these data can be analysed. Consequently the target gene analysis is still underway.
D3.4-3 Fine grained model for subset of genes. The report has been submitted.
D3.1-3 NGS database – implementation of SyBoSS-NGS-datasets in a web-based, downloadable, genome-browser-linked database. The database is available at
http://syboss.cbs.dtu.dk; username: guest; password 1Guest!; database is not yet public.
D4.2-1 Gene lists from three genome-wide RNAi screens & Report on protein-protein interactions. Lists of genes with Z scores above 2 for three genome-wide screens have been delivered as an Excel file. In addition we performed two further genome-wide screens and two medium-scale (i.e. ~500) screens during the SyBoSS project. Please see the final report (section ‘Screens’) for further description, as well as the report on protein-protein interactions (beginning of the section ‘Modelling and computational biology’). This work was also published in Ding et al, Cell Systems 1, 141-151, and would have been included in the deliverable; however, only one file can be uploaded per deliverable.
Data collection - proteomics
Embryonic stem cells are highly plastic and can be differentiated into more specialized stem cells, which can be further differentiated into specific lineages. Stem cells are complex systems in which the identity of, and functional differences among, stem cell types are determined by differences in their proteome complement, protein posttranslational modifications (PTMs) and protein-protein interactions. SyBoSS systematically collected proteome and phosphoproteome datasets of the three different stem cell types (ESC, EpiSC, and NSC), and identified interaction networks of key stem cell-associated proteins.
We used high-resolution mass spectrometry (MS) in combination with stable isotope labeling by amino acids in cell culture (SILAC) for the relative quantification of the proteomes and phosphoproteomes of ESCs, EpiSCs and NSCs (Figure 3A). We quantified over 9,000 proteins in these analyses, and over 10,000 phosphorylation sites in each cell type, providing a deep systems-wide comparison of the proteomes and phosphoproteomes of these cell types. While the fractions of proteins down-regulated in EpiSCs and NSCs compared to ESCs were relatively similar, a much larger fraction of the proteome was upregulated in NSCs (Figure 3B). Also, the upregulated fractions of the proteome and phosphoproteome were comparable for NSCs, whereas a larger fraction of the phosphoproteome than of the proteome was differentially regulated between ESCs and EpiSCs. We are currently finalizing a manuscript including these data, and thereafter will make them available to the community. In addition to the total phosphoproteome analysis, we also investigated the dynamics of phosphorylation sites in ESCs treated with Cisplatin. This research activity was not originally part of the SyBoSS plan but has proved to be a useful complement. The results showed that the phosphoproteome of these stem cells is extensively regulated in response to genotoxic insults.
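The comparison of up- and down-regulated fractions described above can be illustrated with a minimal sketch. It assumes each protein (or phosphorylation site) has a heavy/light SILAC ratio relative to ESC, and it uses a 2-fold cutoff; the report does not state the cutoff actually used, and the function names and example values are hypothetical.

```python
import math

def classify_silac(ratios, fold=2.0):
    """Label each heavy/light ratio as 'up', 'down' or 'unchanged'.

    ratios: dict mapping a protein id to its SILAC ratio versus ESC.
    fold:   assumed fold-change cutoff (not taken from the report).
    """
    cutoff = math.log2(fold)
    calls = {}
    for protein, ratio in ratios.items():
        log_ratio = math.log2(ratio)
        if log_ratio >= cutoff:
            calls[protein] = "up"
        elif log_ratio <= -cutoff:
            calls[protein] = "down"
        else:
            calls[protein] = "unchanged"
    return calls

def regulated_fractions(calls):
    """Return (fraction up, fraction down) over all quantified entries."""
    total = len(calls)
    up = sum(1 for call in calls.values() if call == "up")
    down = sum(1 for call in calls.values() if call == "down")
    return up / total, down / total

# Hypothetical ratios for four proteins (illustration only).
example_ratios = {"P1": 4.0, "P2": 0.25, "P3": 1.1, "P4": 0.9}
example_calls = classify_silac(example_ratios)
example_fractions = regulated_fractions(example_calls)
```

The same two functions could be applied separately to the proteome and phosphoproteome tables to obtain the kind of bar chart summarized in Figure 3.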
MEK-ERK signalling stimulates ES cells to transition out of naïve pluripotency and enter the path to lineage commitment. In a second series of phosphoproteomic analyses, we identified targets of the ERK signalling cascade in undifferentiated ES cells using SILAC coupled with phosphopeptide capture. This approach identified RSK1 as a prominent direct target of ERK in ES cells. RSK effects may include negative regulation of ERK activation. We therefore used CRISPR/Cas9 to create combinatorial mutations in RSK genes. Genotypes that included null mutations in Rps6ka1, encoding RSK1, resulted in elevated and sustained ERK phosphorylation. We found that these mutants exhibit altered differentiation kinetics. RSK-depleted ES cells show earlier down-regulation of naïve pluripotency factors, precocious expression of transitional epiblast markers, and early onset of lineage specification. We further showed that chemical inhibition of RSK increases ERK phosphorylation in ES cells and expedites entry into differentiation. These findings demonstrate that the level of ERK signalling influences the dynamics of ES cell differentiation and highlight the role of signalling feedback in developmental progression. This work is an extension of studies initiated in the FP7 project EuroSyStem. The results are currently being prepared for publication.
Figure 3. (A) Schematic for the quantification of the proteome and phosphoproteome of ESCs, EpiSCs and NSCs. The ESCs, EpiSCs and NSCs were cultured in media containing “light” (Arg0, Lys0) or “heavy” (Lys8, Arg10) isotope-labeled amino acids. Cells were lysed and protein extracts were digested with trypsin. A small fraction of the peptides was used for the measurement of the proteome, and the remaining peptides were used for enriching phosphorylated peptides. The samples were analyzed by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) and the raw data were analyzed with the MaxQuant software. (B) The bar chart shows the fraction of proteins and phosphorylation sites that were up- or down-regulated in EpiSCs and NSCs relative to ESCs. The data shown are combined from two independent biological replicates.
To further understand the wiring of protein interaction networks in ESCs, we analyzed the protein-protein interactions of selected proteins that are implicated in establishing or maintaining proper stem cell identity. For this work, we used a label-free quantification strategy. We performed affinity purification mass spectrometry (AP-MS) analysis of 396 samples. These data have been entered into the SyBoSS database. Additionally, we investigated protein expression changes occurring during X chromosome inactivation. We performed proteomic analysis of two ESC lines with one or two active X chromosomes (XO and XX, respectively) using a SILAC-based quantitative proteomic strategy. The ESCs were labeled with light and heavy SILAC media in both forward and reverse combinations. This analysis identified over 4,000 proteins in total, of which a small subset was differentially expressed between XX and XO cells.
Data collection – ESC engineering and transcriptome profiling
In the SyBoSS project, 342 ESC lines encompassing 287 tagged genes were made either by knock-in targeting (110), BAC transgenesis (177) or both (55). These lines were evaluated for tagged protein expression by Western blot, immunofluorescence (both using a goat anti-GFP antibody), tagged GFP/Venus fluorescence and AP-MS. We found that Western blotting was the least reliable of the methods and it was therefore discontinued. As recommended by the referee at the mid-term review, instead of attempting AP-MS for 300 genes we doubled our efforts on half this number in order to improve the quality and success rate. In total, 216 tagged ESC lines comprising 161 genes were evaluated by AP-MS, most of them twice, totalling well over 600 analyses. Hence SyBoSS work on AP-MS exceeds our revised commitment. Of the 161, 63 were targeted and 114 were BAC transgenes (16 were both). Successful AP-MS results were achieved with 115 genes. At the time of writing, a further 30 genes are being analysed by AP-MS, consisting of 20 repeats and 10 new genes. AP-MS and imaging results are accessible through the SyBoSS website, syboss.eu: select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public). The transcriptome profiles are available at GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=szgfueqinhsnvax&acc=GSE77692
Transcriptome analysis of RNAi knockdowns in mouse ES and EpiS cells.
To functionally evaluate the roles of ~100 genes identified in the genome-wide RNAi screens, we characterized transcriptional changes in ESCs and EpiSCs following esiRNA knockdown. Cells were treated with esiRNAs, supplied by Partner Buchholz, against 117 candidate genes and 1 control gene (Luciferase), and total RNA was collected from two biological replicates 72 hours post-transfection. As noted in the gene list (D2.4-3), at 72 hours morphological phenotypes were obvious for 15 genes in ESCs and 29 genes in EpiSCs (4 genes in both). The RNA concentrations were normalized and the samples arrayed in 96-well plates. Bar-coded libraries were prepared from each 96-well plate and subjected to RNA-seq (HiSeq V4, 75 bp paired-end reads). The data are available by searching for ERP013675 at http://www.ebi.ac.uk/ena.
ChIP-sequencing.
The ChIP-sequencing database is available at http://syboss.cbs.dtu.dk (username: guest; password 1Guest!; database is not yet public).
Screens
During the course of the project, SyBoSS committed to three genome-wide loss-of-function screens. Because the screens proved to be extremely fruitful and effective, we have exceeded this commitment considerably. Five genome-wide screens and two medium scale screens have been successfully completed (D4.1-2).
1. Oct4-GFP screens for pluripotency. Oct4 is critically involved in maintaining pluripotency in stem cells, and the changes in its expression level cause differentiation of the cells. Hence, Oct4 expression is a valuable reporter for the status of pluripotent cells. One of the starting points for the SyBoSS project was a genome-wide esiRNA screen using an Oct4-GFP reporter ESC line to identify candidate regulators of Oct4 expression and ESC self-renewal (Ding et al, Cell Stem Cell 4, 403-15, 2009). The high confidence genes were included in the initial SyBoSS gene list for detailed analysis. To gain a systematic understanding of the genes associated with EpiSC identity, we performed a very similar genome-wide esiRNA screen in Oct4-GFP reporter EpiSCs (Ding et al, Cell Systems 1, 141-51, 2015). Our screen uncovered genes that are specifically required to maintain Oct4 expression in EpiSCs together with numerous genes that alter Oct4 expression in both cell types. Surprisingly, beside the identification of shared factors required to maintain Oct4 expression in both cell types, our analyses also revealed numerous knockdowns that lead to increased Oct4 expression solely in EpiSCs. This result indicates that, in contrast to ESCs, Oct4 is under active repressive control in EpiSCs, thereby establishing a fundamental difference in Oct4 regulation in these two pluripotent cell types (Figure 4).
 
Figure 4. Comparative analysis of the screen results in EpiSCs and in ESCs. The y-axis represents the average Z-score of the GFP intensity for each targeted gene. Up-regulated (Z-score >2) or down-regulated (Z-score <-2) Oct4 expression is depicted in yellow and blue, respectively. Note the large number of knockdowns that up-regulated Oct4 expression in the EpiSC screen.
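As an illustration of the hit calling summarized in Figure 4, the sketch below converts GFP intensities to Z-scores, averages them across replicates, and applies the ±2 thresholds. It assumes plain mean and standard deviation normalization over all knockdowns in a replicate; the report does not say which normalization was used, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(intensities):
    """Z-score each knockdown's GFP intensity within one replicate.

    intensities: dict mapping a gene to its measured GFP intensity.
    Assumes mean/standard-deviation scaling (an assumption of this sketch).
    """
    values = list(intensities.values())
    centre, spread = mean(values), stdev(values)
    return {gene: (value - centre) / spread for gene, value in intensities.items()}

def average_z(replicates):
    """Average per-gene Z-scores across a list of replicate dicts."""
    genes = replicates[0].keys()
    return {gene: mean(rep[gene] for rep in replicates) for gene in genes}

def call_hits(avg_z, threshold=2.0):
    """Classify genes by the +/- threshold used in the Figure 4 caption."""
    calls = {}
    for gene, z in avg_z.items():
        if z > threshold:
            calls[gene] = "up"      # knockdown increases reporter signal
        elif z < -threshold:
            calls[gene] = "down"    # knockdown decreases reporter signal
        else:
            calls[gene] = "none"
    return calls

# Hypothetical data for illustration only.
plate_z = z_scores({"a": 1.0, "b": 2.0, "c": 3.0})
avg = average_z([
    {"g1": 2.5, "g2": -3.0, "g3": 0.5},
    {"g1": 3.5, "g2": -2.0, "g3": -0.5},
])
hits = call_hits(avg)
```

Under this scheme, a gene whose knockdown raises Oct4-GFP above the threshold would be a candidate repressor, matching the EpiSC-specific class highlighted in the figure.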
Experiments to analyse the esiRNA screen in EpiSC were completed. A multiparametric integrative analysis of the RNAi screen with protein localization, genetic interaction and protein-level dependency was performed. This analysis predicted that Tox4 exhibits similarities to components of the Paf1 complex. Physical interaction of Tox4 with Ctr9 was confirmed by tagging and proteomic analyses. This analysis also revealed interaction of Tox4 and Ctr9 with components of the PP1 phosphatase complex.
2. siRNA screen of exit from the ES cell state. Using an assay based on the recovery of pluripotency after removal of 2i+LIF, we screened nearly 10,000 genes in duplicate experiments with pools of four independent siRNAs. We validated 28 genes whose knockdown significantly impeded progression from the undifferentiated ES cell state upon transfer from 2i. In addition to members of known critical pathways we found the tumor suppressors Folliculin (Flcn) and Tsc2. Tsc2 lies upstream of mammalian target of rapamycin (mTOR), whereas Flcn acts downstream and in parallel. Flcn, with its interaction partners Fnip1 and Fnip2, drives differentiation by restricting nuclear localization and activity of the bHLH transcription factor Tfe3. Conversely, enforced nuclear Tfe3 enables ES cells to withstand differentiation conditions. Genome-wide location and functional analyses showed that Tfe3 directly integrates into the pluripotency circuitry through transcriptional regulation of Esrrb. These findings identified a cell-intrinsic rheostat for destabilizing naive pluripotency and allowing transition into differentiation. Congruently, stage-specific subcellular relocalization of Tfe3 suggests that Flcn-Fnip1/2 contributes to developmental progression of the pluripotent epiblast in vivo. These results were published in Betschinger et al, Cell 153, 335-47, 2013.
3. esiRNA screen for reprogramming from EpiSCs to ESCs. To identify genes constituting reprogramming roadblocks we made use of an EpiSC line expressing a chimeric GCSF-LIF receptor (Yang et al Smith A, Cell Stem Cell 7, 319-28, 2010). EpiSCs are resistant to conversion to naive pluripotency. However, addition of GCSF to this EpiSC line induces a low frequency of reprogramming to ground state pluripotency. These cells further contain an Oct4-GFP-IRES-Puro selection cassette that allows, in combination with 2i culture conditions, stringent selection of naive pluripotency. Thus, chimeric-LIF-receptor-expressing EpiSCs provide a sensitized screening system to identify genetic barriers to reprogramming. Knockdown of Stat3 was used as a negative control and knockdown of Zfp281 was used as a positive control.
4. SMC1b-GFP screen for repressors of the meiotic gene expression program in ESCs. ESCs maintain pluripotency not only through the Oct4 regulatory circuitry but also through repression of inappropriate gene expression. To unlock this area of regulation, we performed a genome-wide esiRNA screen using the meiosis-specific SMC1b-GFP as the reporter, identifying derepression through increased GFP expression. Notably, the transcription factor E2F6 was used as a positive control to validate the assay (Figure 5).
Figure 5. Validation of the SMC1b-GFP assay.
5. Genome-wide haploid ES cell mutagenesis screen of exit from the ES cell state. The development of haploid ES cells provides a powerful new platform for unbiased mutagenesis screens (Leeb et al; Cell Stem Cell, 14, 385-93, 2014). We therefore developed the methodology using piggyBac transposition in haploid ESCs with the aim of implementing a saturation screen. After 40 independent screens, the recovery of new genes was very low, indicating that the screen was approaching saturation. We have identified 310 candidate genes, including most of the known players. To achieve rapid in-depth analysis of these candidates we have developed a pipeline for high-throughput generation of knock-out ES cells using CRISPR and subsequent transcriptome analysis by RNA-seq. The development and implementation of this new program was enabled by SyBoSS and involved the labs of Smith, Beyer and Stewart, with the notable inclusion of Martin Leeb, a former post-doc with Smith now running his own lab in Vienna. Because the project is a product of SyBoSS, this successful and ongoing collaboration is one of the leading highlights. We anticipate a high-profile publication describing the ES cell transition in unprecedented breadth and molecular detail.
6. Two other medium-scale screens have been included.
(i) Using the Oct4-GFP reporter in ESCs, knock-downs of 512 lncRNAs using esiRNAs were performed and three lncRNAs were identified as contributing to ESC self-renewal and/or Oct4 regulation. The methodology was published in Chakraborty et al, Nature Methods 9, 360-9, 2012. The detailed analysis of one of the lncRNAs, termed Panct1, is under review at Nature Structural and Molecular Biology. Interestingly, this lncRNA is found in the first exon of a protein-coding gene with which it co-operates.
(ii) After considerable difficulties, we finally established a female EpiSC line that carried Venus knocked onto the C-terminus of the X-linked gene, G6pdx, on the inactive X. This line was screened with esiRNA selected to knock-down 540 known chromatin regulators by sorting for Venus activation (Figure 6). Validation of the candidates is underway.
 
Figure 6. Summary of the screen for X-reactivation in EpiSCs. The 13 genes whose esiRNA knock down led to activation of the G6pd-X-Venus reporter with a Z score above 2 are labelled.
Technology development
Protein level dependency
We established a way to evaluate protein-protein interactions using RNAi. For proteins that are subunits of the same complex, reduction of one subunit often leads to reduction of the other subunits because the complex is destabilized. To test this idea, we used esiRNA transfections in ESCs and EpiSCs selected from our GFP-tagged cell line resources. The esiRNAs were chosen to knock down a candidate partner of the GFP-tagged protein, and GFP fluorescence levels were measured. This was applied to 28 BAC-tagged ESC and EpiSC lines with substantial results. We term this assay ‘protein level dependency’.
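The readout of a protein level dependency assay can be sketched as a simple ratio. The example below assumes GFP fluorescence is measured after knockdown of the candidate partner and after a control esiRNA, and it flags a dependency when the signal drops below 70% of control; the threshold, the use of a control transfection as the baseline, and all names are assumptions of this sketch, not details given in the report.

```python
from statistics import mean

def protein_level_dependency(gfp_partner_kd, gfp_control, max_ratio=0.7):
    """Compare tagged-protein GFP signal after partner knockdown vs control.

    gfp_partner_kd: GFP readings after esiRNA against the candidate partner.
    gfp_control:    GFP readings after a control esiRNA (assumed baseline).
    max_ratio:      hypothetical cutoff below which the tagged protein is
                    considered dependent on the partner.
    Returns (ratio, dependent).
    """
    ratio = mean(gfp_partner_kd) / mean(gfp_control)
    return ratio, ratio < max_ratio

# Hypothetical readings for illustration only.
dependent_case = protein_level_dependency([50.0, 50.0], [100.0, 100.0])
independent_case = protein_level_dependency([95.0, 105.0], [100.0, 100.0])
```

A drop in GFP signal in the first case would be consistent with the two proteins residing in the same complex, whereas an unchanged signal gives no evidence either way.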
Auxin degron
Our work on the auxin degron and its excellent properties for rapid depletion of the target protein was thoroughly described in the 4th annual report. Since then we have developed methods for biallelic knock-in tagging using CRISPR/Cas9-assisted targeting, which permits high throughput targeting in ESCs (manuscript submitted).
Engineering homozygous mutant stem cells
Our attempts to scale up the production of homozygous mutant mouse embryonic stem cells were not successful. Serial targeting of both alleles using conventional gene targeting proved to be too inefficient, and the project was abandoned and replaced with esiRNA knockdown experiments in ESCs and EpiSCs. The advent of CRISPR-Cas9 technology presented us with an opportunity to develop scalable methods for biallelic targeting of genes in stem cells. Over the past year, we developed a robust and efficient method for biallelic targeting of human induced pluripotent stem (iPS) cells. Our strategy for the generation of biallelic knockouts, shown in Figure 7, is to replace a critical exon of one allele of the target gene with a drug selection cassette by homologous recombination and to screen clones for damage to the second allele induced by error-prone non-homologous end-joining (NHEJ).
Since homologous recombination is greatly stimulated by the action of site-specific nucleases, we reasoned that the drug-resistant clones would be enriched for cells that take up and express active Cas9 nuclease. Thus, we should expect to see a high incidence of NHEJ-induced damage to the second, non-targeted allele in clones that have undergone homologous recombination. Furthermore, only one copy of the target exon will be present in correctly targeted clones, thus, NHEJ-induced damage to the non-targeted allele can be assessed by Sanger sequencing of PCR products from the target exon. By definition, clones that exhibit a clear mutant read in the target exon will be biallelic events (targeted/NHEJ). Non-targeted clones will carry two copies of the target exon and indels in one or both alleles will not produce a readable sequence trace by Sanger sequencing. Therefore, our strategy provides a simple, scalable genotyping method for the rapid identification of homozygous mutant clones, obviating the need to characterize both alleles by sequencing of cloned PCR products or by single molecule sequencing.
 
Figure 7. Strategy for biallelic targeting of genes with CRISPR-Cas9 programmable nuclease. The diagram shows a short-arm targeting construct with ~1kb homology arms flanking a selectable gene (drug R, usually neo) introduced by nuclease-promoted homologous recombination with concomitant nuclease promoted damage on the other allele.
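The genotyping logic described above can be written down as a small decision function. The following Python sketch is an illustration only and is not the consortium's pipeline: the `Clone` record and its three flags (`targeted`, `trace_readable`, `matches_wildtype`) are hypothetical names introduced here, and the sketch assumes that each flag has already been determined experimentally.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    """Hypothetical record of one drug-resistant clone (field names are assumptions)."""
    name: str
    targeted: bool          # cassette replaced the critical exon on one allele
    trace_readable: bool    # Sanger trace of the exon PCR product is a single clean read
    matches_wildtype: bool  # the clean read is identical to the reference exon


def classify(clone: Clone) -> str:
    """Apply the reasoning in the text: a targeted clone retains one exon copy,
    so a clean mutant read implies a targeted/NHEJ (biallelic) event."""
    if not clone.targeted:
        # Two exon copies remain; indels give mixed traces, so these are not scored.
        return "non-targeted (not scored)"
    if not clone.trace_readable:
        # Unexpected for a single exon copy; flagged rather than guessed.
        return "ambiguous (re-check clone)"
    if clone.matches_wildtype:
        return "heterozygous (targeted/wild-type)"
    return "biallelic (targeted/NHEJ)"


if __name__ == "__main__":
    clones = [
        Clone("c1", targeted=True, trace_readable=True, matches_wildtype=False),
        Clone("c2", targeted=True, trace_readable=True, matches_wildtype=True),
        Clone("c3", targeted=False, trace_readable=False, matches_wildtype=False),
    ]
    for c in clones:
        print(c.name, classify(c))
```

Whether a given NHEJ-induced indel actually abolishes gene function (for example, a frameshift versus an in-frame deletion) is a separate question that this sketch does not address.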
X-chromosome assay
To follow the process of X inactivation during differentiation and reactivation during reprogramming, we originally aimed to generate female ESC lines carrying two fluorescent reporters, one on each allele of the X-linked G6pdx gene. Differentiation into EpiSCs and NSCs should inactivate expression from one X, which could be followed by fluorescence. Subsequent cloning would produce a single colour cell line that could be reprogrammed with X-reactivation followed by re-expression of the other colour. As previously reported, we encountered severe technical problems in generating double Venus/Katushka-tagged female ESC lines at the G6pdx locus (in brief, all clones in multiple targeting rounds became XO, losing one X chromosome). As a remedy, we decided (see previous reports) to
(a) establish mouse lines via normal male ESCs carrying one or the other targeted allele. Crossing of these two lines resulted in female mice carrying the two knock-in alleles, from which we established EpiSCs and NSCs. One of the EpiSC lines has recently been used in an esiRNA screen to search for chromatin factors involved in X-reactivation; and
(b) generate female knock-in ESCs using a different parental female ESC line, TX1072, generated from female hybrid embryos carrying a Tet-inducible Xist gene on one X chromosome (Figure 8; Schulz et al, Cell Stem Cell, 2014). We designed targeting constructs at multiple different loci (Figure 8A,B). The use of CRISPR-mediated approaches greatly accelerated the speed and efficiency of targeting. We have now successfully generated several XX ESC lines carrying GFP and Tomato reporters at the Huwe1 and G6pdx loci. The dual fluorescence in ESCs with two active X chromosomes shifts to mono-fluorescence upon Xist induction, as expected (Figure 8C,D). The generation of lines tagged at the Mecp2 and Jarid1c loci is underway. The Huwe1 and G6pdx dual-tagged ESC lines are currently being differentiated into EpiSCs and NSCs and will be used for esiRNA screens (in collaboration with F. Buchholz) to identify (a) the factors required for gene silencing and maintenance during X inactivation; (b) the factors required for reprogramming from the EpiSC or NSC state. These screens will be complementary to those performed using EpiSCs with G6pdx Venus/Katushka alleles generated by the Stewart lab from mice. In our case, we can assess XCI in ESCs, EpiSCs and NSCs. Furthermore, the parental ESCs carry highly polymorphic X chromosomes, enabling the precise timing of gene silencing and chromatin changes during X inactivation to be assessed by RNA-seq and ChIP-seq. Pilot data have been generated and bioinformatics analysis is underway.
The results of these studies, initiated in SyBoSS consortium, will provide novel insights into the dynamics of differentiation and reprogramming, as well as uncovering new actors in the processes of gene silencing and reactivation, using the XCI paradigm.
Figure 8. Generation of female ESC lines with GFP and Tomato tagged X-linked alleles
A. The overall strategy is illustrated. Female ESC lines carrying a Tet-inducible Xist gene on one X chromosome, and with GFP and Tomato reporters targeted into the endogenous loci of the G6pdx, Huwe1, Mecp2 and Jarid1c loci, can be used to induce XCI either in undifferentiated ESCs or during differentiation into EpiSCs and NSCs. This leads to non-random inactivation of one allele (either GFP or Tomato) and provides a readout for screens to identify factors that interfere with the process of X inactivation, or reactivation.
B. The targeting constructs used to introduce the fluorescent reporters at the endogenous loci by CRISPR/Cas9-facilitated targeting with short (500 bp) homology arms.
C. An example of one Huwe1 GFP/Tomato ESC line before and after Xist induction.
D. qRT-PCR assessment of GFP and Tomato expression in several independent G6pdx GFP/Tomato ESC clones before and after Xist induction for 48h.
X-inactivation and small RNA analysis in ESC to EpiSC and NSC differentiation
Male (E14) and female (PGK12.1) ESCs were differentiated into EpiSCs and on to NSCs using standardized culture conditions defined in the consortium. The detailed analysis of the gene expression states during early ES to EpiSC differentiation led to the discovery that the presence of two active X chromosomes delays the exit from pluripotency (Schulz et al, Cell Stem Cell 2014). The investigation of the chromatin status of the X chromosome during early ES to EpiSC differentiation also led to the discovery that Jarid2 is an early partner of the inactive X and a key factor for the recruitment of Polycomb repressive complex 2 (PRC2) to the Xi (da Rocha et al, Mol. Cell 2014). Recently, investigating X inactivation changes during EpiSC to NSC differentiation, we discovered that some genes become reactivated on the inactive X. That is, some genes are inactivated when the X is first inactivated and then become reactivated upon further differentiation. This is unexpected and now requires careful work to evaluate whether the ESC-EpiSC-NSC culture model accurately reflects events in the embryo.
In the course of our analyses of ESCs differentiated into NSCs, we noted that the X chromosome undergoes a series of interesting changes in structure and organization using allele-specific Hi-C (in collaboration with the lab of J. Dekker) and RNA-seq. We found that the Xi lacks typical autosomal features such as active/inactive compartments and topologically associating domains (TADs), except around a small number of genes that escape XCI and remain expressed. Escaping genes form TADs and retain DNA accessibility at promoter-proximal and CTCF binding sites, indicating that these loci can avoid Xist-mediated erasure of chromosomal structure. We also found that gene silencing competent Xist RNA is sufficient to induce segregation of the Xi into two ‘mega-domains’ separated by a boundary that includes the DXZ4 macrosatellite sequence, which can also be found on the human X. Deletion of this boundary prior to XCI results in fusion of the megadomains and altered patterns of escape that correlate with changes in TAD structure following differentiation and XCI. These results suggest a critical role for the boundary locus and Xist RNA in shaping the structure of the Xi and modulating escape from XCI. Our findings also point to roles of transcription and CTCF binding in TAD formation in the context of facultative heterochromatin. This SyBoSS publication is under revision at Nature (“Structural organization of the inactive X chromosome” L. Giorgetti, B. R. Lajoie, A.C. Carter, M. Attia, Y. Zhan, J. Xu, C.J. Chen, N. Kaplan, H. Y. Chang, E. Heard# and J. Dekker#). The description of Xi expression and chromatin status during ESC to NSC differentiation will be reported in a separate publication (integrating the ES to EpiSC data mentioned above).
Our investigation of small RNA populations in the ESC-EpiSC-NSC transition led to the discovery of small RNAs at the Xist locus specifically in female EpiSCs, and at the Tsix locus in ESCs. The timing of their appearance during ESC to EpiSC differentiation was investigated by small RNA-seq at 4, 7 and 10 days of differentiation, using male (E14) and female (PGK12.1) ESCs. The Xist small RNA population, and in particular one highly represented miRNA, appeared from day 4. To test whether this small Xist-derived RNA entity has any function in XCI or in differentiation, we are currently deleting it using CRISPR/Cas9 both in vivo (in mice) and in female ESCs.
In order to assess the timing of appearance of Xist small RNAs as well as microRNA changes linked to differentiation into the epiblast lineage, we generated small RNA data from a range of ESC differentiation states, including male ESCs, cells at days 4, 7 and 10 of differentiation to EpiSCs, and female ESCs (days 4, 7 and 10 of differentiation), as well as two female EpiSC samples. mRNA profiles have also been generated from the same samples, in duplicate for each time point. The integration of miRNA and mRNA profiles is a SyBoSS outcome that led to the discovery of female-specific miRNA profiles in early development. Functional evaluation of these differences is underway.
Modelling and computational biology
The functions of this Work Package within SyBoSS were (1) to generate community standards that facilitate the integrated computational modelling of cellular systems (especially mouse ES cells), (2) to develop computational methods for the analysis of data generated in this project, (3) provide input for second-tier experimental work, such as suggesting follow-up target genes, and (4) to generate new biological insight through computational modelling of ES cell dynamics.
This WP has achieved all of its goals, deliverables and milestones within time (see also the previous periodic reports). Thus, this WP has been very successful, which is underlined by the multiple interactions among WP3-partners and between the computational and experimental groups in the SyBoSS consortium. New collaborations have led to unexpected research directions, such as the functional genomics screen conducted under the lead of Austin Smith for the identification of genes involved in the exit from pluripotency. The work of Partner 11 (Univ. of Cologne, Beyer) has established the computational framework for the analysis of this data, which was not envisioned in the initial project proposal. Further, this WP has created community standards for reporting complex, combinatorial interactions between proteins in public databases (see below), which will have a long-lasting impact on the scientific community beyond the lifetime of SyBoSS.
Notably we integrated the loss-of-function scores, genetic interaction mapping, protein localization and protein-level dependency (PLD) into one model to delineate connectivity between factors that control Oct4 expression in EpiSCs and then compared the role of these factors to their function in ESCs (Figure 9). These data define shared and distinguishing factors in naïve- and primed- pluripotent cells, and provided insights into the dynamics that accompany the transitions between ESCs and EpiSCs. We demonstrated the power of this integrative approach by the prediction of Tox4 as an interacting partner of Paf1C (Figure 9).
 
Figure 9. Multiparametric integration of Omics data predicts Tox4 as a Paf1 interacting partner. Graphical presentation of hierarchical cluster analysis of the indicated Omics data. A binary distance metric was used for the localization data and a Euclidean distance metric was employed for all other data sets. Components of known protein complexes are highlighted with the same color.
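To make the two distance metrics named in the caption concrete, the following Python sketch computes both for small toy profiles. It is a hedged illustration: the report does not state which software was used, so the "binary" distance is assumed here to follow the common asymmetric definition (as in R's `dist(method = "binary")`), and the function names and example vectors are invented for this example.

```python
import math


def binary_distance(a, b):
    """Asymmetric binary distance: among positions where at least one
    profile is 'on', the fraction where only one of the two is 'on'."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differ / either if either else 0.0


def euclidean_distance(a, b):
    """Ordinary Euclidean distance between two quantitative profiles."""
    return math.dist(a, b)


def pairwise(profiles, metric):
    """Symmetric distance matrix (dict of dicts) for named profiles."""
    names = list(profiles)
    return {p: {q: metric(profiles[p], profiles[q]) for q in names} for p in names}


if __name__ == "__main__":
    # Toy localization calls (1 = present in a compartment); names are placeholders.
    localization = {"factorA": [1, 0, 1], "factorB": [1, 1, 0], "factorC": [1, 0, 1]}
    # Toy quantitative scores, e.g. loss-of-function scores.
    scores = {"factorA": [0.0, 0.0], "factorB": [3.0, 4.0], "factorC": [0.1, 0.0]}
    print(pairwise(localization, binary_distance))
    print(pairwise(scores, euclidean_distance))
```

Either matrix can then be passed to any standard hierarchical clustering routine; the choice of linkage method is not specified in the report.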
Protein-chromatin interactions
The goal of this WP was to compile and analyse large-scale protein-chromatin interaction data in order to generate input data for subsequent modelling, especially in WP3.4 (see below). The core of this WP is the assembly of ChIP-sequencing data characterizing the binding patterns of transcription factors (TFs) to the genome, which subsequently was used to predict target genes regulated by these TFs.
Because of the heterogeneity of the published data, we decided to re-analyse all raw data. The reason for getting involved in such an intensive task was that if we wish to use the pre-existing, published data sets as benchmark and background data, we need to have these datasets processed and analysed using exactly the same tools and strategies. These re-analysed data are then integrated in a graphical and interactive manner, both in terms of data visualization and for further data analysis. For this purpose, we have implemented a SyBoSS-specific version of the UCSC Genome Browser coupled to a dedicated Galaxy server.
We have integrated 66 unique ChIP-Seq tracks, comprising 11 histone modifications, 1 acetyl-transferase, 1 lysine-transferase, 24 transcription factors, 1 co-factor, 7 Polycomb and Trithorax proteins, 3 members of the RNA-Pol II pausing complex, 3 members of the super elongation complex, 4 members of the cohesin complex, 2 members of the mediator complex, 4 RNA polymerase II WT and mutant proteins, 2 chromatin organization proteins and 1 DNA methylase, with the addition of RNA, MRE and MeDIP-Seq tracks. All of these tracks have been analysed using identical pipelines and the resulting data are uploaded into our dedicated SyBoSS database (http://syboss.cbs.dtu.dk; username: guest; password 1Guest!; database is not yet public).
A very important subsequent step was the prediction of TF target genes from the ChIP-seq data. Inferring target genes from chromatin binding data of TFs is anything but trivial. The key problem is that TFs can bind far from their target genes and it is a priori unclear how to score the ‘binding pattern’ of a given TF around a gene. We have developed a scoring system that integrates the distances of binding sites from potential target genes into a target score. This score accounts for multiple binding events in the proximity of the same gene and for TF-specific features, such as how far from promoters a given TF typically binds. This scoring was compared to various other popular scorings using independent information, such as target gene function or expression data from TF knock-out experiments. This method comparison revealed that naïve target calling that simply checks for the presence or absence of TF binding within a pre-defined window around the promoter performs poorly compared to more sophisticated scorings that also account for the distance and number of binding events. This work was published in 2013 (Sikora-Wohlfeld et al. PLoS Comp. Biol. 9(11): e1003342, 2013).
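The difference between window-based target calling and a distance-aware score can be illustrated with a short Python sketch. This is not the published scoring function: the exponential decay, the `decay` parameter standing in for a TF-specific binding range, and all coordinates below are assumptions made for illustration.

```python
import math


def window_call(peak_positions, tss, window=1000):
    """Naive calling: is there any binding site within +/- window bp of the TSS?"""
    return any(abs(p - tss) <= window for p in peak_positions)


def distance_weighted_score(peak_positions, tss, decay=5000.0):
    """Sum over all binding sites, each down-weighted by its distance to the TSS.

    `decay` (in bp) is a hypothetical TF-specific length scale: a TF that
    typically binds far from promoters would be given a larger value.
    """
    return sum(math.exp(-abs(p - tss) / decay) for p in peak_positions)


if __name__ == "__main__":
    tss = 100_000
    peaks = [103_000, 96_000, 108_000]  # three distal binding events (toy data)
    print("window call:", window_call(peaks, tss))
    print("weighted score:", round(distance_weighted_score(peaks, tss), 3))
```

In this toy case the window-based call misses the gene entirely because no site lies within 1 kb of the promoter, whereas the weighted score accumulates evidence from the three distal sites.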
Protein-protein interactions
Protein-protein interaction data is essential for the data integration for the elaboration of SyBoSS modelling. However, the number of protein-protein interactions that have been genuinely measured in mouse cells is extremely low (< 500 interactions) compared to other species (yeast and human have known interactions in the range of tens of thousands). However, even in the case of human we know that the existing experimentally validated interactions only cover a small fraction of the whole human interactome, which is predicted to have more than 50,000 interactions. Thus, there is a clear need for the computational prediction of interactions in general, but particularly in mouse.
Due to the evolutionary proximity, we decided to infer a physical protein interactome for mouse using the corresponding resources developed for human proteins. In particular we (Beyer group, TUD) have developed machine learning methods for predicting high-quality protein interactions in human. The new human interactome contains more than 100,000 high confidence interactions, the majority of which are newly predicted. Several hundred predictions have been experimentally tested within the SyBoSS consortium (Tony Hyman) and using external partners (Matthias Mann, MPI Martinsried, Germany, Ulrich Stelzl, MPI Berlin, Germany). Such extensive experimental validation of a database predicting protein interactions is unprecedented in the published literature. The human network has been published in an international peer reviewed journal (Elefsinioti et al. Molec. Cell. Proteomics 10(11):M111.010629, 2011) followed by a second paper presenting a new computational approach that we developed for this purpose (Sarac et al. Bioinformatics 28(16):2137, 2012).
The next step in this WP was to develop methods for the transfer of this network to mouse utilizing advanced orthology determination algorithms. The mouse network is thus composed of (1) interactions directly measured in mouse/mouse cells, (2) interactions measured in human cell lines and (3) interactions based on computational prediction. Further, we used an improved method for utilizing protein domain information for the prediction of interactions. The new scoring that we used separates known protein-protein interactions from the rest very well (Figure 10). The new integrated mouse network contains 9892 novel physical interactions with high confidence.
 
Figure 10. Density distribution of domain interaction scores for 15,659,097 possible mouse PPIs with a domain interaction score greater than zero. Known physical interactions tend to have higher domain interaction scores, which confirms the predictive power of this score.
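The transfer of human interactions to mouse described above can be pictured, in its simplest form, as a lookup through an orthology table. The Python sketch below shows only that simplest form and is not the consortium's method, which used more advanced orthology determination; the function name, the mapping format and the tiny example table are assumptions for illustration.

```python
def transfer_interactions(human_ppis, human_to_mouse):
    """Map human protein pairs to mouse via an orthology table.

    human_ppis:      iterable of (human_gene, human_gene) pairs
    human_to_mouse:  dict mapping a human gene to a list of mouse orthologues
                     (one-to-many mappings are expanded to all combinations)
    Returns a set of unordered mouse pairs; pairs without orthologues for both
    partners are dropped, and self-pairs are skipped in this sketch.
    """
    mouse_ppis = set()
    for a, b in human_ppis:
        for ma in human_to_mouse.get(a, ()):
            for mb in human_to_mouse.get(b, ()):
                if ma != mb:
                    mouse_ppis.add(frozenset((ma, mb)))
    return mouse_ppis


if __name__ == "__main__":
    # Illustrative input only; "GENE_X" deliberately has no orthologue entry.
    human_ppis = [("POU5F1", "SOX2"), ("SOX2", "GENE_X")]
    orthologues = {"POU5F1": ["Pou5f1"], "SOX2": ["Sox2"]}
    for pair in transfer_interactions(human_ppis, orthologues):
        print(sorted(pair))
```

In practice each transferred pair would also carry its evidence class (measured in mouse, measured in human, or predicted), matching the three sources listed in the text.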
Cooperativity
We developed standards for cooperative regulatory interactions. The project was forward looking in the sense that the main benefit of these standards will be felt after the end of the grant period, as researchers learn to systematically collect the appropriate data. It has become clear that cooperativity is essential for biological complexity (Gibson, “Cell regulation: determined to signal discrete cooperation”, TiBS 34, 471-82, 2009). Both multivalency and allostery enable multiple state inputs to determine a single execution step. Regulatory protein complexes will not be correctly modeled without taking account of cooperative interactions. We had the simple, clear but vital objective of introducing cooperativity into molecular systems resources for the benefit of the SyBoSS consortium stem cell research and more broadly for systems modelers everywhere.
Proper cell physiology depends on numerous molecular interactions and analyzing these interactions is a prerequisite for understanding cell function and regulation. Several publicly available molecular interaction databases exist that provide experimentally validated and manually curated molecular interaction data to the scientific community, and as such make an important contribution to scientific research. However, the interactions are all treated as binary and independent whereas, within cells, molecular interactions are generally not independent but cooperative, i.e. they influence each other positively or negatively, an aspect that was insufficiently and inconsistently captured in bioinformatics resources but is critical for reliable and robust cell signalling. Within the SyBoSS project, we tackled this shortcoming by setting out to integrate cooperative interactions in bioinformatics resources. We also provided the first publicly available resource having cooperative interaction data available for analysis and in a computer-readable standard data format. The open standards that we have introduced provide reference platforms that will enable bioinformaticians to further develop computational resources for helping to advance knowledge on the molecular details of cooperative binding and understanding of cell regulation in general.
In our first task, we developed a standard format for cooperative interaction data (D 3.3-1). The first version of this standard is an extension of the current data format for molecular interaction data, the PSI-MI2.5 XML format, and uses new controlled vocabulary (CV) terms, which were added to the PSI-MI CV ontology, to describe cooperativity between distinct binding events. We also developed a website that describes in detail how to annotate cooperative interaction data using the PSI-MI2.5 XML format and provides several examples of different complexity (http://psi-mi-cooperativeinteractions.embl.de). The use of the PSI-MI2.5 XML to capture cooperative interaction data has also been published (Van Roey et al., 2013; PMID: 24067240). Standards development is a dynamic process, and the molecular interactions group of the PSI consortium continued with the development of a major revision of the molecular interaction data exchange format, PSI-MI3.0. We were actively involved in this development, specifically in adding and deeply embedding elements that improve annotation of cooperative interactions, making it more inherent to the format. To this end, we continued our collaboration with the group of Henning Hermjakob at the EBI, the lead developers of the PSI-MI standard. A pre-final draft version of the new PSI-MI3.0 format was composed at the 2014 PSI spring meeting (April 13-16, 2014, near Frankfurt, Germany) and since then has been out for review.
After developing the standard, it was important that we showed that it could be applied in a bioinformatics database. Therefore we next developed the switches.ELM resource (http://switches.elm.eu.org), a database for experimentally validated cooperative interactions curated from the literature (D 3.3-2) (Van Roey et al., 2013; PMID: 23550212). Most of the data currently curated in switches.ELM involves interactions mediated by short linear motifs (SLiMs), low-affinity interaction modules that are frequently used cooperatively to function as molecular switches. This bias stems from SLiMs being our area of expertise; however, any set of molecular binding events affecting each other can be captured, and in a continuous curation effort we are also annotating these more general cooperative interactions in switches.ELM. In addition, we kept working on improving visualization and data export for the new PSI-MI3.0 format once it is released. We were then able to apply the lessons learned from switches.ELM by working with the IntAct Consortium, to revamp the existing binary interaction database IntAct and make it cooperativity aware (Orchard et al., 2013; PMID: 24234451). As part of the collaboration, we organised several meetings involving the three bioinformatics groups of SyBoSS and our external collaborators, including a successful and valuable Hackathon. A final task to achieve our milestone (M 3.3) and get to a cell signalling and regulation systems-ready cooperativity tool set is defining a guideline to unambiguously report cooperative interaction data (D 3.3-4). For this purpose, we have defined the Minimum Information About Cooperative Interactions (MIACIs) (D 3.3-3) which can be found on our cooperative interactions website (http://psi-mi-cooperativeinteractions.embl.de/CIS-MIACIs.html).
We were able to complete our original set of deliverables within the expected time. Work continued on improving the new bioinformatics resources but, at the SyBoSS midterm meeting, we also added a new deliverable (D 3.3-4) for a project with Edith Heard, Partner 6. This relates to her group’s findings that numerous developmental genes show random monoallelic expression (RME). One curiosity is that not all paralogous members of multigene families are monoallelic. From a gene list provided by SyBoSS partner 6, we have constructed phylogenetic trees using family member protein sequences. The goal is to assess if there is a correlation between the conservation/speed of evolution of proteins that belong to the same protein family and the severity of phenotypic effects upon inactivation of these proteins. By building phylogenetic trees, the speed with which proteins diverge can be estimated from the branch lengths in the tree. As genes are duplicated, the resulting paralogues that constitute the protein family will diverge and possibly acquire distinct functionality over time. If we expect the rate of mutation to be the same for all members of the family (null hypothesis), a different rate of evolution/level of conservation would suggest some evolutionary pressure on a subset of family members that are functionally more important, in which case their inactivation would have a more profound effect on cell function. We constructed phylogenetic trees for several multi-protein families and compared branch lengths with known phenotypic defects that result from protein inactivation (taken from the Mouse Genome Informatics (MGI) database, http://www.informatics.jax.org/). Our preliminary results on this limited number of protein families show that there indeed appears to be a correlation between conservation and functional importance.
For instance, for the EYA (Eyes absent homologs) protein family, inactivation of the most conserved members EYA1 and EYA4 (short branch length) results in severe phenotypic defects and in many cases is not viable. The two other members EYA2 and EYA3 seem to be functionally less important as inactivation only results in mild phenotypic effects, and as can be seen from the phylogenetic tree, these proteins are not as conserved (longer branch lengths) (Figure 11). EYA1 and EYA4 are monoallelic during development whereas the phenotypically milder EYA3 is biallelic. Similar results were obtained for other families, notably the SIX protein family.
At time of writing the SyBoSS final report, Partner 4 has co-authored nine publications that cite SyBoSS funding support. Based on the citations of these papers, we can already say that the SyBoSS-funded work is having a significant impact in the research community. For example, the MIntAct paper has been cited 143 times and our review on the attributes of short linear motifs 156 times (source: Google Scholar).
 
Figure 11. Phylogenetic tree (vertebrate protein sequences) of the EYA protein family. Shorter and longer branch lengths indicate slower and faster evolution, respectively. The EYA1 (purple) and EYA4 (red) proteins have a slower rate of evolution/divergence (short branches) and their inactivation is associated with severe phenotypic defects. In contrast, the EYA2 (orange) and EYA3 (green) proteins evolve faster (long branches) and their disruption only results in mild phenotypic effects. Phenotypes were taken from the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org/).
Pathway and state change models
We analysed the dynamics of transcriptional networks driving the differentiation of ES cells. This was achieved by integrating public and SyBoSS-specific ChIP-seq datasets (available at http://syboss.cbs.dtu.dk; username: guest; password 1Guest!; database is not yet public) with SyBoSS RNA-seq data, which characterizes the transcriptional activity of >50% of all genes in the mouse genome.
1. TF centered dynamic analysis of the gene regulatory networks involved in ESC->EpiSC->NSC cell state transitions.
We first analysed mRNA-Seq data from SyBoSS experiments differentiating ESC to EpiSC and NSC (relative to ESC). We then integrated these gene expression profiles with TF-Promoter information obtained from public databases and from our re-analysis of Stem Cell specific publications. This generated a dynamic representation of the gene regulatory networks involved in differentiation. The RNA-seq data covered ESCs in 2i and serum, EpiSCs, and neural stem cells at multiple time points. For each time point, we identified the genes significantly changing relative to “ESC in 2i” and relative to the previous point. For each list, we divided the genes into up and down regulated and performed a functional enrichment analysis (GO analysis). TF-promoter association data were obtained and analysed as described above. The final TF-target network contains 557276 interactions covering 352 transcription factors of which 22 were the results of our own ChIP-Seq analysis. Target gene calling was performed using the Sikora-Wohlfeld et al. method (see above).
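The per-time-point comparison described above, in which genes are compared both to “ESC in 2i” and to the previous time point and then split into up- and down-regulated sets, can be sketched as follows. This Python example is a simplification under stated assumptions: it uses a plain log2 fold-change cut-off in place of a statistical test for significance, and the gene names, expression values, threshold and pseudocount are invented for illustration.

```python
import math


def split_changes(expr, reference, target, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return (up, down) gene lists for `target` relative to `reference`.

    expr maps gene -> {condition: expression value}. A fold-change threshold
    stands in here for the significance testing used in a real analysis.
    """
    up, down = [], []
    for gene, values in expr.items():
        lfc = math.log2((values[target] + pseudocount) / (values[reference] + pseudocount))
        if lfc >= min_abs_log2fc:
            up.append(gene)
        elif lfc <= -min_abs_log2fc:
            down.append(gene)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    # Toy expression values; gene names are placeholders.
    expr = {
        "geneA": {"ESC_2i": 100, "EpiSC": 10, "NSC": 5},
        "geneB": {"ESC_2i": 5, "EpiSC": 50, "NSC": 400},
        "geneC": {"ESC_2i": 20, "EpiSC": 22, "NSC": 21},
    }
    order = ["ESC_2i", "EpiSC", "NSC"]
    for previous, current in zip(order, order[1:]):
        print(current, "vs ESC_2i:", split_changes(expr, "ESC_2i", current))
        print(current, "vs", previous + ":", split_changes(expr, previous, current))
```

Each resulting list would then be passed to a GO enrichment tool, as described in the text.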
In order to integrate the transcriptomics and DNA-binding data in a way that would allow us to obtain a dynamic representation of the regulatory network involved in ESC to EpiSC to NSC differentiation, we used the Dynamic Regulatory Events Miner (DREM) tool, which was created for exactly this purpose (Schulz et al, BMC Systems Biology 2012). DREM uses both RNA-Seq data and TF-promoter information to infer gene expression thresholds and predict which TFs can explain such boundaries. In short, for each time point, DREM identifies the significantly changing genes and uses those to look for enrichment of specific TFs in their promoter regions. Using the gene expression levels of each TF, DREM also separates regulators into “activators” (blue) and “repressors” (red). Unassignable TFs are marked in black. Figure 12 shows the results of such an analysis, in which we only looked at 2 partitions, with a “Minimum Absolute Log Ratio Expression” of 1.2 and a train-test random seed of 1260. Additionally, DREM performs GO functional enrichment analysis on each branch.
 
Figure 12. DREM analysis of ESC-EpiSC-NSC RNA-seq data.
2. miRNA-centered dynamic analysis of the gene regulatory networks involved in ESC->EpiSC differentiation in E14 (male) and PGK (female) cells.
Similar to the previous mRNA/TF integration, we have studied the regulatory roles of miRNAs in the differentiation of ESCs to EpiSCs in both male and female cells. In this project, we obtained RNA-Seq and small RNA-Seq data from differentiating ESCs at days 0, 4, 10, 20 and 30 for both PGK (female) and E14 (male) cells. RNA-Seq data were analysed using the same pipelines and methods as for the previous project. miRNA expression levels were determined using the ncPRO server (https://ncpro.curie.fr/) and differential gene expression was investigated using the same pipelines as for RNA-Seq.
miRNA and mRNA expression data were integrated with TF-promoter data (as previously described) and miRNA/mRNA interaction predictions retrieved and processed from TargetScan 6.2 (http://www.targetscan.org/mmu_61/). The integration was performed using a miRNA-specific variant of DREM, called mirDREM. This variant uses TF-promoter data, miRNA/mRNA predictions and the overall expression pattern to “explain” overall gene expression. Importantly, mirDREM uses only those miRNAs showing anti-correlation with their target mRNAs at any time point to predict a regulatory role. These analyses were performed on both PGK and E14 cells (Figure 13). While TF regulation appears to be somewhat consistent between sexes, miRNA identities vary more. Figures 13 and 14 show PGK and E14 mirDREM outputs respectively. Only the top 5 most significant miRNAs and transcription factors are shown. Red – underexpressed miRNAs. Blue – overexpressed miRNAs.
 
Figure 13. miRDREM analysis of PGK (female) ESC-EpiSC-NSC data.
 
Figure 14. miRDREM analysis of E14 (male) ESC-EpiSC-NSC data.
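The anti-correlation requirement applied to miRNA/target pairs, described above, can be illustrated with a minimal filter. The Python sketch below is not mirDREM's internal procedure: it assumes a Pearson correlation across the whole time course with an arbitrary cut-off (`max_r`), and the miRNA names, gene names and expression values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def anticorrelated_pairs(mirna_expr, mrna_expr, predicted_targets, max_r=-0.5):
    """Keep predicted miRNA/target pairs whose time courses are anti-correlated."""
    kept = []
    for mirna, targets in predicted_targets.items():
        for gene in targets:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if r <= max_r:
                kept.append((mirna, gene, round(r, 3)))
    return kept


if __name__ == "__main__":
    # Toy time courses over five time points; all names are placeholders.
    mirna_expr = {"miR-x": [1, 2, 4, 8, 9]}
    mrna_expr = {"gene1": [9, 8, 5, 2, 1], "gene2": [1, 2, 3, 7, 9]}
    predicted = {"miR-x": ["gene1", "gene2"]}  # e.g. from a target prediction table
    print(anticorrelated_pairs(mirna_expr, mrna_expr, predicted))
```

Only the pair whose mRNA falls as the miRNA rises survives the filter; the positively correlated prediction is discarded.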
Finally, we generated regulatory networks for both the “static” analysis (each time point as an independent observation using DESeq2) and the mirDREM “dynamic” model. The intersection of these two networks yields an optimized network of the regulatory roles of miRNAs in ESC differentiation (Figure 15).
 
Figure 15. Integrated network. Green nodes – miRNAs; blue nodes – genes; blue/pink edges – male or female cells.
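The intersection step can be sketched as a set operation on regulatory edges. The representation of an edge as a (miRNA, gene) tuple and the example edges are assumptions for illustration; the report does not describe how the networks were stored.

```python
def intersect_networks(static_edges, dynamic_edges):
    """Return the (miRNA, gene) edges supported by both analyses, sorted for stable output."""
    return sorted(set(static_edges) & set(dynamic_edges))

# Hypothetical edges from the per-time-point ("static") and mirDREM ("dynamic") analyses.
static_edges = [("miR-A", "Gene1"), ("miR-B", "Gene2")]
dynamic_edges = [("miR-A", "Gene1"), ("miR-C", "Gene3")]

print(intersect_networks(static_edges, dynamic_edges))
```

Requiring support from both analyses trades sensitivity for confidence: an edge found by only one method is dropped.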
Models of protein complex and pathway-specific impact of X inactivation and X reactivation
X chromosome inactivation (XCI) and X chromosome reactivation (XCR) are essential processes in females, starting in embryonic development, that compensate for the extra gene dosage and yield monoallelic expression. Brunak/DTU studied the effects of XCI and XCR using men with Klinefelter syndrome (KS) as a model. KS is the most frequent chromosome disorder, affecting approximately 1-2 per 1000 newborn boys, with the most common karyotype being XXY (Hysolli et al, Cell Cycle 11, 229–235, 2012). Men with KS present a palette of co-morbidities, i.e. diseases that are overrepresented compared to the background population. Previous studies of KS co-morbidities (Bojesen et al, Acta Paediatr. 100, 807–813, 2011; Lahlou et al, Acta Paediatr. 100, 824–829, 2011) have focused on delayed speech and learning difficulties, psychosocial problems, testicular insufficiency, height greater than predicted from the parents’ heights and eunuchoid body proportions. Yet the penetrance of the phenotypes is highly variable, with some men not showing any signs of the syndrome while others are highly affected. The clinical phenotypes of KS are believed to arise from the genes escaping XCI.
The aim of the study by Brunak was to study the molecular mechanisms underlying the clinical phenotypes of KS. KS co-morbidities were first extracted from 2.6 million Danish electronic patient records (Jensen et al, Nat Commun 5, 4022, 2014). Some of the most significant co-morbidities of KS from this analysis were testicular dysfunction, hypopituitarism, neuromuscular scoliosis and osteoporosis. The observed co-morbidities of Klinefelter syndrome per ICD-10 chapter are illustrated in Figure 16. Gene expression data were generated from KS and control males and integrated into the network. The analysis of the gene expression data itself showed that the genes escaping XCI in males affect gene expression genome wide. This clearly showed that the extra X chromosome in men with KS is transcriptionally active to some extent and triggers the KS phenotypes. To investigate how KS and its co-morbidities are linked at the protein interaction level, a phenome-interactome network was built consisting of sub-networks, each representing a co-morbidity of KS. Numerous proteins in the network linked multiple KS co-morbidities, either by being associated with several co-morbidities themselves (designated multi-disease nodes) or by linking multiple co-morbidities through first-order interaction partners (designated co-morbidity hubs). Thus, the network displays key players in the KS disease phenotypes.
 
Figure 16. Observed Klinefelter Syndrome (KS) co-morbidities displayed by ICD-10 chapter. The KS code belongs to chapter XVII. The most frequently occurring co-morbidity is testicular dysfunction (chapter IV).
Disruptive phenotypic impact of interactome modularity. In order to investigate and rank the coordinated expression of components in protein complexes in the network, we implemented a previously developed method for expression coordination quantification (Taylor et al, Nat Biotechnol 27, 199–204, 2009). Integrating the gene expression data with this methodology, our analysis reveals that the structure of the interactome is likely to drastically decrease the phenotypic effect of KS.
The gene expression data from whole blood were used to identify disrupted protein complexes in KS. The next step will be to map cell-type-specific transcriptome data onto the phenome-interactome network. The method we have developed makes it possible to investigate whether the stoichiometry in protein complexes increases or decreases the phenotypic impact of the de facto copy number variation. The approach is entirely general and can be applied to any case where gene dosage changes occur, such as X-inactivation, cancer-induced aneuploidies or inherited copy number changes observed in individuals.
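As a rough illustration of what quantifying expression coordination can mean, the sketch below scores a protein complex by the average pairwise Pearson correlation of its members' expression across samples. This is a simplified stand-in written in the spirit of correlation-based coordination measures, not the exact procedure of the cited method; the function names and the example profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def complex_coordination(expr, members):
    """Average pairwise correlation of the members of one complex across samples."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(pearson(expr[a], expr[b]) for a, b in pairs) / len(pairs)

# Hypothetical expression profiles over four samples.
expr = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],   # scales with P1
    "P3": [4.0, 3.0, 2.0, 1.0],   # opposes P1 and P2
}

print(complex_coordination(expr, ["P1", "P2"]))        # tightly coordinated pair
print(complex_coordination(expr, ["P1", "P2", "P3"]))  # coordination lost
```

Comparing such a score between KS and control samples would be one way to rank complexes by how strongly their coordination is disturbed.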
Potential Impact:
Expected final results and potential impacts
Overall SyBoSS aimed to gather and integrate systematic datasets into a comprehensive network model that will enhance the understanding of the regulatory circuitry underpinning a mammalian cell type, in particular a stem cell and more particularly an embryonic stem cell and its transition to other multipotent, self-renewing stem cells. By the end of the project, we have succeeded in assembling an unrivalled resource of primary information regarding the regulatory composition of a single, physiologically relevant, mammalian cell. It will take more time to integrate the very substantial volumes of data now available for ESCs into sophisticated regulatory models. Indeed model building inevitably must be an ongoing process of refinement and development. In this context, the ESC data resource is a premier venue for creative and constructive advances in the strategies for regulatory modelling of complex networks. In addition to advancing knowledge about ESCs, the SyBoSS project delivered substantial progress in the understanding of stem cells per se. In part because of the comprehensive datasets available, ESCs are now a paradigm for stem cells. SyBoSS also delivered further insights into pluripotency, stem cell self-renewal as a reinforcing regulatory network, destabilization of self-renewal and exit from pluripotency, and the transition to the neighbouring quasi-pluripotent EpiSC state. It also created a mammalian network of predicted protein-protein interactions as well as reference datasets for proteomic, transcriptomic and epigenomic profiles. Using well established computational methods, the growing body of systematic data from wild type and gene specific sources is being integrated with available public information into models and network simulations to infer the molecular interactions relevant for the four stem cell stages and the transitions between them.
As these simulations grow, the value of the transcriptome data becomes ever more apparent. The implementation of our two technical remedies and the resulting acquisition of the transcriptome data is having a synergistic effect on the value, quality and accuracy of the overall outcome of the SyBoSS project.
The final outcome of the SyBoSS project is the establishment of an information resource pertaining not only to stem cell biology but also to mammalian cell biology in general with particular focus on transcriptional and epigenetic circuitry. This outcome is still under construction but the vast majority of the requisite data has been collected.
In the course of the project we developed and extended the methodology of mammalian systems biology. This will directly benefit other applications as well as serve as a logical starting point for the comprehensive understanding of mammalian development as a transcriptional program.
Through enhanced understanding of stem cell biology, the potential for stem cell applications in regenerative therapies and medicine will be enhanced. The information that SyBoSS has gathered and is organising will be important for developing standards for cell based therapies and devising new approaches based on improved understanding of regulatory interactions. This information will also impact on improved disease modelling, drug screening and toxicity tests using ESC and iPSC cell culture and differentiation. These models reduce the need to use animals for testing and in certain applications are more precisely defined and permit greater accuracy.
The broadest socioeconomic impact of the project relates to the development of new ways to treat chronic disabilities in the human population, particularly those related to ageing, as well as genetically based disabilities or degenerative diseases. These health issues are beyond current therapies. The information delivered by the SyBoSS project will contribute to the understanding required to search for new solutions.
Humans are the most complex phenomena in the universe. It is not surprising therefore that we are far from understanding much of ourselves in health and disease. We are based on at least 30,000 genes encoding at least 70,000 proteins and likely 10,000 non-coding RNAs whose temporal and environmental flexibilities add several dimensions to the complexity of the organism. This lack of understanding also applies to even a single cell type. Despite considerable progress, to which SyBoSS has made a hefty contribution, it will take a great deal more work and creative analysis before we arrive at a comprehensive grasp of stem cellness and predictable ways to employ these properties for applications in human health. Nevertheless, the recent reactionary policy shift back to the old ‘trial and error’ empiricism of medical research, now termed ‘translational research’, is very discouraging. The motivation to apply progress in the understanding of biological systems to applications in medicine is certainly good. However, to abandon primary research when so much knowledge remains to be gathered and to return to the traditional guesswork of medical research is a very ineffective way of making progress. Under the current research funding policy climate, not only will a great deal of funding be wasted but also the highly skilled and developed European research environment, much of which is the very successful and constructive product of the 6th and 7th Framework Integrated Project policy, will be lost. Outstanding European collaborations, previously fostered by enlightened EC funding, are now disintegrating. It is extremely disheartening to see European biomedical research flounder in this way. For these reasons, I have my doubts about whether the vital, essential and remarkable progress that has been made in the fundamental understanding of mammalian development and stem cells will lead to effective exploitation.
List of Websites:
syboss.eu
Prof A Francis Stewart
Scientific Coordinator
Systems Biology of Stem Cells and Reprogramming
Director, Biotechnology Center
Department of Genomics, Technische Universitaet Dresden
Tatzberg 47-51
01307 Dresden
Germany
tel: +49-351-46340130
fax: +49-351-46340143
stewart@biotec.tu-dresden.de
Project Context and Objectives:
A major challenge in current stem cell biology is to elucidate how gene regulatory circuitry is modified to execute differentiation. Mouse embryonic stem cells provide a tractable system for addressing this problem because they may be stably propagated as homogeneous populations and released into differentiation in defined conditions. To identify genes that regulate the transition from the naïve self-renewing ES cell to a differentiation-committed state, SyBoSS was based on three broad platforms:
(i) the establishment of reference datasets for the 3 stem cell states; specifically two variations of embryonic stem cells (ESCs, 2i + LIF and serum + LIF), epiblast stem cells (EpiSCs) and neural stem cells (NSCs). The first publication reporting reference datasets has already become a citation classic (Marks et al, Cell 2012). We applied next generation sequencing to examine the transcriptome of ES cells cultured in ground state conditions (known as 2i + LIF) compared with conventional relatively heterogeneous cultures in serum + LIF. We found that ground state ES cells exhibit lower expression of lineage-affiliated genes, reduced prevalence at promoters of the repressive histone modification H3K27me3, and fewer bivalent domains, which are thought to mark genes poised for either up- or downregulation. Nonetheless, serum- and 2i-grown ES cells have similar differentiation potential. Precocious transcription of developmental genes in 2i is restrained by RNA polymerase II promoter-proximal pausing. These findings suggest that transcriptional potentiation and a permissive chromatin context characterize the ground state and that exit from it may not require a metastable intermediate or multilineage priming.
As intended, SyBoSS acquired total transcriptome, small RNA, proteome and phosphoproteome datasets from ESCs, EpiSCs and NSCs. The datasets will be made publicly available in 2016 as we finalize their integration into an accessible resource and a publication that adds value regarding the ESC to EpiSC to NSC transition.
(ii) genome-scale screening approaches to identify functional players in an unbiased manner. The original goal to perform three genome-wide RNAi screens for unbiased discovery of factors involved in various aspects of pluripotency has been successfully exceeded. Five genome-wide and two other medium scale screens have been completed. Three of the genome-wide screens used the esiRNA method developed by partner Buchholz to screen ESCs and EpiSCs primed with fluorescent reporters to evaluate (a) the regulation of Oct4 in EpiSCs, which permitted the comparison with Oct4 regulation in ESCs; (b) the negative regulation of the meiosis-specific gene, SMC1b; and (c) the roadblock to reprogramming in EpiSCs. Beyond these three screens, (d) a genome-scale siRNA screen in ESCs to identify negative regulators of the exit from pluripotency and (e) a saturation screen using piggyBac transpositional mutagenesis in haploid ESCs were also completed. In addition, (f) an esiRNA screen against 512 lncRNAs to discover lncRNAs involved in ESC self-renewal and (g) an esiRNA screen against 540 selected chromatin regulators using a reporter for X-chromosomal reactivation in EpiSCs were completed. All these screens have been extremely rewarding and have led, or are leading, to the identification of a variety of factors and processes related to pluripotency. So far four key publications have emerged with at least two more under construction.
(iii) the collection of data from several hundred genes selected for their relevance to stem cell self-renewal and transitions, often selected because of the genome-wide screens. The data were collected using tagging, RNAi or conditional mutagenesis methods developed at least in part by SyBoSS partners. Using GFP (or Venus) as the tag, we developed generic methods for standardized imaging, AP-MS and ChIP-seq (for chromatin proteins), which delivered uniformly high and comparable data quality. Hence we obtained a unique composition of datasets with unrivalled relevance for stem cell properties. The AP-MS data have been organized into a user friendly format available at syboss.eu and http://www.digtop.de/syboss_login.php.
In addition to these three cornerstones, SyBoSS also had a concentrated focus on
(a) X-chromosome inactivation, which is linked to the exit from self-renewal. Notably we discovered that the presence of two active X chromosomes delays exit from self-renewal and more recently that the inactive X lacks the topologically associated domains (TADs) that characterize autosomes, except around the few genes that escape X-inactivation.
(b) technology development. In addition to the technical advances that were incorporated into the project in the first two years, we recently developed the auxin degron, together with rapid CRISPR/Cas9-assisted targeting methods, for functional analyses. The auxin degron brings two advantages over existing methods for ligand-inducible loss-of-function: it is reversible and considerably faster. Loss-of-function within 30 minutes promises to open many new insights into regulatory function. Another aspect of technology development involved the optimization of proteomic methods for top-down mapping of ubiquitylation sites with the aim of quantifying ubiquitylation signalling in stem cells.
Value has been added to the data generated and collected by SyBoSS in several ways. The AP-MS and imaging data have been incorporated into a user friendly database that will be made public as soon as we have ironed out the glitches and incorporated the transcriptome data. A database assembling all publicly available, ESC-relevant ChIP-seq data has been established as the platform into which SyBoSS ChIP-seq data have been incorporated. This represents a substantial resource of genome binding patterns and histone modifications that will be made publicly available together with the forthcoming publication. SyBoSS data analysis included (i) the construction of models for the ESC to EpiSC to NSC transition; (ii) the utilization of ChIP-seq data to predict regulatory circuitry; (iii) integration of SyBoSS protein interaction data into existing protein-protein interaction datasets to establish a high confidence set of predicted interactions; (iv) development of standards for co-operative regulatory interactions for systems modelling; (v) the analysis of the dynamics of transcriptional networks driving the differentiation of ESCs; (vi) the evaluation of the functional implications of monoallelic expression, particularly with respect to X-chromosome inactivation, escape from inactivation and Klinefelter syndrome.
Figure 1. Current summary of the regulatory circuitry sustaining pluripotency in mouse ESCs.
In summary, our studies have challenged the prevailing model that ES cells enter differentiation via degeneration into a metastable population that experience stochastic lineage priming while co-existing as ES cells. Our data have instead revealed that, in controlled and defined conditions, ES cells undergo a highly orchestrated transition in which the naïve pluripotency network is abruptly dismantled by concerted action of multiple destabilising mechanisms (Figure 1).
Notably extinction of the naïve gene regulatory network occurs prior to detectable expression of definitive lineage specification genes or lineage priming. These observations are consistent with what is known of pluripotency progression in the embryo.
Based on these findings we have proposed that pluripotency may be parsed into three phases; naïve, formative and primed (Figure 2).
Figure 2. Schematic of current knowledge showing stages of pluripotency and exit.
Overall, SyBoSS achieved its work commitments in terms of deliverables. Please see the 5th annual report for more details about the final deliverables. Adoption of the mid-term reviewer’s recommendation to exchange quantity for quality made a significant contribution to the value of the outcome. Beyond the operational success, the more notable success has been the scientific accomplishments. A great deal of progress regarding pluripotency, ESCs and the transition through EpiSCs to NSCs, as well as the properties of stem cell self-renewal and differentiation has been made. As well as being amongst the most valuable, ESCs are now arguably the best understood mammalian cell type. The chemistry within the consortium has been excellent and a variety of unanticipated collaborative projects arose because of the complementary interdisciplinarity of the partners. Ongoing projects and relationships have been forged that will ensure that the SyBoSS legacy lasts well beyond the end of the funding period. As a final note, the cost neutral six month extension made a huge difference to the successful outcome and all SyBoSS partners are extremely grateful for this concession, as well as the entire overall support and opportunity conveyed by the funding.
Project Results:
The SyBoSS project benefitted greatly from the cost neutral six month extension and all partners would like to thank the project officer and colleagues for their enlightened management of our project.
Here we summarize the work towards the remaining deliverables.
D1.2-3 135 EpiSC and 135 NSC cell lines. An Excel file listing these cell lines, including a list of the 261 ESC lines expressing tagged proteins and 133 iPSC lines, was delivered. We achieved 137 NSC lines but fell short with only 91 EpiSC lines. However we also generated 133 iPSC lines by Cas9 targeting.
D1.1-3 100 AID biallelically tagged ESCs. An Excel file was delivered listing 89 ESC lines carrying biallelically targeted genes, plus 14 where we only obtained monoallelic targeting (i.e. heterozygous) and a further 28 genes that are still underway at the time of writing.
D2.1-2 Western and/or in situ analysis of 600 cell lines. More than 600 cell lines have been imaged and the SyBoSS imaging database is available through the SyBoSS website, syboss.eu: select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public).
D2.2-2 Identification of protein interactors generated from 300 AP-MS analyses. The outcome of more than 600 AP-MS analyses, resulting in 115 successful protein-protein interaction datasets, is also available through the SyBoSS website, syboss.eu: select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public). We have also developed a new Volcano plot tool for user friendly statistical presentation and quantification of the AP-MS data.
D2.3-4 ChIP-seq and target gene analysis. Available at http://syboss.cbs.dtu.dk; username: guest; password: 1Guest!; the database is not yet public.
This database will be made public when the accompanying manuscript, which is under construction, is accepted for publication. The accompanying manuscript will extend and enhance the value of the primary data acquired.
D2.4-3 RNA-seq after esiRNA knock-down of 100 genes from revised D1.1-3 and D2.2-2 and target gene analysis. The list of 117 genes knocked down in both ESCs and EpiSCs has been delivered as an Excel file including notes about whether the knock-down cells displayed a visible phenotype. Because the experiments were performed in both ESCs and EpiSCs, we exceeded the requirement for this deliverable by more than 2-fold. There are many ways in which these data can be analysed. Consequently the target gene analysis is still underway.
D3.4-3 Fine grained model for subset of genes. The report has been submitted.
D3.1-3 NGS database – implementation of SyBoSS-NGS-datasets in a web-based, downloadable, genome browser linked database. Available at http://syboss.cbs.dtu.dk; username: guest; password: 1Guest!; the database is not yet public.
D4.2-1 Gene lists from three genome-wide RNAi screens & Report on protein-protein interactions. Lists of genes with Z scores above 2 for the three genome-wide screens have been delivered as an Excel file. In addition we performed two further genome-wide screens and two medium scale (i.e. ~500) screens during the SyBoSS project. Please see the final report (section ‘Screens’) for further description, as well as the report on protein-protein interactions (beginning of the section ‘Modelling and computational biology’). This work was also published in Ding et al, Cell Systems 1, 141-151, and would have been included in the deliverable; however, only one file can be uploaded per deliverable.
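The Z score cut-off used for the screen gene lists can be illustrated with a minimal sketch. It assumes a plain Z score (reporter readout minus the screen mean, divided by the sample standard deviation) and the cut-off of 2 stated above; the actual normalization used in the screens is not described here, and the gene names and readouts are hypothetical.

```python
from statistics import mean, stdev

def z_scores(readouts):
    """Map each gene to the Z score of its reporter readout within the screen."""
    values = list(readouts.values())
    mu, sigma = mean(values), stdev(values)  # stdev needs >= 2 non-identical values
    return {gene: (value - mu) / sigma for gene, value in readouts.items()}

def hits(readouts, cutoff=2.0):
    """Return the genes whose Z score exceeds the cut-off, sorted by name."""
    return sorted(gene for gene, z in z_scores(readouts).items() if z > cutoff)

# Hypothetical readouts: nine knockdowns without effect and one strong outlier.
readouts = {f"gene{i}": 1.0 for i in range(9)}
readouts["geneX"] = 10.0

print(hits(readouts))
```

A symmetric variant using the absolute Z score would additionally capture knockdowns that decrease the readout.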
Data collection - proteomics
Embryonic stem cells are highly plastic and can be differentiated into more specialized stem cells, which can be further differentiated into specific lineages. Stem cells are complex systems in which the identity of, and functional differences among, different stem cell types are determined by the differences in their proteome complement, protein posttranslational modifications (PTMs) and protein-protein interactions. SyBoSS systematically collected proteome and phosphoproteome datasets of the three different stem cell types (ESC, EpiSC and NSC), and identified interaction networks of key stem cell-associated proteins.
We used high-resolution mass spectrometry (MS) in combination with stable isotope labeling by amino acids in cell culture (SILAC)-based quantification for the relative quantification of the proteomes and phosphoproteomes of ESCs, EpiSCs and NSCs (Figure 3A). We quantified over 9,000 proteins in these analyses and over 10,000 phosphorylation sites in each cell type, providing a deep systems-wide comparison of the proteomes and phosphoproteomes of these cell types. While the fractions of proteins down-regulated in EpiSCs and NSCs compared to ESCs were relatively similar, a much larger fraction of the proteome was upregulated in NSCs (Figure 3B). Also, the fractions of upregulated proteome and phosphoproteome were comparable for NSCs, whereas a larger fraction of the phosphoproteome than of the proteome was differentially regulated between ESCs and EpiSCs. We are currently finalizing a manuscript including these data, and thereafter will make them available to the community. In addition to the total phosphoproteome analysis, we also investigated the dynamics of phosphorylation sites in ESCs treated with Cisplatin. This research activity was not originally part of the SyBoSS plan but has proved to be a useful complement. The results showed that the phosphoproteome of these stem cells is extensively regulated in response to genotoxic insults.
MEK-ERK signalling stimulates ES cells to transition out of naïve pluripotency and enter the path to lineage commitment. In a second series of phosphoproteomic analyses, we identified targets of the ERK signalling cascade in undifferentiated ES cells utilizing SILAC coupled with phosphopeptide capture. This approach identified RSK1 as a prominent direct target of ERK in ES cells. RSK effects may include negative regulation of ERK activation. We therefore used CRISPR/Cas9 to create combinatorial mutations in RSK genes. Genotypes that included null mutations in Rps6ka1, encoding RSK1, resulted in elevated and sustained ERK phosphorylation. We found that these mutants exhibit altered differentiation kinetics. RSK-depleted ES cells show earlier down-regulation of naïve pluripotency factors, precocious expression of transitional epiblast markers, and early onset of lineage specification. We further showed that chemical inhibition of RSK increases ERK phosphorylation in ES cells and expedites entry into differentiation. These findings demonstrate that the level of ERK signalling influences the dynamics of ES cell differentiation and highlight the role of signalling feedback in developmental progression. This work is an extension of studies initiated in the FP7 project EuroSyStem. The results are currently being prepared for publication.
Figure 3. (A) Schematic for the quantification of the proteome and phosphoproteome of ESCs, EpiSCs and NSCs. The ESCs, EpiSCs and NSCs were cultured in media containing “light” (Arg0, Lys0) or “heavy” (Lys8, Arg10) isotope-labeled amino acids. Cells were lysed and protein extracts were digested with trypsin. A small fraction of the peptides was used for the measurement of the proteome, and the remaining peptides were used for enriching phosphorylated peptides. The samples were analyzed by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) and the raw data were analyzed with the MaxQuant software. (B) The bar chart shows the fraction of proteins and phosphorylation sites that were up- or down-regulated in EpiSCs and NSCs relative to ESCs. The data shown are combined from two independent biological replicates.
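The fractions of up- and down-regulated proteins or phosphorylation sites reported for the SILAC comparison can be computed along the following lines. This is a minimal sketch: the 2-fold cut-off, the function name and the example heavy/light ratios are assumptions for illustration, since the threshold applied in the analysis is not stated here.

```python
import math

def regulated_fractions(ratios, fold=2.0):
    """Classify heavy/light SILAC ratios by a symmetric log2 fold-change cut-off."""
    cutoff = math.log2(fold)
    up = sum(1 for r in ratios if math.log2(r) >= cutoff)
    down = sum(1 for r in ratios if math.log2(r) <= -cutoff)
    total = len(ratios)
    return {
        "up": up / total,
        "down": down / total,
        "unchanged": (total - up - down) / total,
    }

# Hypothetical heavy/light ratios (e.g. NSC over ESC) for four proteins; all must be > 0.
ratios = [4.0, 1.0, 0.25, 1.5]

print(regulated_fractions(ratios))
```

Working on the log2 scale makes a 2-fold increase and a 2-fold decrease symmetric around zero, which is why the same cut-off can be applied in both directions.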
To further understand the wiring of protein interaction networks in ESCs, we analyzed the protein-protein interactions of selected proteins that are implicated in establishing or maintaining proper stem cell identity. For this work, we used a label-free quantification strategy. We performed affinity purification mass spectrometry (AP-MS) analysis of 396 samples. These data have been entered into the SyBoSS database. Additionally, we investigated protein expression changes occurring during X chromosome inactivation. We performed proteomic analysis of two different ESC lines with one or two active X chromosomes, XO and XX respectively, using a SILAC-based quantitative proteomic strategy. The ESCs were labeled with light and heavy SILAC media in both forward and reverse combinations. This analysis identified over 4,000 proteins in total, of which a small subset was differentially expressed between XX and XO cells.
Data collection – ESC engineering and transcriptome profiling
In the SyBoSS project, 342 ESC lines encompassing 287 tagged genes were made either by knock-in targeting (110), BAC transgenesis (177) or both (55). These lines were evaluated for tagged protein expression by Western blot, immunofluorescence (both using a goat anti-GFP antibody), tagged GFP/Venus fluorescence and AP-MS. We found that Western blotting was the least reliable of the methods and it was therefore discontinued. As recommended by the referee at the mid-term review, instead of attempting AP-MS for 300 genes we doubled our efforts on half this number in order to improve the quality and success rate. In total, 216 tagged ESC lines comprising 161 genes were evaluated by AP-MS, most of them twice, totalling well over 600 analyses. Hence SyBoSS work on AP-MS exceeds our revised commitment. Of the 161 genes, 63 were targeted and 114 were BAC transgenes (16 were both). Successful AP-MS results were achieved with 115 genes. At the time of writing, a further 30 genes are being analysed by AP-MS, consisting of 20 repeats and 10 new genes. AP-MS and imaging results are accessible through the SyBoSS website, syboss.eu: select ‘Internal area’ (http://syboss.eu/internal-area.html), then select SyBoSS-database at Helmholtz-Zentrum (http://www.digtop.de/syboss_login.php). Username is sybosslogin and password is SuperHero (the database is not yet public). The transcriptome profiles are available at GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=szgfueqinhsnvax&acc=GSE77692
Transcriptome analysis of RNAi knockdowns in mouse ES and EpiS cells.
To functionally evaluate the roles of ~100 genes identified in the genome-wide RNAi screens, we characterized transcriptional changes in ESCs and EpiSCs following esiRNA knockdown. 117 candidate genes and 1 control gene (Luciferase) were targeted with esiRNAs supplied by Partner Buchholz, and total RNA was collected from two biological replicates 72 hours post-transfection. As noted in the gene list (D2.4-3), at 72 hours morphological phenotypes were obvious for 15 genes in ESCs and 29 genes in EpiSCs (4 genes in both). The RNA concentrations were normalized and the samples arrayed in 96-well plates. Bar-coded libraries were prepared from each 96-well plate and subjected to RNA-seq (HiSeq V4, 75 bp paired-end reads). The data are available by searching for ERP013675 at http://www.ebi.ac.uk/ena.
ChIP-sequencing.
The ChIP-sequencing database is available at http://syboss.cbs.dtu.dk (username: guest; password: 1Guest!; the database is not yet public).
Screens
During the course of the project, SyBoSS committed to three genome-wide loss-of-function screens. Because the screens proved to be extremely fruitful and effective, we have exceeded this commitment considerably. Five genome-wide screens and two medium scale screens have been successfully completed (D4.1-2).
1. Oct4-GFP screens for pluripotency. Oct4 is critically involved in maintaining pluripotency in stem cells, and changes in its expression level cause differentiation of the cells. Hence, Oct4 expression is a valuable reporter for the status of pluripotent cells. One of the starting points for the SyBoSS project was a genome-wide esiRNA screen using an Oct4-GFP reporter ESC line to identify candidate regulators of Oct4 expression and ESC self-renewal (Ding et al, Cell Stem Cell 4, 403-15, 2009). The high confidence genes were included in the initial SyBoSS gene list for detailed analysis. To gain a systematic understanding of the genes associated with EpiSC identity, we performed a very similar genome-wide esiRNA screen in Oct4-GFP reporter EpiSCs (Ding et al, Cell Systems 1, 141-51, 2015). Our screen uncovered genes that are specifically required to maintain Oct4 expression in EpiSCs together with numerous genes that alter Oct4 expression in both cell types. Surprisingly, besides identifying shared factors required to maintain Oct4 expression in both cell types, our analyses also revealed numerous knockdowns that lead to increased Oct4 expression solely in EpiSCs. This result indicates that, in contrast to ESCs, Oct4 is under active repressive control in EpiSCs, thereby establishing a fundamental difference in Oct4 regulation in these two pluripotent cell types (Figure 4).
Fig. 4 Comparative analysis of the screen results in EpiSCs and in ESCs. The y-axis represents the average Z-scores for the GFP intensity for each targeted gene. Up-regulated (Z-score >2) or down-regulated (Z-score <-2) Oct4 expression is depicted in yellow and blue, respectively. Note the large number of knockdowns that up-regulated Oct4 expression in the EpiSCs screen.
Experiments to analyse the esiRNA screen in EpiSC were completed. A multiparametric integrative analysis of the RNAi screen with protein localization, genetic interaction and protein-level dependency was performed. This analysis predicted that Tox4 exhibits similarities to components of the Paf1 complex. Physical interaction of Tox4 with Ctr9 was confirmed by tagging and proteomic analyses. This analysis also revealed interaction of Tox4 and Ctr9 with components of the PP1 phosphatase complex.
2. siRNA screen of exit from the ES cell state. Using an assay based on the recovery of pluripotency after removal of 2i+LIF, we screened nearly 10,000 genes in duplicate experiments with pools of four independent siRNAs. We validated 28 genes whose knockdown significantly impeded progression from the undifferentiated ES cell state upon transfer from 2i. In addition to members of known critical pathways, we found the tumor suppressors Folliculin (Flcn) and Tsc2. Tsc2 lies upstream of mammalian target of rapamycin (mTOR), whereas Flcn acts downstream and in parallel. Flcn, with its interaction partners Fnip1 and Fnip2, drives differentiation by restricting nuclear localization and activity of the bHLH transcription factor Tfe3. Conversely, enforced nuclear Tfe3 enables ES cells to withstand differentiation conditions. Genome-wide location and functional analyses showed that Tfe3 directly integrates into the pluripotency circuitry through transcriptional regulation of Esrrb. These findings identified a cell-intrinsic rheostat that destabilizes naive pluripotency and allows transition into differentiation. Congruently, stage-specific subcellular relocalization of Tfe3 suggests that Flcn-Fnip1/2 contributes to developmental progression of the pluripotent epiblast in vivo. These results were published in Betschinger et al, Cell 153, 335-47, 2013.
3. esiRNA screen for reprogramming from EpiSCs to ESCs. To identify genes constituting reprogramming roadblocks we made use of an EpiSC line expressing a chimeric GCSF-LIF receptor (Yang et al Smith A, Cell Stem Cell 7, 319-28, 2010). EpiSCs are resistant to conversion to naive pluripotency. However, upon addition of GCSF this EpiSC line reprograms to ground state pluripotency at low frequency. These cells further contain an Oct4-GFP-IRES-Puro selection cassette that allows, in combination with 2i culture conditions, stringent selection of naive pluripotency. Thus, chimeric-LIF-receptor-expressing EpiSCs provide a sensitized screening system to identify genetic barriers to reprogramming. Knockdown of Stat3 was used as a negative control and knockdown of Zfp281 was used as a positive control.
4. SMC1b-GFP screen for repressors of the meiotic gene expression program in ESCs. ESCs maintain pluripotency not only through the Oct4 regulatory circuitry but also through repression of inappropriate gene expression. To unlock this area of regulation, we performed a genome-wide esiRNA screen using the meiosis-specific SMC1b-GFP as the reporter, identifying derepression through increased GFP expression. Notably, the transcription factor E2F6 was used as a positive control to validate the assay.
Figure 5. Validation of the SMC1b-GFP assay.
5. Genome-wide haploid ES cell mutagenesis screen of exit from the ES cell state. The development of haploid ES cells provides a powerful new platform for unbiased mutagenesis screens (Leeb et al, Cell Stem Cell 14, 385-93, 2014). We therefore developed the methodology using piggyBac transposition in haploid ESCs with the aim of implementing a saturation screen. After 40 independent screens, the recovery of new genes was very low, indicating that the screen was approaching saturation. We have identified 310 candidate genes including most of the known players. To achieve rapid in-depth analysis of these candidates we have developed a pipeline for high throughput generation of knock-out ES cells using CRISPR and subsequent transcriptome analysis by RNA-seq. The development and implementation of this new program was enabled by SyBoSS and involved the labs of Smith, Beyer and Stewart, with the notable inclusion of Martin Leeb, a former post-doc with Smith who now runs his own lab in Vienna. Because the project is a product of SyBoSS, this successful and ongoing collaboration is one of the leading highlights. We anticipate a high profile publication describing the ES cell transition in unprecedented breadth and molecular detail.
6. Three other moderate size screens have been included.
(i) Using the Oct4-GFP reporter in ESCs, knockdowns of 512 lncRNAs using esiRNAs were performed, and three lncRNAs were identified as contributing to ESC self-renewal and/or Oct4 regulation. The methodology was published in Chakraborty et al, Nature Methods 9, 360-9, 2012. The detailed analysis of one of the lncRNAs, termed Panct1, is under review at Nature Structural and Molecular Biology. Interestingly, this lncRNA is found in the first exon of a protein coding gene with which it co-operates.
(ii) After considerable difficulties, we finally established a female EpiSC line that carried Venus knocked onto the C-terminus of the X-linked gene, G6pdx, on the inactive X. This line was screened with esiRNAs selected to knock down 540 known chromatin regulators, sorting for Venus activation (Figure 6). Validation of the candidates is underway.
Figure 6. Summary of the screen for X-reactivation in EpiSCs. The 13 genes whose esiRNA knock down led to activation of the G6pd-X-Venus reporter with a Z score above 2 are labelled.
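The hit calling used in Figures 4 and 6 (average Z-score above 2, or below -2 for down-regulation) can be sketched in a few lines of Python. This is a minimal illustration only: the gene names and replicate values are hypothetical, and how the Z-scores themselves were normalized in the screens is not specified here.

```python
from statistics import mean

def classify_knockdown(replicate_z_scores, threshold=2.0):
    """Average the replicate Z-scores for one targeted gene and classify
    the reporter response using the +/- threshold from the figure legends."""
    average = mean(replicate_z_scores)
    if average > threshold:
        return "up"
    if average < -threshold:
        return "down"
    return "unchanged"

# Hypothetical replicate Z-scores of GFP intensity per targeted gene
screen = {
    "gene_A": [2.6, 3.1],
    "gene_B": [-2.4, -2.9],
    "gene_C": [0.3, -0.5],
}
calls = {gene: classify_knockdown(z) for gene, z in screen.items()}
```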
Technology development
Protein level dependency
We established a way to evaluate protein-protein interactions using RNAi. For proteins that are subunits of the same complex, reduction of one subunit often leads to reduction of the other subunits because the complex is destabilized. To test this idea, we used esiRNA transfections in ESCs and EpiSCs selected from our GFP-tagged cell line resources. The esiRNAs were chosen to knock down a candidate partner of the GFP-tagged protein, and GFP fluorescence levels were measured. This was applied to 28 BAC-tagged ESCs and EpiSCs with substantial results. We term this assay ‘protein level dependency’.
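The readout of such an assay can be illustrated as a simple fluorescence ratio. The Python sketch below is an assumption-laden illustration: the function names, the 0.7 cutoff and the fluorescence values are hypothetical and do not describe the scoring actually used in SyBoSS.

```python
def pld_ratio(gfp_after_knockdown, gfp_control):
    """GFP level of the tagged protein after partner knockdown,
    relative to a control transfection."""
    if gfp_control <= 0:
        raise ValueError("control fluorescence must be positive")
    return gfp_after_knockdown / gfp_control

def shows_dependency(ratio, cutoff=0.7):
    """Flag a dependency when the tagged protein drops below the
    (hypothetical) cutoff after the candidate partner is knocked down."""
    return ratio < cutoff

# Hypothetical mean GFP intensities for one tagged line
control_gfp = 1000.0
knockdowns = {"partner_esiRNA_1": 420.0, "partner_esiRNA_2": 950.0}
dependencies = {
    name: shows_dependency(pld_ratio(value, control_gfp))
    for name, value in knockdowns.items()
}
```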
Auxin degron
Our work on the auxin degron and its excellent properties for rapid depletion of the target protein was thoroughly described in the 4th annual report. Since then we have developed methods for biallelic knock-in tagging using CRISPR/Cas9-assisted targeting, which permits high throughput targeting in ESCs (manuscript submitted).
Engineering homozygous mutant stem cells
Our attempts to scale up the production of homozygous mutant mouse embryonic stem cells were not successful. Serial targeting of both alleles using conventional gene targeting proved to be too inefficient, and the project was abandoned and replaced with esiRNA knockdown experiments in ESCs and EpiSCs. The advent of CRISPR-Cas9 technology presented us with an opportunity to develop scalable methods for biallelic targeting of genes in stem cells. Over the past year, we developed a robust and efficient method for biallelic targeting of human induced pluripotent stem (iPS) cells. Our strategy for the generation of biallelic knockouts, shown in Figure 7, is to replace a critical exon of one allele of the target gene with a drug selection cassette by homologous recombination and to screen clones for damage to the second allele induced by error-prone non-homologous end-joining (NHEJ).
Since homologous recombination is greatly stimulated by the action of site-specific nucleases, we reasoned that the drug-resistant clones would be enriched for cells that take up and express active Cas9 nuclease. Thus, we should expect to see a high incidence of NHEJ-induced damage to the second, non-targeted allele in clones that have undergone homologous recombination. Furthermore, only one copy of the target exon will be present in correctly targeted clones, thus, NHEJ-induced damage to the non-targeted allele can be assessed by Sanger sequencing of PCR products from the target exon. By definition, clones that exhibit a clear mutant read in the target exon will be biallelic events (targeted/NHEJ). Non-targeted clones will carry two copies of the target exon and indels in one or both alleles will not produce a readable sequence trace by Sanger sequencing. Therefore, our strategy provides a simple, scalable genotyping method for the rapid identification of homozygous mutant clones, obviating the need to characterize both alleles by sequencing of cloned PCR products or by single molecule sequencing.
Figure 7. Strategy for biallelic targeting of genes with CRISPR-Cas9 programmable nuclease. The diagram shows a short-arm targeting construct with ~1kb homology arms flanking a selectable gene (drug R, usually neo) introduced by nuclease-promoted homologous recombination with concomitant nuclease promoted damage on the other allele.
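The genotyping logic described above can be written down as a small decision function. The Python sketch below is illustrative: the trace categories and the returned labels are hypothetical names, and the "heterozygous" and "inconclusive" branches are inferences from the strategy rather than statements taken from the report.

```python
VALID_TRACES = {"clean_mutant", "clean_wild_type", "mixed"}

def genotype_call(targeted, trace):
    """Interpret a Sanger trace of the target exon for one clone.

    targeted: True if the clone carries the selection cassette at the locus.
    trace:    'clean_mutant', 'clean_wild_type' or 'mixed' (hypothetical labels).
    """
    if trace not in VALID_TRACES:
        raise ValueError(f"unknown trace category: {trace}")
    if not targeted:
        # Two exon copies remain, so the trace cannot prove a biallelic event.
        return "not targeted"
    if trace == "clean_mutant":
        # Only one exon copy is left and it carries an indel.
        return "biallelic (targeted/NHEJ)"
    if trace == "clean_wild_type":
        # Inference: the remaining exon copy is intact.
        return "heterozygous (targeted/wild type)"
    # A single exon copy should read cleanly; a mixed trace needs follow-up.
    return "inconclusive"

calls = [
    genotype_call(True, "clean_mutant"),
    genotype_call(True, "clean_wild_type"),
    genotype_call(False, "mixed"),
]
```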
X-chromosome assay
To follow the process of X inactivation during differentiation and reactivation during reprogramming, we originally aimed to generate female ESC lines carrying two fluorescent reporters, one on each allele of the X-linked gene G6pdx. Differentiation into EpiSCs and NSCs should inactivate expression from one X, which could be followed by fluorescence. Subsequent cloning would produce a single colour cell line that could be reprogrammed, with X-reactivation followed by re-expression of the other colour. As previously reported, we encountered severe technical problems in generating double Venus/Katushka tagged female ESC lines at the G6pdx locus (in brief, all clones in multiple targeting rounds became XO, losing one X chromosome). As a remedy, we decided (see previous reports) to
(a) establish mouse lines via normal male ESCs carrying one or the other targeted allele. Crossing of these two lines resulted in female mice carrying the two knock-in alleles, from which we established EpiSCs and NSCs. One of the EpiSC lines has recently been used in an esiRNA screen to search for chromatin factors involved in X-reactivation; and
(b) generate female knock-in ESCs using a different parental female ESC line, TX1072, generated from female hybrid embryos carrying a Tet-inducible Xist gene on one X chromosome (Figure 8; Schulz et al, Cell Stem Cell, 2014). We designed targeting constructs at multiple different loci (Figure 8A,B). The use of CRISPR-mediated approaches greatly accelerated the speed and efficiency of targeting. We have now successfully generated several XX ESC lines carrying GFP and Tomato reporters at the Huwe1 and G6pdx loci. The dual fluorescence in ESCs with two active X chromosomes shifts to mono-fluorescence upon Xist induction as expected (Figure 8C,D). The generation of lines tagged at the Mecp2 and Jarid1c loci is underway. The Huwe1 and G6pdx dual tagged ESC lines are currently being differentiated into EpiSCs and NSCs and will be used for esiRNA screens (in collaboration with F. Buchholz) to identify (a) the factors required for gene silencing and maintenance during X inactivation; (b) the factors required for reprogramming from the EpiSC or NSC state. These screens will be complementary to those performed using EpiSCs with G6pdx Venus/Katushka alleles generated by the Stewart lab from mice. In our case, we can assess XCI in ESCs, EpiSCs and NSCs. Furthermore, the parental ESCs carry highly polymorphic X chromosomes, enabling the precise timing of gene silencing and chromatin changes during X inactivation to be assessed by RNA-seq and ChIP-seq. Pilot data have been generated and bioinformatics analysis is underway.
The results of these studies, initiated in SyBoSS consortium, will provide novel insights into the dynamics of differentiation and reprogramming, as well as uncovering new actors in the processes of gene silencing and reactivation, using the XCI paradigm.
Figure 8. Generation of female ESC lines with GFP and Tomato tagged X-linked alleles
A. The overall strategy is illustrated. Female ESC lines carrying a Tet-inducible Xist gene on one X chromosome, and with GFP and Tomato reporters targeted into the endogenous loci of the G6pdx, Huwe1, Mecp2 and Jarid1c loci, can be used to induce XCI either in undifferentiated ESCs or during differentiation into EpiSCs and NSCs. This leads to non-random inactivation of one allele (either GFP or Tomato) and provides a readout for screens to identify factors that interfere with the process of X inactivation, or reactivation.
B. The targeting constructs used to introduce the fluorescent reporters at the endogenous loci by CRISPR/Cas9-facilitated targeting with short (500 bp) homology arms.
C. An example of one Huwe1 GFP/Tomato ESC line before and after Xist induction.
D. qRT-PCR assessment of GFP and Tomato expression in several independent G6pdx GFP/Tomato ESC clones before and after Xist induction for 48h.
X-inactivation and small RNA analysis in ESC to EpiSC and NSC differentiation
Male (E14) and female (PGK12.1) ESCs were differentiated into EpiSCs and on to NSCs using standardized culture conditions defined in the consortium. The detailed analysis of the gene expression states during early ES to EpiSC differentiation led to the discovery that the presence of two active X chromosomes delays the exit from pluripotency (Schulz et al, Cell Stem Cell 2014). The investigation of the chromatin status of the X chromosome during early ES to EpiSC differentiation also led to the discovery that Jarid2 is an early partner of the inactive X and a key factor for the recruitment of Polycomb repressive complex 2 (PRC2) to the Xi (da Rocha et al, Mol. Cell 2014). Recently, investigating X inactivation changes during EpiSC to NSC differentiation, we discovered that some genes become reactivated on the inactive X. That is, some genes are inactivated when the X is first inactivated and then become reactivated upon further differentiation. This is unexpected and now requires careful work to evaluate whether the ESC-EpiSC-NSC culture model accurately reflects events in the embryo.
In the course of our analyses of ESCs differentiated into NSCs, we noted that the X chromosome undergoes a series of interesting changes in structure and organization using allele-specific Hi-C (in collaboration with the lab of J. Dekker) and RNA-seq. We found that the Xi lacks typical autosomal features such as active/inactive compartments and topologically associating domains (TADs), except around a small number of genes that escape XCI and remain expressed. Escaping genes form TADs and retain DNA accessibility at promoter-proximal and CTCF binding sites, indicating that these loci can avoid Xist-mediated erasure of chromosomal structure. We also found that gene silencing competent Xist RNA is sufficient to induce segregation of the Xi into two ‘mega-domains’ separated by a boundary that includes the DXZ4 macrosatellite sequence, which can also be found on the human X. Deletion of this boundary prior to XCI results in fusion of the megadomains and altered patterns of escape that correlate with changes in TAD structure following differentiation and XCI. These results suggest a critical role for the boundary locus and Xist RNA in shaping the structure of the Xi and modulating escape from XCI. Our findings also point to roles of transcription and CTCF binding in TAD formation in the context of facultative heterochromatin. This SyBoSS publication is under revision at Nature (“Structural organization of the inactive X chromosome” L. Giorgetti, B. R. Lajoie, A.C. Carter, M. Attia, Y. Zhan, J. Xu, C.J. Chen, N. Kaplan, H. Y. Chang, E. Heard# and J. Dekker#). The description of Xi expression and chromatin status during ESC to NSC differentiation will be reported in a separate publication (integrating the ES to EpiSC data mentioned above).
Our investigation of small RNA populations in the ESC-EpiSC-NSC transition led to the discovery of small RNAs at the Xist locus specifically in female EpiSCs, and at the Tsix locus in ESCs. The timing of their appearance during ESC to EpiSC differentiation was investigated by small RNA-seq at 4, 7 and 10 days of differentiation, using male (E14) and female (PGK12.1) ESCs. The Xist small RNA population, and in particular one highly represented miRNA, appeared from day 4. To test whether this small Xist-derived RNA entity has any function in XCI or in differentiation, we are currently deleting it using CRISPR/Cas9 both in vivo (in mice) and in female ESCs.
In order to assess the timing of appearance of Xist small RNAs as well as microRNA changes linked to differentiation into the epiblast lineage, we generated small RNA data from a range of ESC differentiation states, including male ESCs (undifferentiated and at days 4, 7 and 10 of differentiation to EpiSCs), female ESCs (days 4, 7 and 10 of differentiation) and two female EpiSC samples. mRNA profiles have also been generated from the same samples, in duplicate for each time point. The integration of miRNA and mRNA profiles is a SyBoSS outcome that led to the discovery of female-specific miRNA profiles in early development. Functional evaluation of these differences is underway.
Modelling and computational biology
The functions of this Work Package within SyBoSS were (1) to generate community standards that facilitate the integrated computational modelling of cellular systems (especially mouse ES cells), (2) to develop computational methods for the analysis of data generated in this project, (3) to provide input for second-tier experimental work, such as suggesting follow-up target genes, and (4) to generate new biological insight through computational modelling of ES cell dynamics.
This WP has achieved all of its goals, deliverables and milestones within time (see also the previous periodic reports). Thus, this WP has been very successful, which is underlined by the multiple interactions among WP3-partners and between the computational and experimental groups in the SyBoSS consortium. New collaborations have led to unexpected research directions, such as the functional genomics screen conducted under the lead of Austin Smith for the identification of genes involved in the exit from pluripotency. The work of Partner 11 (Univ. of Cologne, Beyer) has established the computational framework for the analysis of this data, which was not envisioned in the initial project proposal. Further, this WP has created community standards for reporting complex, combinatorial interactions between proteins in public databases (see below), which will have a long-lasting impact on the scientific community beyond the lifetime of SyBoSS.
Notably we integrated the loss-of-function scores, genetic interaction mapping, protein localization and protein-level dependency (PLD) into one model to delineate connectivity between factors that control Oct4 expression in EpiSCs and then compared the role of these factors to their function in ESCs (Figure 9). These data define shared and distinguishing factors in naïve- and primed- pluripotent cells, and provided insights into the dynamics that accompany the transitions between ESCs and EpiSCs. We demonstrated the power of this integrative approach by the prediction of Tox4 as an interacting partner of Paf1C (Figure 9).
Figure 9. Multiparametric integration of Omics data predicts Tox4 as a Paf1 interacting partner. Graphical presentation of hierarchical cluster analysis of the indicated Omics data. A binary distance metric was used for the localization data and a Euclidean distance metric was employed for all other data sets. Components of known protein complexes are highlighted with the same color.
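The two distance metrics named in the Figure 9 legend can be illustrated as follows. This Python sketch assumes that "binary distance" follows the common definition (the proportion of positions in which only one profile is "on" among positions in which at least one is "on"); the report does not state which software or exact definition was used, and the factor names and profiles below are hypothetical.

```python
import math

def binary_distance(a, b):
    """Proportion of positions where exactly one profile is 'on', among
    positions where at least one is 'on' (assumed definition)."""
    on_in_either = sum(1 for x, y in zip(a, b) if x or y)
    if on_in_either == 0:
        return 0.0
    on_in_one_only = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return on_in_one_only / on_in_either

def euclidean_distance(a, b):
    """Straight-line distance between two numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical localization profiles (1 = present in a compartment)
localization = {"factor_1": [1, 1, 0, 0], "factor_2": [1, 0, 1, 0]}
# Hypothetical quantitative profiles (for example, screen Z-scores)
scores = {"factor_1": [0.0, 0.0], "factor_2": [3.0, 4.0]}

d_localization = binary_distance(localization["factor_1"], localization["factor_2"])
d_scores = euclidean_distance(scores["factor_1"], scores["factor_2"])
```

Pairwise distances of this kind are the input to hierarchical clustering; how the per-dataset distances were combined into one tree is not described here.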
Protein-chromatin interactions
The goal of this WP was to compile and analyse large-scale protein-chromatin interaction data in order to generate input data for subsequent modelling, especially in WP3.4 (see below). The core of this WP is the assembly of ChIP-sequencing data characterizing the binding patterns of transcription factors (TFs) to the genome, which subsequently was used to predict target genes regulated by these TFs.
Because of the heterogeneity of the published data, we decided to re-analyse all raw data. The reason for getting involved in such an intensive task was that if we wish to use the pre-existing, published data sets as benchmark and background data, we need to have these datasets processed and analysed using exactly the same tools and strategies. These re-analysed data are then integrated in a graphical and interactive manner, both in terms of data visualization and for further data analysis. For this purpose, we have implemented a SyBoSS-specific version of the UCSC Genome Browser coupled to a dedicated Galaxy server.
We have integrated 66 unique ChIP-Seq tracks, comprising 11 histone modifications, 1 acetyl-transferase, 1 lysine-transferase, 24 transcription factors, 1 co-factor, 7 Polycomb and Trithorax proteins, 3 members of the RNA-Pol II pausing complex, 3 members of the super elongation complex, 4 members of the cohesin complex, 2 members of the mediator complex, 4 RNA polymerase II WT and mutant proteins, 2 chromatin organization proteins and 1 DNA methylase, with the addition of RNA, MRE and MeDIP-Seq tracks. All of these tracks have been analysed using identical pipelines and the resulting data are uploaded into our dedicated SyBoSS database (http://syboss.cbs.dtu.dk; username: guest; password: 1Guest!; the database is not yet public).
A very important subsequent step was the prediction of TF target genes from the ChIP-seq data. Inferring target genes from chromatin binding data of TFs is anything but trivial. The key problem is that TFs can bind far from their target genes and it is a priori unclear how to score the ‘binding pattern’ of a given TF around a gene. We have developed a scoring system that integrates the distances of binding sites from potential target genes into a target score. This score accounts for multiple binding events in the proximity of the same gene and it accounts for TF-specific features, such as how far from promoters a given TF typically binds. This scoring was compared to various other popular scorings using independent information, such as target gene function or expression data from TF knock-out experiments. This method comparison revealed that naïve target calling, which simply checks for the presence or absence of TF binding within a pre-defined window around the promoter, performs poorly compared to more sophisticated scorings that also account for the distance and number of binding events. This work was published in 2013 (Sikora-Wohlfeld et al. PLoS Comp. Biol. 9(11): e1003342, 2013).
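The contrast between window-based calling and a distance-aware score can be illustrated with a toy example. The Python sketch below is not the published scoring: the exponential decay, the decay length standing in for a TF-specific binding range, the window size and the coordinates are all assumptions chosen for illustration.

```python
import math

def window_call(peak_positions, tss, window=1000):
    """Naive calling: is there any binding site within a fixed window of the TSS?"""
    return any(abs(p - tss) <= window for p in peak_positions)

def distance_weighted_score(peak_positions, tss, decay=5000.0):
    """Sum of per-site contributions that fall off with distance from the TSS.

    'decay' (in bp) is a hypothetical TF-specific parameter describing how far
    from promoters the factor typically binds; every site contributes, so
    multiple binding events near the same gene add up.
    """
    return sum(math.exp(-abs(p - tss) / decay) for p in peak_positions)

tss = 10_000
distal_sites = [13_000, 16_000]   # two sites, both outside the 1 kb window
proximal_site = [10_000]          # one site directly at the TSS

naive_distal = window_call(distal_sites, tss)
score_distal = distance_weighted_score(distal_sites, tss)
score_proximal = distance_weighted_score(proximal_site, tss)
```

In this toy case the window-based call misses the gene with two distal sites entirely, whereas the distance-weighted score still assigns it a value close to that of a single promoter-bound site.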
Protein-protein interactions
Protein-protein interaction data are essential for the data integration that underpins SyBoSS modelling. However, the number of protein-protein interactions that have been genuinely measured in mouse cells is extremely low (< 500 interactions) compared to other species (yeast and human have known interactions in the range of tens of thousands). Even in the case of human, we know that the existing experimentally validated interactions cover only a small fraction of the whole human interactome, which is predicted to have more than 50,000 interactions. Thus, there is a clear need for the computational prediction of interactions in general, but particularly in mouse.
Owing to the evolutionary proximity of the two species, we decided to infer a physical protein interactome for mouse using the respective resources developed for human proteins. In particular we (Beyer group, TUD) have developed machine learning methods for predicting high-quality protein interactions in human. The new human interactome contains more than 100,000 high confidence interactions, the majority of which are newly predicted. Several hundred predictions have been experimentally tested within the SyBoSS consortium (Tony Hyman) and using external partners (Matthias Mann, MPI Martinsried, Germany, Ulrich Stelzl, MPI Berlin, Germany). Such extensive experimental validation of a database predicting protein interactions is unprecedented in the published literature. The human network has been published in an international peer reviewed journal (Elefsinioti et al. Molec. Cell. Proteomics 10(11):M111.010629, 2011) followed by a second paper presenting a new computational approach that we developed for this purpose (Sarac et al. Bioinformatics 28(16):2137, 2012).
The next step in this WP has been to develop methods for the transfer of this network to mouse utilizing advanced orthology determination algorithms. The mouse network is thus composed of (1) interactions directly measured in mouse/mouse cells, (2) interactions measured in human cell lines and (3) interactions based on computational prediction. Further, we used an improved method for utilizing protein domain information for the prediction of interactions. The new scoring that we used separates known protein-protein interactions from the rest very well (Figure 10). The new integrated mouse network contains 9892 novel physical interactions with high confidence.
Figure 10. Density distribution of domain interaction scores for 15,659,097 possible mouse PPIs with a domain interaction score greater than zero. Known physical interactions tend to have higher domain interaction scores, which confirms the predictive power of this score.
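The assembly of the mouse network from the three evidence classes listed above can be sketched as an orthology lookup followed by a merge. In the Python sketch below, the one-to-one ortholog table, the protein identifiers and the rule of keeping the strongest evidence per pair are assumptions for illustration; the orthology algorithms actually used are not described here.

```python
# Lower rank means stronger evidence (ordering assumed for illustration)
EVIDENCE_RANK = {"measured_in_mouse": 1, "measured_in_human": 2, "predicted": 3}

def transfer_to_mouse(human_ppis, orthologs):
    """Map human protein pairs to mouse pairs via a one-to-one ortholog table.
    Pairs with a partner lacking a mouse ortholog are dropped."""
    mouse_ppis = {}
    for (a, b), evidence in human_ppis.items():
        if a in orthologs and b in orthologs:
            mouse_ppis[frozenset((orthologs[a], orthologs[b]))] = evidence
    return mouse_ppis

def merge_networks(*networks):
    """Combine networks, keeping the strongest evidence class for each pair."""
    merged = {}
    for network in networks:
        for pair, evidence in network.items():
            current = merged.get(pair)
            if current is None or EVIDENCE_RANK[evidence] < EVIDENCE_RANK[current]:
                merged[pair] = evidence
    return merged

# Hypothetical identifiers
orthologs = {"HsA": "MmA", "HsB": "MmB", "HsC": "MmC"}
human_ppis = {
    ("HsA", "HsB"): "measured_in_human",
    ("HsB", "HsC"): "predicted",
    ("HsA", "HsD"): "predicted",   # HsD has no mouse ortholog in the table
}
mouse_measured = {frozenset(("MmA", "MmB")): "measured_in_mouse"}

mouse_network = merge_networks(mouse_measured, transfer_to_mouse(human_ppis, orthologs))
```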
Cooperativity
We developed standards for cooperative regulatory interactions. The project was forward looking in the sense that the main benefit of these standards will be felt after the end of the grant period, as researchers learn to systematically collect the appropriate data. It has become clear that cooperativity is essential for biological complexity (Gibson, Cell regulation: determined to signal discrete cooperation, TiBS 34, 471-82, 2009). Both multivalency and allostery enable multiple state inputs to determine a single execution step. Regulatory protein complexes will not be correctly modeled without taking account of cooperative interactions. We had the simple, clear but vital objective of introducing cooperativity into molecular systems resources for the benefit of the SyBoSS consortium stem cell research and more broadly for systems modelers everywhere.
Proper cell physiology depends on numerous molecular interactions and analyzing these interactions is a prerequisite for understanding cell function and regulation. Several publicly available molecular interaction databases exist that provide experimentally validated and manually curated molecular interaction data to the scientific community, and as such make an important contribution to scientific research. However, the interactions are all treated as binary and independent whereas, within cells, molecular interactions are generally not independent but cooperative, i.e. they influence each other positively or negatively, an aspect that was insufficiently and inconsistently captured in bioinformatics resources but is critical for reliable and robust cell signalling. Within the SyBoSS project, we tackled this shortcoming by setting out to integrate cooperative interactions in bioinformatics resources. We also provided the first publicly available resource having cooperative interaction data available for analysis and in a computer-readable standard data format. The open standards that we have introduced provide reference platforms that will enable bioinformaticians to further develop computational resources for helping to advance knowledge on the molecular details of cooperative binding and understanding of cell regulation in general.
In our first task, we developed a standard format for cooperative interaction data (D 3.3-1). The first version of this standard is an extension of the current data format for molecular interaction data, the PSI-MI2.5 XML format, and uses new controlled vocabulary (CV) terms, which were added to the PSI-MI CV ontology, to describe cooperativity between distinct binding events. We also developed a website that describes in detail how to annotate cooperative interaction data using the PSI-MI2.5 XML format and provides several examples of different complexity (http://psi-mi-cooperativeinteractions.embl.de(si apre in una nuova finestra)). The use of the PSI-MI2.5 XML to capture cooperative interaction data has also been published (Van Roey et al., 2013; PMID: 24067240). Standards development is a dynamic process, and the molecular interactions group of the PSI consortium continued with the development of a major revision of the molecular interaction data exchange format, PSI-MI3.0. We were actively involved in this development, specifically in adding and deeply embedding elements that improve annotation of cooperative interactions, making it more inherent to the format. To this end, we continued our collaboration with the group of Henning Hermjakob at the EBI, the lead developers of the PSI-MI standard. A pre-final draft version of the new PSI-MI3.0 format was composed at the 2014 PSI spring meeting (April 13-16, 2014, near Frankfurt, Germany) and since then has been out for review.
After developing the standard, it was important to show that it could be applied in a bioinformatics database. We therefore developed the switches.ELM resource (http://switches.elm.eu.org), a database of experimentally validated cooperative interactions curated from the literature (D 3.3-2) (Van Roey et al., 2013; PMID: 23550212). Most of the data currently curated in switches.ELM involves interactions mediated by short linear motifs (SLiMs), low-affinity interaction modules that are frequently used cooperatively to function as molecular switches. This bias stems from SLiMs being our area of expertise; however, any set of molecular binding events affecting each other can be captured, and in a continuous curation effort we are also annotating these more general cooperative interactions in switches.ELM. In addition, we continued to improve visualization and to prepare data export in the new PSI-MI3.0 format for when it is released. We were then able to apply the lessons learned from switches.ELM by working with the IntAct Consortium to revamp the existing binary interaction database IntAct and make it cooperativity aware (Orchard et al., 2013; PMID: 24234451). As part of the collaboration, we organised several meetings involving the three bioinformatics groups of SyBoSS and our external collaborators, including a successful and valuable Hackathon. A final task towards our milestone (M 3.3), a cooperativity tool set ready for systems-level analysis of cell signalling and regulation, was to define a guideline for unambiguously reporting cooperative interaction data (D 3.3-4). For this purpose, we have defined the Minimum Information About Cooperative Interactions (MIACIs) (D 3.3-3), which can be found on our cooperative interactions website (http://psi-mi-cooperativeinteractions.embl.de/CIS-MIACIs.html).
We were able to complete our original set of deliverables within the expected time. Work continued on improving the new bioinformatics resources but, at the SyBoSS midterm meeting, we also added a new deliverable (D 3.3-4) for a project with Edith Heard, Partner 6. This relates to her group’s findings that numerous developmental genes show random monoallelic expression (RME). One curiosity is that not all paralogous members of multigene families are monoallelic. From a gene list provided by SyBoSS partner 6, we have constructed phylogenetic trees using family member protein sequences. The goal is to assess if there is a correlation between the conservation/speed of evolution of proteins that belong to the same protein family and the severity of phenotypic effects upon inactivation of these proteins. By building phylogenetic trees, the speed with which proteins diverge can be estimated from the branch lengths in the tree. As genes are duplicated, the resulting paralogues that constitute the protein family will diverge and possibly acquire distinct functionality over time. If we expect the rate of mutation to be the same for all members of the family (null hypothesis), a different rate of evolution/level of conservation would suggest some evolutionary pressure on a subset of family members that are functionally more important, in which case their inactivation would have a more profound effect on cell function. We constructed phylogenetic trees for several multi-protein families and compared branch lengths with known phenotypic defects that result from protein inactivation (taken from the Mouse Genome Informatics (MGI) database, http://www.informatics.jax.org/). Our preliminary results on this limited number of protein families show that there indeed appears to be a correlation between conservation and functional importance.
For instance, for the EYA (Eyes absent homologs) protein family, inactivation of the most conserved members EYA1 and EYA4 (short branch length) results in severe phenotypic defects and in many cases is not viable. The two other members EYA2 and EYA3 seem to be functionally less important as inactivation only results in mild phenotypic effects, and as can be seen from the phylogenetic tree, these proteins are not as conserved (longer branch lengths) (Figure 11). EYA1 and EYA4 are monoallelic during development whereas the phenotypically milder EYA3 is biallelic. Similar results were obtained for other families, notably the SIX protein family.
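The comparison of branch lengths with phenotype severity can be sketched as follows. This is a minimal illustration, not the pipeline used in the project: the Newick string and its branch lengths are placeholder values invented for the example, the function name terminal_branch_lengths is hypothetical, and only terminal branches are read (a fuller analysis might use root-to-tip distances). The severe/mild labels follow the EYA description above.

```python
import re

# Matches "LEAFNAME:length" pairs; internal branches such as "):0.10"
# carry no name and are therefore skipped.
LEAF = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def terminal_branch_lengths(newick):
    """Return {leaf name: terminal branch length} from a Newick string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


# Placeholder tree: branch lengths are illustrative, NOT values from the report.
newick = "((EYA1:0.05,EYA4:0.08):0.10,(EYA2:0.30,EYA3:0.35):0.12);"

# Severity labels as described in the text for the EYA family.
severity = {"EYA1": "severe", "EYA4": "severe", "EYA2": "mild", "EYA3": "mild"}

lengths = terminal_branch_lengths(newick)

# Most conserved (shortest branch) first.
conserved_first = sorted(lengths, key=lengths.get)

# Do all 'severe' members have shorter branches than all 'mild' members?
severe = [lengths[g] for g in lengths if severity[g] == "severe"]
mild = [lengths[g] for g in lengths if severity[g] == "mild"]
separated = max(severe) < min(mild)

for gene in conserved_first:
    print(gene, lengths[gene], severity[gene])
print("severe members more conserved than mild members:", separated)
```

With real trees, the same ordering could be compared family by family against MGI phenotype annotations.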
At time of writing the SyBoSS final report, Partner 4 has co-authored nine publications that cite SyBoSS funding support. Based on the citations of these papers, we can already say that the SyBoSS-funded work is having a significant impact in the research community. For example, the MIntAct paper has been cited 143 times and our review on the attributes of short linear motifs 156 times (source: Google Scholar).
Figure 11. Phylogenetic tree (vertebrate protein sequences) of the EYA protein family. Shorter and longer branch lengths indicate slower and faster evolution, respectively. The EYA1 (purple) and EYA4 (red) proteins have a slower rate of evolution/divergence (short branches) and their inactivation is associated with severe phenotypic defects. In contrast, the EYA2 (orange) and EYA3 (green) proteins evolve faster (long branches) and their disruption only results in mild phenotypic effects. Phenotypes were taken from the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org/).
Pathway and state change models
We analysed the dynamics of transcriptional networks driving the differentiation of ES cells. This was achieved by integrating public and SyBoSS-specific ChIP-seq datasets (available at http://syboss.cbs.dtu.dk; username: guest; password: 1Guest!; the database is not yet public) with SyBoSS RNA-seq data, which characterize the transcriptional activity of >50% of all genes in the mouse genome.
1. TF centered dynamic analysis of the gene regulatory networks involved in ESC->EpiSC->NSC cell state transitions.
We first analysed mRNA-Seq data from SyBoSS experiments differentiating ESC to EpiSC and NSC (relative to ESC). We then integrated these gene expression profiles with TF-Promoter information obtained from public databases and from our re-analysis of Stem Cell specific publications. This generated a dynamic representation of the gene regulatory networks involved in differentiation. The RNA-seq data covered ESCs in 2i and serum, EpiSCs, and neural stem cells at multiple time points. For each time point, we identified the genes significantly changing relative to “ESC in 2i” and relative to the previous point. For each list, we divided the genes into up and down regulated and performed a functional enrichment analysis (GO analysis). TF-promoter association data were obtained and analysed as described above. The final TF-target network contains 557276 interactions covering 352 transcription factors of which 22 were the results of our own ChIP-Seq analysis. Target gene calling was performed using the Sikora-Wohlfeld et al. method (see above).
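The two comparisons described above (each time point against “ESC in 2i” and against the previous point, followed by an up/down split) can be sketched as below. This is a simplified illustration under stated assumptions: a fixed log2 fold-change threshold stands in for the statistical significance test, the linear ordering of the conditions is assumed for the example, and the gene names, expression values and the function split_changes are hypothetical.

```python
def split_changes(log2_expr, baseline, timepoint, threshold=1.0):
    """Split genes into (up, down) sets by log2 fold change versus a baseline.

    A simple threshold is used here in place of a proper statistical test.
    """
    up, down = set(), set()
    for gene, profile in log2_expr.items():
        delta = profile[timepoint] - profile[baseline]
        if delta >= threshold:
            up.add(gene)
        elif delta <= -threshold:
            down.add(gene)
    return up, down


# Assumed ordering of conditions; values are placeholders, not SyBoSS data.
timepoints = ["ESC_2i", "ESC_serum", "EpiSC", "NSC"]
log2_expr = {
    "geneA": {"ESC_2i": 8.0, "ESC_serum": 7.5, "EpiSC": 5.0, "NSC": 1.0},
    "geneB": {"ESC_2i": 1.0, "ESC_serum": 1.2, "EpiSC": 4.0, "NSC": 4.5},
    "geneC": {"ESC_2i": 3.0, "ESC_serum": 3.1, "EpiSC": 3.2, "NSC": 6.0},
}

comparisons = {}
for prev, cur in zip(timepoints, timepoints[1:]):
    # Relative to the reference state ...
    comparisons[(cur, "ESC_2i")] = split_changes(log2_expr, "ESC_2i", cur)
    # ... and relative to the previous point.
    comparisons[(cur, prev)] = split_changes(log2_expr, prev, cur)

for (cur, ref), (up, down) in comparisons.items():
    print(f"{cur} vs {ref}: up={sorted(up)} down={sorted(down)}")
```

Each resulting up or down list would then be passed to a GO enrichment tool.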
In order to integrate the transcriptomics and DNA-binding data in a way that would allow us to obtain a dynamic representation of the regulatory network involved in ESC to EpiSC to NSC differentiation, we used the Dynamic Regulatory Events Miner (DREM) tool, which was created for exactly this purpose (Schulz et al, BMC Systems Biology 2012). DREM uses both RNA-Seq data and TF-promoter information to infer gene expression thresholds and to predict which TFs can explain such boundaries. In short, for each time point, DREM separates out the significantly changing genes and uses those to look for enrichment of specific TFs in their promoter regions. Using the gene expression level of each TF, DREM also separates regulators into “activators” (blue) and “repressors” (red). Unassignable TFs are marked in black. Figure 12 shows the results of such an analysis, in which we only looked at 2 partitions, with a “Minimum Absolute Log Ratio Expression” of 1.2 and a train-test random seed of 1260. Additionally, DREM performs GO functional enrichment analysis on each branch.
Figure 12. DREM analysis of ESC-EpiSC-NSC RNA-seq data.
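The idea of testing whether the targets of a TF are enriched among changing genes can be illustrated with a one-sided hypergeometric test. This is a deliberately simplified stand-in and not DREM's actual model, which fits the time series and TF-gene data jointly; the function names, TF labels and gene identifiers below are hypothetical, and the set of changing genes is assumed to be a subset of the gene universe.

```python
from math import comb


def hypergeom_pvalue(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n are TF targets."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def tf_enrichment(tf_targets, changing, universe):
    """Return {TF: p-value} for over-representation of TF targets in 'changing'."""
    results = {}
    for tf, targets in tf_targets.items():
        targets = targets & universe          # ignore targets outside the universe
        overlap = len(targets & changing)     # targets among the changing genes
        results[tf] = hypergeom_pvalue(
            overlap, len(universe), len(targets), len(changing)
        )
    return results


# Placeholder data for illustration only.
universe = {f"g{i}" for i in range(10)}
changing = {"g0", "g1", "g2"}
tf_targets = {
    "TF_X": {"g0", "g1", "g5", "g6"},
    "TF_Y": {"g7", "g8"},
}

pvalues = tf_enrichment(tf_targets, changing, universe)
for tf, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(tf, round(p, 4))
```

In practice the p-values would need correction for multiple testing across the 352 TFs in the network.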
2. miRNA-centered dynamic analysis of the gene regulatory networks involved in ESC->EpiSC differentiation in E14 (male) and PGK (female) cells.
Similar to the previous mRNA/TF integration, we have studied the regulatory roles of miRNAs in the differentiation of ESCs to EpiSC in both male and female cells. In this project, we obtained RNA-Seq and small RNA-Seq data from differentiating ESC at days 0, 4, 10, 20 and 30 for both PGK (female) and E14 (male) cells. RNA-Seq data were analysed using the same pipelines and methods as for the previous project. miRNA expression levels were determined using the ncPRO server (https://ncpro.curie.fr/) and differential gene expression was investigated using the same pipelines as for RNA-Seq.
miRNA and mRNA expression data were integrated with TF-Promoter data (as previously described) and with miRNA/mRNA interaction predictions retrieved and processed from TargetScan 6.2 (http://www.targetscan.org/mmu_61/). The integration was performed using a miRNA-specific variant of DREM, called mirDREM. This variant uses TF-Promoter data, miRNA/mRNA predictions and the overall expression pattern to “explain” overall gene expression. Importantly, to predict a regulatory role, mirDREM uses only those miRNAs showing anti-correlation with their target mRNAs at any time point. These analyses were performed on both PGK and E14 cells. While TF regulation appears to be somewhat consistent between the sexes, the identities of the miRNAs vary more. Figures 13 and 14 show PGK and E14 miRDREM outputs respectively. Only the top 5 most significant miRNAs and transcription factors are shown. Red – underexpressed miRNAs. Blue – overexpressed miRNAs.
Figure 13. miRDREM analysis of PGK (female) ESC-EpiSC-NSC data.
Figure 14. miRDREM analysis of E14 (male) ESC-EpiSC-NSC data.
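The anti-correlation requirement for miRNAs can be sketched as a simple filter. This is only one possible reading of “anti-correlation at any time point” and is not mirDREM's internal implementation: here a miRNA is kept if, for at least one step between consecutive time points, it moves in the opposite direction to one of its predicted targets. The function names, miRNA and gene labels, and expression values are hypothetical.

```python
def opposing_steps(mirna_profile, mrna_profile):
    """Indices of consecutive-time-point steps where the two profiles
    change in opposite directions."""
    steps = []
    for i in range(1, len(mirna_profile)):
        d_mirna = mirna_profile[i] - mirna_profile[i - 1]
        d_mrna = mrna_profile[i] - mrna_profile[i - 1]
        if d_mirna * d_mrna < 0:
            steps.append(i)
    return steps


def filter_mirnas(mirna_expr, mrna_expr, predicted_targets):
    """Keep miRNAs with at least one predicted target showing an opposing step."""
    kept = {}
    for mirna, targets in predicted_targets.items():
        hits = {
            gene
            for gene in targets
            if gene in mrna_expr
            and opposing_steps(mirna_expr[mirna], mrna_expr[gene])
        }
        if hits:
            kept[mirna] = hits
    return kept


# Sampling days as in the text; expression values are placeholders.
days = [0, 4, 10, 20, 30]
mirna_expr = {"miR_a": [1, 2, 3, 4, 5], "miR_b": [5, 4, 3, 2, 1]}
mrna_expr = {"geneX": [5, 4, 3, 2, 1], "geneY": [1, 2, 3, 4, 5]}
predicted_targets = {"miR_a": {"geneX", "geneY"}, "miR_b": {"geneX"}}

kept = filter_mirnas(mirna_expr, mrna_expr, predicted_targets)
print(kept)
```

A stricter variant could require a negative correlation over the whole time course instead of a single opposing step.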
Finally, we generated regulatory networks for both the “static” analysis (each time point as an independent observation using DESeq2) and the mirDREM “dynamic” model. The intersection of these two networks yields an optimized network of the regulatory roles of miRNAs in ESC differentiation (Figure 15).
Figure 15. Integrated network. Green nodes – miRNAs; blue nodes – genes; blue/pink edges – male or female cells.
Models of protein complex and pathway-specific impact of X inactivation and X reactivation
X chromosome inactivation (XCI) and X chromosome reactivation (XCR) are essential processes in females from embryonic development onwards, compensating for the extra gene dosage to obtain monoallelic expression. Brunak/DTU has studied the effects of XCI and XCR using men with Klinefelter syndrome (KS) as a model. KS is the most frequent chromosome disorder, affecting approximately 1-2:1000 newborn boys, with the most common karyotype being XXY (Hysolli et al, Cell Cycle 11, 229–235, 2012). Men with KS present a palette of co-morbidities, that is, diseases that occur more often than in the background population. Previous studies focusing on KS co-morbidities (Bojesen et al, Acta Paediatr. 100, 807–813, 2011; Lahlou et al, Acta Paediatr. 100, 824–829, 2011) have described delayed speech and learning difficulties, psychosocial problems, testicular insufficiency, height greater than predicted from the parents’ heights and eunuchoid body proportions. Yet the penetrance of the phenotypes is highly variable, with some men not showing any signs of the syndrome while others are highly affected. The clinical phenotypes of KS are believed to arise from the genes escaping XCI.
The aim of the study by Brunak was to study the molecular mechanisms underlying the clinical phenotypes of KS. KS co-morbidities were first extracted from 2.6 million Danish electronic patient records (Jensen et al Nat Commun 5, 4022, 2014). Some of the most significant co-morbidities of KS from this analysis were testicular dysfunction, hypopituitarism, neuromuscular scoliosis and osteoporosis. The observed co-morbidities of Klinefelter syndrome per ICD-10 chapter are illustrated in Figure 16. Gene expression data were generated on KS and control males and integrated into the network. The analysis of the gene expression data itself showed that the genes escaping XCI in males affect gene expression genome wide. This clearly showed that the extra X chromosome in men with KS is transcriptionally active to some extent and triggers the KS phenotypes. To investigate how KS and its co-morbidities are linked at the protein interaction level, a phenome-interactome network was built consisting of sub-networks, each representing a co-morbidity of KS. Numerous proteins in the network linked multiple KS co-morbidities either by being associated with numerous co-morbidities themselves, designated multi-disease nodes, or by linking multiple co-morbidities through first-order interaction partners, designated co-morbidity hubs. Thus, the network displays key players in the KS disease phenotypes.
Figure 16. Observed Klinefelter Syndrome (KS) co-morbidities displayed by ICD10 chapter. The KS code belongs to chapter XVII. The most frequently occurring co-morbidity is testicular dysfunction (chapter IV).
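The two node classes defined for the phenome-interactome network, multi-disease nodes and co-morbidity hubs, can be expressed as a short sketch. The report does not state the cut-offs behind “numerous” and “multiple”, so a minimum of two co-morbidities is assumed here; the protein identifiers, their disease associations, the interactions and the function classify_nodes are all hypothetical.

```python
def classify_nodes(protein_to_comorbidities, interactions, min_diseases=2):
    """Return (multi_disease_nodes, comorbidity_hubs).

    multi-disease node: protein itself associated with >= min_diseases
    co-morbidities.
    co-morbidity hub: protein whose first-order interaction partners together
    cover >= min_diseases co-morbidities.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    multi_disease = {
        protein
        for protein, diseases in protein_to_comorbidities.items()
        if len(diseases) >= min_diseases
    }

    hubs = set()
    for protein, partners in neighbours.items():
        linked = set()
        for partner in partners:
            linked |= protein_to_comorbidities.get(partner, set())
        if len(linked) >= min_diseases:
            hubs.add(protein)
    return multi_disease, hubs


# Placeholder associations and interactions, for illustration only.
protein_to_comorbidities = {
    "P1": {"testicular dysfunction", "osteoporosis"},
    "P2": {"hypopituitarism"},
    "P3": {"osteoporosis"},
}
interactions = [("P4", "P2"), ("P4", "P3"), ("P1", "P2")]

multi_disease, hubs = classify_nodes(protein_to_comorbidities, interactions)
print("multi-disease nodes:", sorted(multi_disease))
print("co-morbidity hubs:", sorted(hubs))
```

Here P4 has no disease association of its own but connects two co-morbidities through its partners, which is the situation the hub definition is meant to capture.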
Disruptive phenotypic impact of interactome modularity. In order to investigate and rank the coordinated expression of components in protein complexes in the network, we implemented a previously developed method for expression coordination quantification (Taylor et al, Nat Biotechnol 27, 199–204, 2009). Integrating the gene expression data and using this methodology, our analysis reveals that the structure of the interactome is likely to drastically decrease the phenotypic effect of KS.
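One way to quantify expression coordination, in the spirit of the cited method, is the average Pearson correlation between a protein and its interaction partners, compared between two sample groups. The sketch below is an assumption-laden illustration rather than the project's implementation: the function names, protein labels and expression values are hypothetical, and no significance testing is included.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def mean_coexpression(hub, partners, expr):
    """Average correlation between a hub and each of its interaction partners."""
    values = [pearson(expr[hub], expr[p]) for p in partners]
    return sum(values) / len(values)


def coordination_change(hub, partners, expr_case, expr_control):
    """Difference in average hub-partner co-expression (case minus control)."""
    return mean_coexpression(hub, partners, expr_case) - mean_coexpression(
        hub, partners, expr_control
    )


# Placeholder expression values across four samples per group.
expr_control = {"hub": [1, 2, 3, 4], "p1": [2, 4, 6, 8], "p2": [1, 3, 5, 7]}
expr_case = {"hub": [1, 2, 3, 4], "p1": [8, 6, 4, 2], "p2": [1, 3, 5, 7]}

change = coordination_change("hub", ["p1", "p2"], expr_case, expr_control)
print("change in coordination:", round(change, 3))
```

A strongly negative change would flag a complex whose coordinated expression is disrupted in the case group relative to controls.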
The gene expression data from whole blood were used to depict disrupted protein complexes in KS. The next step will be to map cell-type specific transcriptome data onto the phenome-interactome network. The method we have developed makes it possible to investigate whether the stoichiometry in protein complexes increases or decreases the phenotypic impact of the de facto copy number variation. The approach is entirely general and can be applied to any case where gene dosage changes occur, such as X-inactivation, cancer-induced aneuploidies or inherited copy number changes observed in individuals.
Potential Impact:
Expected final results and potential impacts
Overall SyBoSS aimed to gather and integrate systematic datasets into a comprehensive network model that will enhance the understanding of the regulatory circuitry underpinning a mammalian cell type, in particular a stem cell and more particularly an embryonic stem cell and its transition to other multipotent, self-renewing stem cells. By the end of the project, we have succeeded in assembling an unrivalled resource of primary information regarding the regulatory composition of a single, physiologically relevant, mammalian cell. It will take more time to integrate the very substantial volumes of data now available for ESCs into sophisticated regulatory models. Indeed, model building inevitably must be an ongoing process of refinement and development. In this context, the ESC data resource is a premier venue for creative and constructive advances in the strategies for regulatory modelling of complex networks. In addition to advancing knowledge about ESCs, the SyBoSS project delivered substantial progress in the understanding of stem cells per se. In part because of the comprehensive datasets available, ESCs are now a paradigm for stem cells. SyBoSS also delivered further insights into pluripotency, stem cell self-renewal as a reinforcing regulatory network, destabilization of self-renewal and exit from pluripotency, and the transition to the neighbouring quasi-pluripotent EpiSC state, a nearby state of self-renewal. It also created a mammalian network of predicted protein-protein interactions as well as reference data sets for proteomic, transcriptomic and epigenomic profiles. Using well established computational methods, the growing body of systematic data from wild type and gene specific sources is being integrated with available public information into models and network simulations to infer the molecular interactions relevant for the four stem cell stages and the transitions between them.
As these simulations grow, the value of the transcriptome data becomes ever more apparent. The implementation of our two technical remedies and the resulting acquisition of the transcriptome data is having a synergistic effect on the value, quality and accuracy of the overall outcome of the SyBoSS project.
The final outcome of the SyBoSS project is the establishment of an information resource pertaining not only to stem cell biology but also to mammalian cell biology in general with particular focus on transcriptional and epigenetic circuitry. This outcome is still under construction but the vast majority of the requisite data has been collected.
In the course of the project we developed and extended the methodology of mammalian systems biology. This will directly benefit other applications as well as serve as a logical starting point for the comprehensive understanding of mammalian development as a transcriptional program.
Through enhanced understanding of stem cell biology, the potential for stem cell applications in regenerative therapies and medicine will be enhanced. The information that SyBoSS has gathered and is organising will be important for developing standards for cell based therapies and devising new approaches based on improved understanding of regulatory interactions. This information will also impact on improved disease modelling, drug screening and toxicity tests using ESC and iPSC cell culture and differentiation. These models reduce the need to use animals for testing and in certain applications are more precisely defined and permit greater accuracy.
The broadest socioeconomic impact of the project relates to the development of new ways to treat chronic disabilities in the human population, particularly those related to ageing, as well as genetically based disabilities or degenerative diseases. These health issues are beyond current therapies. The information delivered by the SyBoSS project will contribute to the understanding required to search for new solutions.
Humans are the most complex phenomena in the universe. It is not surprising therefore that we are far from understanding much of ourselves in health and disease. We are based on at least 30,000 genes encoding at least 70,000 proteins and likely 10,000 non-coding RNAs whose temporal and environmental flexibilities add several dimensions to the complexity of the organism. This lack of understanding also applies to our understanding of even one cell type. Despite considerable progress, to which SyBoSS has made a substantial contribution, it will take a great deal more work and creative analysis before we arrive at a comprehensive grasp of stem cellness and predictable ways to employ these properties for applications in human health. Nevertheless, the recent reactionary policies to return to the old ‘trial and error’ empiricism of medical research, now termed ‘translational research’, are very discouraging. The motivation to apply progress in the understanding of biological systems to applications in medicine is certainly good. However, to abandon primary research when so much knowledge remains to be gathered and to return to the traditional guesswork of medical research is a very ineffective way of making progress. Under the current research funding policy climate, not only will a great deal of funding be wasted but also the highly skilled and developed European research environment, much of which is the very successful and constructive product of the 6th and 7th Framework Integrated Project policy, will be lost. Outstanding European collaborations, previously fostered by enlightened EC funding, are now disintegrating. It is extremely disheartening to see European biomedical research flounder in this way. For these reasons, I have my doubts about whether the vital, essential and remarkable progress that has been made in the fundamental understanding of mammalian development and stem cells will lead to effective exploitation.
List of Websites:
syboss.eu
Prof A Francis Stewart
Scientific Coordinator
Systems Biology of Stem Cells and Reprogramming
Director, Biotechnology Center
Department of Genomics, Technische Universitaet Dresden
Tatzberg 47-51
01307 Dresden
Germany
tel: +49-351-46340130
fax: +49-351-46340143
stewart@biotec.tu-dresden.de
 
           
        