Cis-regulatory logic of the transcriptional control in neural stem cells

Final Report Summary - CISSTEM (Cis-regulatory logic of the transcriptional control in neural stem cells)

Executive Summary:
Neural stem cells (NSCs) have emerged as a major topic in neurobiology. The persistence of multipotent cells in the adult mammalian brain offers a realistic chance for the treatment of neurodegenerative diseases. It is thus crucial to understand NSCs from as many angles as possible in order to better isolate and successfully manipulate them.
CISSTEM presented a post-genomic systems biology approach, taking advantage of new computational and experimental tools to address the specification and maintenance of NSCs at the transcriptional/epigenetic level. CISSTEM was designed to unravel the basic principles of gene regulation in NSCs, with a focus on cis-regulatory modules (CRMs). To do so we followed a multidisciplinary approach tightly interconnecting computational prediction and experimental validation in vitro and in vivo using different vertebrate model systems.
Major steps were the establishment of lists of genes active in stem cells, the collection of DNase HS/ChIP-seq marks dynamically functional in stem cells, the prediction of relevant elements, and the validation of the temporal, spatial and quantitative activities of the predicted conserved regulatory motifs associated with NSC-expressed genes.
In this context, CISSTEM has
• Developed computational tools and resources for the in silico identification of CRMs and transcription factor binding sites (TFBSs). Developed a project browser to deposit these data.
• Identified genomic regions functionally active in NSCs through direct mapping of DNase I hypersensitivity sites. Putative functional ChIP-seq annotations were also implemented and direct CRM predictions were conducted.
• Identified over-represented motifs/signatures in proximal and distal elements found around NSC-expressed genes
• Validated the elements acting as transcriptional enhancers in different neural cell lines and in vertebrate animal model systems
• Disseminated the results of the project to the research community, biotech/pharmaceutical companies and the general public

Project Context and Objectives:
The persistence in the adult brain of multipotent cells, able to generate de novo neurons which can integrate into the brain circuitry, offers at present the best hope for the treatment of neurodegenerative diseases. These diseases are becoming a major health issue in European countries, due to the aging of the population.
It is essential to reach a better understanding of NSCs at the cellular and molecular levels, in order to better diagnose and safely manipulate them. Currently, these cells are much less studied than embryonic stem (ES) cells, the existence of which has been known for much longer, and which serve as a reference point for the analysis of pluripotency in other stem cell types. More specifically, understanding the molecular basis of neural stemness, and the gene-regulatory networks that define it, is essential.
CISSTEM’s overall goal was to understand the nature of the nodes of this network, i.e. the regulatory principles of the DNA elements that govern NSC-specific gene expression. Even though NSCs can be cultivated in vitro, their homeostasis in vivo is controlled through intricate signalling cross-talk between NSCs and their surrounding “niches”. For this reason, it is important to characterize the function of potential NSC regulatory elements in vivo as well.
The overall objective of CISSTEM was to understand cis-regulatory logic enabling gene expression in pluripotent neural stem cells. Major intermediate objectives of this project were the prediction of relevant elements and the identification of the temporal, spatial and quantitative activities of predicted conserved regulatory motifs associated with pluripotency genes.
To achieve this goal, CISSTEM was built in 6 scientific workpackages:
1. WP1 aimed at establishing comprehensive lists of genes active in NSCs, which are therefore taken as candidates for the maintenance of their pluripotency. This has been done using several approaches, ranging from results furnished by biological studies to genome-wide data mining. It was important to combine genomic and biological approaches, because the degree of similarity between the transcriptome of pluripotent cells maintained in vitro and the NSC transcriptome in their in vivo niches was not precisely known. Furthermore, comparing lists of genes obtained by different approaches in different organisms helped to evaluate the “stemness specificity” of genes that are active in NSCs but, in most cases, also in several other cell types. In parallel, we have also established a small list of genes involved in a situation that is better known and defined than the “stemness” state, namely proliferation arrest, or the switch between proliferation and terminal differentiation. Indeed, in all our models (fish retina, fish tectum, mouse forebrain) we easily defined a set of genes either very specifically expressed in the restricted zone where cells exit the cycle and engage in the differentiation pathways, or known, from functional studies, to be important regulators of this arrest/switch (for example Ath5, well documented in the context of retina development). This additional small list of genes allowed us to refine the search algorithms that are essential for the project, and also to evaluate the differences between the regulatory logics underlying these two sequential biological processes.
2. Decrypting the action of evolution using comparative sequence analysis is one of the most powerful ways to find, categorize and understand functional genome sequences. To achieve this one needs to generate accurate genomic alignments coupled with good evolutionary models to predict the history of each base in extant genomes. This problem has been approached with a three stage pipeline: identification of orthologous regions in each genome, initial alignment of co-linear genomic regions, and then a refined alignment using detailed models of the sequence evolution, often with the explicit goal of predicting ancestral sequences.
In WP2 we tried to solve the problem of genome alignments in ray-finned fish, which have undergone whole-genome duplication (WGD) after their separation from tetrapods (Jaillon et al., 2004). We have extended our recent work that allows us to provide accurate genome-wide homology assignments, handling duplications at any stage of the evolutionary tree and with a sophisticated indel-aware model of evolution. First we have extended and applied to fish genomes a new graph-based method called ENREDO, which is able to find homologous co-linear regions without any restriction in the duplication or deletion structure in each lineage. ENREDO relies on a set of lineage-specific sequences, the anchors, that can be mapped on the genomes. When applied to the 4 eutherian mammals with long-range assemblies (Human, Mouse, Rat, Dog), 95% of the human transcripts are assigned to homologous regions in the other species. We have also adapted to fish genomes a new genome alignment method called PECAN, which uses posterior-probability-based consistency alignment to provide highly accurate alignments. PECAN employs a new concept of sequence progressive alignments coupled with a new method for restricting the space of informative alignments from outgroups, called transitive anchoring. Resulting alignments have higher sensitivity (12% at equivalent human-mouse distance) and equal or higher specificity (1.1% human-mouse) than the next best method, as assessed by independently generated simulations. Finally we have developed a new evolutionary alignment modeler, called ORTHEUS, for predicting the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions.
3. Searching for candidate CRM regions in large vertebrate genomes is dramatically complicated by the presence of a large number of potential transcription factor binding site (TFBS) motifs. These motifs are short sequences, with little information content, and appear many times in the genome in regions that are apparently not functional. As a result, bioinformatics approaches that attempted to discover CRMs and TFBSs in vertebrate genomes by simple computational approaches alone have had only limited success. The first objective of WP3 was to identify regions of open chromatin that are most likely to be functionally active in NSCs through direct mapping of DNase I hypersensitivity (DHS) sites. This made it possible to dramatically limit the portion of the genome in which to search for functional CRMs. DNA at DHS sites is exposed and accessible for DNA-protein interactions, including the binding of transcription factors to cis-acting elements. We have then applied a general comparative method of CRM discovery based on a novel, hierarchical model of CRM structure. DHS sites and predicted CRMs were assigned putative functions by integrating these data with information on chromatin modifications, gene expression and genome organization, extending approaches reported in (Mikkelsen et al., 2007).
4. Genes expressed in NSCs may be involved not only in pluripotency maintenance but also in basic cell maintenance. The objective of WP4 was to predict the DNA regulatory motifs that specifically trigger expression in NSCs. Factors responsible for NSC maintenance and proliferation should bind significantly more often in these regions, and their binding sites should therefore be significantly over-represented. Since these factors are not characterized, a de novo motif discovery strategy was applied in order to identify a dictionary of motifs that are differentially distributed in the candidate regions. The strategy to achieve this objective consisted of using the CRMs identified in WP2 that overlap with the DNase I hypersensitivity sites (WP3) in proximity of genes that were significantly down-regulated during differentiation of cultured NSCs, or expressed in an overlapping fashion to NSC regions in vivo (WP1, tasks 1.1 and 1.2). Subsequently, a specific set of CRMs responsible for the control of expression in NSCs was identified in mammals and fish based on the composition of these motifs within the boundaries of the CRMs.
5. The overall objective of WP5 was to identify, within the large data set provided by WP4, which elements were acting as transcriptional enhancers, as well as to define their pattern of activities in different neural cell lines. Because the degree of difference (or similarity) between the transcriptional regulative elements active in NSCs in vitro and in vivo was not fully documented, it was essential to combine (large-scale) functional tests in cell cultures or organotypic slice cultures (performed in WP5) with (necessarily smaller scale) assays in transgenic animals (performed in WP6).
The experimental data (i.e. the expression profiles driven by the elements in cell lines or slices in WP5, and in fish and mice in WP6) have been integrated in a common database and will then be used to refine the computational predictions of WP3 participants, and ultimately to build models of the cis-regulation of genes co-expressed in NSCs.
6. A substantial part of WP6 was devoted to improving the state of the art of the transgenic assays used to characterize the activity of potential cis-regulatory elements. The overall objective of WP6 was to identify, within the large data set provided by WP4, which elements were acting as transcriptional enhancers, as well as to define their pattern of activities in vertebrate animal model systems. We have performed intra-species tests only, but it was extremely interesting to compare in silico the grammars of the elements found in teleosts and mammals to point to conserved mechanisms at long evolutionary range. Indeed we felt that, even if cross-species in vivo tests (mouse enhancers in fish, etc.) could be informative, they would require an additional amount of work that could not be proposed within the framework of CISSTEM. Similarly, a systematic knock-out (KO) of these elements, although obviously the best strategy to functionally validate elements, would have been much too costly to be funded at a large scale as an initial screen.
Within this framework, we have developed novel approaches to improve the efficiency and accuracy of the transgenic assays.
7. CISSTEM dissemination ensured that the scientific developments resulting from the project were effectively disseminated through publications, data release and the Ensembl genome browser. Based on the research and technical inputs and results, CISSTEM analysed the available information and focused its dissemination on the following target groups: the scientific community, the media, public ethical forums and committees, associations of patients with neurodegenerative diseases, and biotech companies that exploit patents in the stem cell research field.

Project Results:
WP1: Establishment of lists of genes active in neural stem cells
The project started with lists of candidate genes expressed in neural stem cells, progenitors and cells exiting the cell cycle. INRA has generated a list of fish genes expressed in tectum zones where slow-cycling progenitors (stem cells) were found. INRA thus identified 50 markers.
FZK has contributed a similar list resulting from in situ screens focused on eye development. Strikingly, INRA characterized a strong synexpression group between retina and tectum, which allowed the gene lists from FZK and INRA to be pooled.
MRC-NIMR has generated a list of mouse genes associated with pluripotency in NS5 cell cultures.

WP2: Comparative genomic tools:
Progress in this workpackage has been extremely satisfying. A password-protected genome browser interface for the project is available at http://hgw-max.smith.man.ac.uk. The customized version of the UCSC genome browser for the CISSTEM project is now installed and running with reference genomes for mouse and Medaka (plus a few other species). In addition to the baseline tracks provided by UCSC, UNIMAN has curated and added ~70 sets of functional genomics data from papers reporting ChIP-seq and ChIP-chip experiments in mouse stem cells, as well as the 5-way GERP conserved regions data from Ensembl in Medaka.
A comprehensive mapping between the homologous fish and mouse genes was performed using the methods for gene tree construction developed throughout the project period. High-quality alignments on teleost genomes have also been built.
The results of the work package are well integrated in the sense that the homologous genes and the constrained elements can both be displayed through the project genome browser and, in fact, the constrained elements are currently displayed. The homologous genes and the constrained elements have also been released to the project and to the wider community through the main Ensembl platform. In both ways these results have met the needs of the consortium and the requirements of the deliverables.

WP3: Identifying candidate regulatory regions in fish and mouse genomes
In this workpackage, we have identified DNase I hypersensitive regions in self-renewing neural stem cells induced to differentiate into neurons, in order to identify regions that have open chromatin in self-renewing stem cells and that close as stem cells differentiate. Additionally, we have identified regions bound by a set of eight neural stem cell transcription factors (Sox2, Mash1, Pou3f1, Pou3f2, Pou3f3, Olig2, Sox21, Sox9) by ChIP-seq to provide higher-resolution insight into active neural stem cell regulatory regions. These studies on transcription factor binding were complemented by global analysis of gene expression in cell lines that had mutations in these proteins. We developed a computational pipeline to identify DNase hypersensitive regions from mouse embryonic stem cells, which is currently being used to identify significantly open chromatin regions on the basis of DNase I hypersensitivity in neural stem cells. We have also developed a comparative-genomics-based method called Contemplate for the general prediction of regulatory regions in animal genomes, which we have made publicly available for download under an open source software license (http://sourceforge.net/projects/contemplate-hmm/). We have applied this system to the complete mouse genome, using three different species as comparators (human, zebrafish and medaka), and found more than ~300,000 high-confidence putative regulatory regions conserved between mouse and human, but only ~700 high-confidence putative regulatory regions conserved between mouse and fish. Functional and comparative genomic data have been curated and loaded into a custom genome browser (http://cisstem.smith.man.ac.uk/) to enable integrative analysis and make the results of the project available to the wider scientific community. Together these resources have been used to address the main aim of identifying regulatory regions that are active in neural stem cells.

WP4: Computational identification of cis-regulatory motifs in neural stem cells
UHEI integrated the ChIP-seq datasets of various transcription factors and co-factors in NS5 cells provided by MRC-NIMR and various publicly available datasets notably for the histone methylation status.
From these datasets, five well-defined classes of CRMs were found in the genome of mouse neural stem cells (NS5), defined by the presence (+) or absence (-) of histone marks and by the distance to the nearest transcription start site (TSS):
i) Active enhancers: H3K4me1+, H3K27ac+, H3K27me3-, >2 kb from TSS
ii) Poised enhancers: H3K4me1+, H3K27ac-, H3K27me3-, >2 kb from TSS
iii) Repressed enhancers: H3K4me1+, H3K27ac-, H3K27me3+, >2 kb from TSS
iv) Active promoters: H3K4me3+, H3K27ac+, H3K27me3-, <2 kb from TSS
v) Repressed promoters: H3K27me3+, H3K27ac-, <2 kb from TSS
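The class definitions listed above amount to a small rule table, which can be written down as a minimal Python sketch. The function name, the boolean inputs (one flag per mark, as if peaks had already been called) and the handling of regions at exactly 2 kb are assumptions made for illustration; the report does not describe how the classification was implemented.

```python
from typing import Optional


def classify_crm(h3k4me1: bool, h3k4me3: bool, h3k27ac: bool,
                 h3k27me3: bool, dist_to_tss_bp: int) -> Optional[str]:
    """Assign a region to one of the histone-mark classes, or None.

    Assumption: each mark is already reduced to present/absent, and a
    region at exactly 2000 bp is treated as promoter-proximal (the text
    only says ">2kb" and "<2kb").
    """
    distal = abs(dist_to_tss_bp) > 2000
    if distal and h3k4me1:
        if h3k27ac and not h3k27me3:
            return "active enhancer"
        if not h3k27ac and not h3k27me3:
            return "poised enhancer"
        if not h3k27ac and h3k27me3:
            return "repressed enhancer"
    if not distal:
        if h3k4me3 and h3k27ac and not h3k27me3:
            return "active promoter"
        if h3k27me3 and not h3k27ac:
            return "repressed promoter"
    # Any other combination of marks falls outside the listed classes.
    return None
```

Regions whose marks do not match any rule return `None`, which keeps the classes mutually exclusive rather than forcing every region into a category.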
Motif discovery was done for each class of regulatory elements. UHEI and MRC-NIMR found that bHLH, SOX and NFI factor motifs predominate in active enhancers. They also found differential motif enrichment in proliferative versus quiescent cells, with E-box motifs more enriched in enhancers specific to proliferative stages and NFI motifs enriched in quiescence-specific enhancers.
To transpose these findings from mouse to Medaka, the fish orthologous sequences of the mouse regulatory elements were identified. UHEI found 883 orthologous regions in Medaka corresponding to mouse CRMs defined in 4.2.
In parallel, EMBL has developed a “regulatory sensor” approach and compared the results of their sensors to the results obtained by UHEI and MRC-NIMR in NS5 cells. EMBL found several instances where an enhancer had a wider domain of activity when tested autonomously compared to the observed output in its endogenous context. This difference hinted at the presence of negative regulators of gene activity (silencers, repressors) in their vicinity, some of which we are aiming to delineate further.
UHEI and EMBL also investigated forebrain-specific p300-bound regions and identified a motif that was significantly over-represented in these sequences. This motif, ACAAAGg, showed striking homology with the Sox2 consensus defined from ChIP data. When these regions were tested in vivo in fish, the activity was essentially located in the brain, and a large fraction of these enhancers drove forebrain expression. Deletion of this motif, however, showed little effect on the activity of five different enhancers tested in transgenic medaka embryos.
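Differential enrichment of a motif between two enhancer sets can be illustrated with a one-sided hypergeometric (Fisher-type) test. This is a generic sketch and not the consortium's method: the report does not state which statistic was used, and the function name and the counts in the example are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(hits_fg: int, n_fg: int,
                           hits_bg: int, n_bg: int) -> float:
    """One-sided p-value that a motif is over-represented in a foreground set.

    hits_fg / n_fg: sequences containing the motif / all sequences in the
    foreground (e.g. proliferation-specific enhancers, hypothetical input).
    hits_bg / n_bg: the same counts for the background set.
    Returns P(X >= hits_fg) when n_fg sequences are drawn at random from
    the pooled sequences.
    """
    total = n_fg + n_bg
    total_hits = hits_fg + hits_bg
    denom = comb(total, n_fg)
    upper = min(n_fg, total_hits)
    numer = sum(
        comb(total_hits, k) * comb(total - total_hits, n_fg - k)
        for k in range(hits_fg, upper + 1)
    )
    return numer / denom


# Invented counts, for illustration only.
p_value = hypergeom_enrichment_p(hits_fg=40, n_fg=100, hits_bg=20, n_bg=100)
print(f"one-sided p-value: {p_value:.3g}")
```

In practice one such test would be run per motif and per enhancer class, followed by a multiple-testing correction.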

WP5. In vitro functional characterization of candidate enhancers
The putative enhancer sequences tested were derived from the ChIP-seq data generated at MRC-NIMR (mammalian) and from expression screening performed at FZK and INRA.
Testing activity in neural stem cells in vitro: UNIMIB has tested the following 39 potential candidate elements derived from lists of ChIP-seq data: sox21, olig1 (35), olig1 (36), olig1 (-6), olig2, sall3, tle4, pou2f1, pou3f1, klf13, mash1, sox8 (-5), sox8 (9), lfng (2), lfng (4), Coro2b, Ddx20, Epn2, Grlb1, Mtss1, Mash1 (+45), Irx3, Osbpl9, Lrp11 and Cerk, Fb 1, Fb 7, Fb 8, Fb 9, Fb 10, Fb 34, Rev3l, Wnt7a, Sema6a, Smarcc1, Phlpp, Pknox1, Plce1, Plxna2.
Reporter vectors were transiently transfected into neural stem cells (NSCs) obtained from the adult mouse subventricular zone (SVZ) and into their differentiated progeny, including neurons, astrocytes and oligodendrocytes. The activity of each enhancer sequence was evaluated by comparison with the luciferase activity in cells transfected with a mock vector, and results were reported as fold induction.
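As a minimal sketch of the fold-induction readout described above, the following Python function divides the mean luciferase signal of an enhancer construct by the mean signal of the mock vector. The function name and the replicate values are hypothetical, and the report does not say whether readings were first normalised to a co-transfected control reporter, so no such step is included here.

```python
from statistics import mean
from typing import Sequence


def fold_induction(construct_readings: Sequence[float],
                   mock_readings: Sequence[float]) -> float:
    """Mean luciferase signal of a construct relative to the mock vector."""
    mock_mean = mean(mock_readings)
    if mock_mean <= 0:
        raise ValueError("mock vector signal must be positive")
    return mean(construct_readings) / mock_mean


# Invented replicate readings (arbitrary luminescence units).
print(fold_induction([200.0, 400.0], [100.0, 100.0]))  # prints 3.0
```

Computing this ratio separately for undifferentiated NSCs, progenitors and differentiated cells gives the per-stage activity profile that the results below compare.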
Results and conclusions:
• The activity of the putative enhancer sequences for Mash, Sox21 and Olig2 was higher in undifferentiated than in differentiated NSCs, consistent with the overexpression of the closest genes in self-renewing NSCs; in parallel, a drastic reduction of enhancer activity was observed upon differentiation.
• Olig(-7), Sall3 and Sox8 (9)-associated regulatory elements were active in undifferentiated, progenitor or differentiated cells, while Olig1 (35 and 36), Tle4, Sox8 (-5), Lfng (2 and 4) and Fb8 were never active. We can hypothesize that these sequences are not actually regulatory sequences of the target genes or, alternatively, that they exert their regulatory activity in cooperation with other sequences.
• Coro2b, Ddx20, Epn2, Grlb1, Mtss1, Mash1 (+45), Fb7, Fb9, Fb10, Rev3l, Wnt7a, Sema6a and Smarcc1-associated regulatory elements were constantly active during differentiation, with Wnt7a and Smarcc1 showing a predominant activation in NSCs.
• Sox21, Olig2, Pou2f1, Pou3f1, Klf3, Irx3, Osbpl9, Lrp11, Cerk, Fb1, Fb34, Phlpp, Pknox1, Plce1, Plxna2 and Mash1-associated elements displayed a modulated activity over differentiation, with Olig2, Pou2f1, Fb1 and Osbpl9 activities reaching a peak at the progenitor stage. This result suggested that some regulatory sequences could be involved in the cell fate commitment of NSCs.
• Phlpp and Pknox1 regulatory sequences showed a moderate activity only in differentiated cells.
Thus, in vitro, the 39 putative regulatory sequences tested behaved differently in NSCs and in differentiated NSCs. Interestingly, the activity of the non-coding sequences close to Mash, Sox21 and Olig2 was higher in undifferentiated than in differentiated NSCs, consistent with the strong expression of their putative target genes in self-renewing NSCs; in parallel, a drastic reduction of enhancer activity was observed with differentiation. This shows that enhancer predictions deduced from ChIP-seq can provide good candidates with actual enhancer activity in a given cellular context, here stemness. Correlating the transcription factor binding site motifs present in the tested sequences with the actual activity of these sequences in NSCs will be an interesting way to further understand the “cis-regulatory logic” of stem cells.

Testing activity in embryonic mouse cortex ex vivo and in utero/in vivo: CNRS has selected potential candidate elements from a list of 1750 elements derived from lists of ChIP-seq data representative of the combinatorial binding of Brn, Oct, Sox, and Mash transcription factors.
These lists have been screened on a first criterion: the GO (Gene Ontology) terms of the closest neighboring gene, which had to include cell proliferation or cell differentiation and development of the nervous system, stem cell, or cell cycle. This resulted in a restricted list of 124 candidate elements. Then a second criterion was applied: expression of the neighboring gene in the developing cerebral cortex of the mouse (the brain region to be electroporated as a functional test) between embryonic stages E10 and E15, as reported in one of these 3 expression atlases: the Allen Brain Atlas, GENSAT, or GenePaint. This resulted in a final list of 22 candidate mammalian elements to be tested in the embryonic cortex: Apbaenh, Btg1enh, Cdh4enh, Cdh5enh, E2f1enh, Enc1enh, Fubp3enh, Fzd9enh, Gli3enh, Gsh1enh, Lfng2kbenh, Lfng4kbenh, Nestinenh, Nr2f1enh, Pou2f1enh, Pou3f1enh, Ptprz1enh, Rikenh, Sall3enh, Sox2enh, Sox11enh and Tle3enh.
CNRS has tested these 22 sequences ex vivo, on E13.5 embryos obtained by caesarean section of the pregnant mother. Reporter plasmid DNA was injected into the lateral ventricle and two 35 ms pulses of 35 mV with a 1 ms interval were delivered for electroporation with two electrode paddles positioned on either side of the head. Immediately after ex vivo electroporation, the brains of the embryos were dissected out in ice-cold 1X Krebs solution and sectioned at 250 µm using a vibratome. Brain slices were transferred onto a slice culture insert with culture medium. Reporter activity was assessed from GFP expression in the embryonic cortex 24 h after electroporation. In addition, a short list of 3 enhancers was also tested in vivo after in utero electroporation in the E13.5 mouse cortex (CNRS, additional work, not planned in WP5) to obtain better cellular and anatomical resolution.
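The two-step screen described above (a GO-term criterion followed by an expression-atlas criterion) can be sketched as a simple filter. Everything in this Python example is hypothetical: the `Candidate` record, the keyword list paraphrasing the GO criterion, and the demo entries are invented for illustration and do not reproduce the CNRS selection.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical keywords paraphrasing the GO criterion in the text.
GO_KEYWORDS = (
    "cell proliferation",
    "cell differentiation",
    "nervous system development",
    "stem cell",
    "cell cycle",
)


@dataclass
class Candidate:
    name: str
    go_terms: List[str]  # GO terms of the closest neighboring gene
    # Atlases reporting cortical expression between E10 and E15.
    cortex_atlases: Set[str] = field(default_factory=set)


def passes_go_filter(c: Candidate) -> bool:
    """First criterion: at least one GO term matches a keyword."""
    return any(kw in term.lower() for term in c.go_terms for kw in GO_KEYWORDS)


def passes_expression_filter(c: Candidate) -> bool:
    """Second criterion: expression reported in at least one atlas."""
    return len(c.cortex_atlases) > 0


def shortlist(candidates: List[Candidate]) -> List[Candidate]:
    return [c for c in candidates
            if passes_go_filter(c) and passes_expression_filter(c)]


# Invented demo entries.
demo = [
    Candidate("elemA", ["regulation of cell cycle"], {"GenePaint"}),
    Candidate("elemB", ["lipid metabolic process"], {"GENSAT"}),
    Candidate("elemC", ["stem cell maintenance"], set()),
]
print([c.name for c in shortlist(demo)])  # prints ['elemA']
```

Applying the criteria in sequence mirrors the reduction reported in the text, from 1750 elements to 124 and then to 22.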
Results and conclusions:
• Ex vivo in cortical slices, 5 out of 22 tested sequences showed enhancer activity. They are: enh16(Cdh4), enh17(Sall3), enh18(Gsh1), enh19(Ptprz1), enh20(Pou2f1) (in the format enhancer number(name of closest neighbouring gene)).
• Bioinformatics analyses and comparisons of active versus inactive sequences showed that the active sequences had a higher frequency of Pou transcription factor binding sites than inactive sequences.
We further characterized the types of cortical progenitors in which enh16(Cdh4), enh17(Sall3) and enh20(Pou2f1) were active by double labelling with phospho-histone H3 (general mitotic marker) and Tbr2 (specific marker for transit-amplifying basal progenitors of the cortex). The 3 characterised enhancers showed distinct types of activities. This included:
• activity of enh16(Cdh4): in all embryonic cortical layers, including in proliferating radial-glia-like apical progenitors and proliferating Tbr2+ basal progenitors.
• activity of enh17(Sall3): exclusively outside the ventricular zone where radial-glia-like apical progenitors reside; never in mitotic cells; in non-proliferating Tbr2+ basal progenitors.
• activity of enh20(Pou2f1): only in germinal zones (ventricular and subventricular zones); in proliferating radial-glia-like apical progenitors; in non-proliferating Tbr2+ basal progenitors.
In sum, we have started to uncover a cis-regulatory logic in embryonic cortical progenitors of the mouse.
The analysis of the TFBS composition of active sequences points to an important contribution of POU-type TFBSs in conferring activity on the sequences in the cortical germinal zone. Moreover, the precise analysis performed subsequently, after in utero electroporation and double-labeling for proliferation- or progenitor-specific markers, points to a very high specificity of each enhancer sequence in driving target gene expression in cells with a given proliferative state or of a given progenitor type. Some of the enhancers characterized here will probably be used in the future as “drivers” for transgenesis in specific progenitor types.


WP6. In vivo functional characterization of candidate enhancers
We have optimized protocols to improve the efficiency of transgenesis in mice and fish, and developed new lines that allow transgenes of interest to be specifically targeted to predefined loci, in mouse ES cells or directly in medaka embryos, by PhiC31-mediated integration.
In particular, we developed four Medaka transgenic lines containing each a single copy of a docking site suitable for PhiC31-mediated site directed transgenesis. In each line, the docking site was associated with a fluorescent marker gene expressed in the heart. Upon PhiC31-mediated integration of the incoming transgene, the marker switches from green to red, enabling a simple, non-invasive identification of the transgenic animals, which have integrated the transgene at the predefined site. With this system, enhancer test constructs can be properly compared qualitatively and quantitatively since they will all be influenced in the same way by the surrounding genome; this should minimize variation in transgene expression patterns due to variable integration sites and thus help to reduce the number of animals that are needed to fully characterize an enhancer element.
We have developed a similar system for mouse, using site-directed exchange in mouse ES cells. The seven ES cell lines with new docking sites that we produced could offer a useful alternative to the ones currently used by the community (Hprt, Rosa26) for site-directed transgenesis.
To establish a screening system for transgenic mice that is rapid and efficient in terms of cost and animal numbers, we have optimised and streamlined a protocol using the high efficiency of lentiviral-mediated transgenesis. The use of ultrafiltration as a virus purification and concentration system allowed us to produce highly concentrated viral stocks with lower culture volumes (improving cost-efficiency and biosafety). With this approach, a ~50% transgenic yield (number of transgenic embryos/number of injected embryos) was routinely achieved, compared to the <5% obtained with classical pronuclear techniques.
In parallel to the development of these transgenic approaches, we have characterized in vivo, in mice and in medaka fish, the regulatory activities mediated by a large number of DNA elements identified as potential NSC enhancers by our partners in CISSTEM. These assays have revealed a number of important insights:
• in medaka, we identified several DNA modules that drive gene expression specifically in the retinal ciliary marginal zone, in neural stem cells and/or in committed progenitor cells that contribute new retinal neuronal cell types as the eye grows. These elements are associated with transcription factors known to play an important role in neuron development, including Rx2, Sox2 and Tlxs, and they could represent key nodes in the interconnected NSC gene regulatory network. Besides providing a basis to dissect the cis-regulatory code associated with NSC-specific expression, these elements could also be used as tools to drive genes of interest in NSCs or progenitors in order to perform in vivo functional analyses of NSC genes.
• in mice, we generated a large dataset to assess in vivo the activity of elements bound by NSC-associated TFs in neural stem cells in culture. This provided a framework to refine the prediction of the activities mediated by elements defined initially by TF occupancy and associated chromatin marks. This list of validated enhancers will help to identify specific motifs and/or cis-grammar associated with expression in neural progenitors and stem cells. Interestingly, we found that many of the elements bound in NSCs by NSC-specific factors are also active enhancers in the developing nervous system. In particular, we identified several elements around known NSC TFs or signaling molecules that are expressed in developing neurons in mid-gestation embryos. This suggests that gene expression in NSCs may involve the action of developmentally active enhancers, and that part of the gene regulatory network (GRN) may be shared.
A large focus of CISSTEM was directed towards understanding the contribution of individual cis-regulatory elements to gene regulation. However, gene expression usually results from the integrated action of multiple elements. With the help of CISSTEM, we developed an in vivo transposon-mediated screen of the regulatory potential of the mouse genome. The comparison of the autonomous activities of enhancers with the overall output associated with their region revealed the pervasive influence of repressive elements that restrict enhancer activities and refine gene expression patterns. The transposon insertions obtained in the vicinity of NSC genes could also be used in further studies.

WP7. Dissemination and technology transfer
The main aim of WP7 was to provide a basis for the dissemination of the results of CISSTEM, together with the definition of a communication strategy and the accomplishment of the scientific milestones and general scope of the project.
A series of internal meetings allowed the Consortium partners to discuss the results and express their ideas about the communication plan. The mid-term CISSTEM meeting gave us the opportunity to reaffirm the consortium's scope and strategies and to present each partner's current dissemination activities.
As forecast in the Description of Work, CISSTEM was cited by the Consortium partners during conferences and other communications to the international scientific community.
Furthermore, thanks to the involvement of the WP leader in several extra-academic activities, the word was also spread among ethical committees and patient associations, particularly those associations and non-profit entities dealing with fund raising and healthcare for neurodegenerative diseases.
The creation of one thousand leaflets (as per WP 7.3) helped with this activity. Designed to reach the widest possible public, including the above-mentioned associations and biotech companies, they were the result of combined work by the UNIMIB and INRA partners: UNIMIB wrote, edited, printed and distributed the material, while INRA designed the leaflet and contributed to its contents.
They were distributed among CISSTEM partners and used during congresses and international meetings.
The completion of the task was delayed to M13 in order to present and discuss the contents during the Annual CISSTEM Meeting. We considered the production of further reprints, which were distributed to new communication targets identified after the collection of the latest project results and data during the mid-term CISSTEM meeting in April 2010. For this reason, the identification of specific biotech companies and the dissemination to them were shifted to M18-36.
Annual newsletters were created thanks to the contribution of each partner, circulated within the Consortium and published in October 2009, December 2012 and June 2011 on the CISSTEM website. UNIMIB and INRA took care of collecting and editing the partners' contributions.

Potential Impact:
The primary impact of workpackage 3 is the basic knowledge and scientific resources developed as part of obtaining the results. Basic knowledge includes the location of hundreds of thousands of putative CRMs on the basis of functional and comparative genomic data, as well as further insight into best practices for collecting and interpreting these data types. Resources include ~70 curated datasets of ChIP-seq data from the literature, DNase-seq data from three conditions, raw and processed data from eight neural stem cell transcription factors and 17 microarray studies, open source code and executables for the Contemplate software system, a text2genome-based mapping of the literature to the mouse genome to enable rapid interpretation of functional and comparative genomics results, and the release of public data through the CISSTEM project browser. These data will enable further research activity and potentially stimulate new discoveries that could lead to progress in the basic or applied understanding of gene regulation or neurogenesis. Dissemination of the results of this workpackage has been mainly through conference presentations and the public CISSTEM browser. One paper arising from this workpackage has been published (Haeussler, M., M. Gerner & C.M. Bergman (2011) Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics 27:980-6.), describing the text2genome system, which attracted sufficient interest to be written about in a Nature News article (Van Noorden (2012) Nature 483: 134–135), leading to a wider audience for this work and the CISSTEM project. The text2genome software is being developed by Elsevier N.V. into an "App" (http://www.applications.sciverse.com/action/appDetail/298535) and will thus contribute to improved access to the literature for this commercial publisher.
We anticipate at least two other papers arising from this workpackage focusing on the Contemplate regulatory region prediction system and the integrated functional genomic analysis of regulatory regions in neural stem cells.
The 22 mammalian candidate enhancers that have been tested in mouse embryonic cortex slices have been incorporated and annotated (active/inactive) in the CISSTEM browser.
The new transgenic approaches developed and optimized as part of WP6 will be shared with the scientific community. They represent major improvements over the available technologies in terms of the quality of the data produced. Importantly, their efficiency is accompanied by a reduction in the required number of experimental animals, and they therefore represent a substantial contribution in the direction of the 3R principles. The protocols describing them will be published by FZK and EMBL in peer-reviewed scientific journals. Plasmid constructs will be deposited with Addgene (www.addgene.org) to allow easy access to them. Medaka lines and mouse ES cells will be available on request from the respective groups, under standard material transfer agreements (MTAs). Thus, we anticipate that these new approaches will be broadly used.
The principle of the transposon-mediated identification of gene regulatory landscapes was published in 2011 in Nature Genetics, including a preliminary analysis of their features and properties. A dedicated website is being developed (a beta version is already in place: http://www.ebi.ac.uk/panda-srv/tracer/index.php). The site gives access to expression information on a large set of insertions in the mouse genome (April 2012: ~ 1500 insertions; ~ 500 analysed for embryonic expression). It could be used to identify regions associated with specific expression domains, and to compare these activities with others. A large part of the dataset consists of insertions in intergenic regions and therefore complements current gene-based mouse efforts (e.g. EUCOMM). Currently, about 200-250 insertions are maintained alive or cryopreserved, so as to provide access to the mice for researchers interested in using and analysing them further. So far, about 20 lines have been requested and sent to various groups in Europe and Japan.
Since most of the scientific data were generated in the second half of CISSTEM, the scientific publications describing them are still to come. The datasets covering the different activities of the enhancers identified by CISSTEM-WP6 will be released progressively, in coordination with and with the agreement of the partners that contributed to their generation. In particular, they will be displayed on the CISSTEM genome browser (http://genome.smith.man.ac.uk/) so as to provide an integrated, centralized repository.
The outcome of WP6 could lead to progress in several scientific directions. Gaining insight into the structure of the NSC-GRN and into its cis-regulatory logic was the main effort of CISSTEM and the motivation for performing such a large targeted screen of enhancer elements. In addition, individual elements could be very informative for identifying new NSC genes or markers, and could be used to develop tools and transgenic lines with specific expression in NSCs, so as to facilitate their study in the future. These findings have been discussed in various meetings and some are currently under development in the participating groups (FZK, INRA).

An impact on the public is expected from different actions. Annual newsletters were published on the website to communicate the project goals progressively reached by the different work packages, at three different levels:
1- inside the CISSTEM network: to communicate the partners' overall activity.
2- outside the CISSTEM consortium: to inform professionals and the scientific community about the project aims and progress.
3- press note: to provide media professionals with basic information about the research lines.
In addition, a CISSTEM leaflet was designed with short sentences and colored representative figures in order to communicate the importance of the project to the scientific and non-scientific communities. CISSTEM leaflets were distributed at different international meetings, such as Neuroscience 2010 in San Diego, Neuroscience 2011 in Washington DC and the Asilomar Cancer Conference in Tübingen 2010, to mention a few.

List of Websites:
The address of the site is the following: www.cisstem.eu

Project co-ordinator name, title and organisation:
Name: Dr. Jean-Stéphane Joly
Organisation: Institut National de la Recherche Agronomique (INRA)
Tel: (+33) 1 69 82 34 31
Fax: (+33) 1 69 82 34 47
E-mail: joly@inaf.cnrs-gif.fr