CORDIS - EU research results

Defining the molecular basis of type 2 diabetes predisposition through targeted sequencing of the CREBBP-interacting gene network

Final Report Summary - T2DCREBBP (Defining the molecular basis of type 2 diabetes predisposition through targeted sequencing of the CREBBP-interacting gene network)

Type 2 diabetes (T2D) is the predominant form of diabetes and is recognized as a global health problem. Genome-wide association studies (GWAS) have generated a catalogue of over 100 loci associated with T2D that, cumulatively, explain around 10% of the variation in T2D predisposition. Most of the susceptibility variants discovered so far lie in intronic or intergenic DNA segments and are assumed to influence transcript regulation rather than gene function. Therefore, to maximize the mechanistic insights into diabetes pathogenesis from these genetic discoveries, it will be important to determine, as the number of susceptibility loci increases, whether the mechanisms implicated by human genetics coalesce around a limited set of core pathways and networks. The main aim of this project was to determine whether, in line with some earlier observations from my host group, CREBBP-related pathways could constitute one of these central mechanisms. As a complement to this main objective, I have developed new methods to refine strategies for the identification and prioritization of novel T2D susceptibility loci based on network analysis.
The project was divided into 4 specific objectives. The first objective involved the establishment of the CREBBP interactome (that is, the set of proteins that interact with CREBBP). I used only experimentally validated data from transcription factor databases, and I extended their connections using only the first neighbors of the direct targets of these proteins. I identified the CREBBP interactome as a subgraph of 369 nodes (proteins) and 4998 edges (interactions). This subnetwork is composed only of high-confidence protein-protein interactions (PPI) defined by a combination of different databases: InWeb, STRING, PINA, HIPPIE, and iRefR. Only interactions based on experimental evidence were considered. Furthermore, we assigned a probabilistic score to each interaction, which makes it easier to filter the CREBBP interactome. One limitation of our curated CREBBP interactome is that it is based only on interactions described in developed tissues; thus, our PPI network will not capture transcription factors that are activated only during developmental stages.
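The construction described above (merging several interaction databases, scoring each interaction, filtering by confidence and keeping the seed protein with its first neighbors) can be illustrated with a minimal Python sketch. It is not the pipeline used in the project: the source names, the toy interactions, the threshold and the noisy-OR rule for combining per-database scores are all assumptions made for illustration, since the report does not state how the probabilistic score was computed.

```python
from collections import defaultdict


def combine_scores(scores):
    """Noisy-OR combination (an assumption, not the project's scoring rule):
    probability that at least one source reporting the interaction is right."""
    p_none = 1.0
    for s in scores:
        p_none *= 1.0 - s
    return 1.0 - p_none


def merge_sources(sources):
    """Merge edge lists from several databases into one scored edge map.

    sources: dict mapping a source name to a list of (protein_a, protein_b, score).
    Edges are undirected, so each edge is keyed by a frozenset of its two proteins.
    """
    per_edge = defaultdict(list)
    for edges in sources.values():
        for a, b, score in edges:
            per_edge[frozenset((a, b))].append(score)
    return {edge: combine_scores(s) for edge, s in per_edge.items()}


def first_neighbour_subgraph(edge_scores, seed, min_score):
    """Keep confident edges, then restrict to the seed and its first neighbours."""
    # Drop low-confidence edges and self-interactions (frozenset of size 1).
    kept = {e: s for e, s in edge_scores.items() if s >= min_score and len(e) == 2}
    nodes = {seed}
    for edge in kept:
        if seed in edge:
            nodes |= edge
    # Retain every confident edge whose two endpoints are both in the node set.
    return {e: s for e, s in kept.items() if e <= nodes}


# Hypothetical toy input: two made-up sources with made-up confidence scores.
toy_sources = {
    "source_one": [
        ("CREBBP", "EP300", 0.9),
        ("CREBBP", "TP53", 0.6),
        ("EP300", "TP53", 0.5),
        ("TP53", "MDM2", 0.8),
    ],
    "source_two": [
        ("EP300", "CREBBP", 0.7),
        ("CREBBP", "MYB", 0.2),
    ],
}

merged = merge_sources(toy_sources)
interactome = first_neighbour_subgraph(merged, seed="CREBBP", min_score=0.5)
```

In this toy example the TP53-MDM2 interaction is confident but is dropped because MDM2 is not a first neighbor of the seed, and the CREBBP-MYB interaction is dropped because its score falls below the threshold.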

In the second objective, I looked for further evidence implicating the CREBBP interactome in T2D pathogenesis by defining the overlap between the association signals generated in large-scale human genetics data and variation in the genes encoding members of the network. I extended this objective with the addition of human genetic data sets not available at the time of the original proposal. These include (a) exome sequencing data from a total of 12,940 exomes (6,504 T2D cases; 6,436 controls) from the GoT2D and T2DGENES consortia; (b) exome array data on ~29,000 T2D cases and ~51,000 controls from DIAGRAM/T2DGENES; and (c) GWAS and custom array genotype data for 34,840 cases and 114,981 controls from DIAGRAM.

To implement these analyses, I developed a network analysis pipeline to evaluate the relationship between the evidence of T2D association in these diverse data sets and the protein-protein interaction networks connected to the genes and regions implicated by the association studies. These analyses resolutely failed to replicate the original evidence of enrichment between CREBBP-related pathways and T2D. However, they did reveal a number of protein complexes for which there was evidence of enrichment, particularly from the exome variant data sets. These networks feature proteins related to mitochondrial processes (e.g. the respiratory chain) and to the regulation of insulin secretion (Supplementary Figure 1). Furthermore, one of the protein complexes features a cluster of ankyrin repeat and SOCS box-containing (ASB) proteins highly enriched for the exome results, with ASB6 as the core protein in the network. This protein has been related to insulin receptor activation in adipose tissue, thus providing an interesting link between adipose tissue and T2D. These findings form part of a manuscript currently under review at Nature.
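To make the idea of testing a protein network for enrichment of association signal concrete, the following Python sketch compares the mean -log10(p) of the genes in a network with that of random gene sets of the same size. It is only one possible approach and is an assumption here, because the report does not specify the statistical test used in the pipeline; the gene names and p-values are synthetic, and a real analysis would also need to account for confounders such as gene size and linkage disequilibrium.

```python
import math
import random


def network_score(genes, pvalues):
    """Mean -log10(p) across the genes of a network."""
    return sum(-math.log10(pvalues[g]) for g in genes) / len(genes)


def enrichment_pvalue(network_genes, pvalues, n_perm=1000, seed=0):
    """Empirical p-value from size-matched random gene sets.

    Counts how often a random gene set scores at least as high as the
    observed network, with the usual +1 correction so the estimate is never 0.
    """
    rng = random.Random(seed)      # fixed seed for reproducibility
    universe = sorted(pvalues)     # all genes with an association p-value
    observed = network_score(network_genes, pvalues)
    k = len(network_genes)
    hits = sum(
        network_score(rng.sample(universe, k), pvalues) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Synthetic toy data: 200 hypothetical genes, 10 of which carry strong signal.
toy_pvalues = {f"gene{i}": 0.5 for i in range(200)}
signal_network = [f"gene{i}" for i in range(10)]
for gene in signal_network:
    toy_pvalues[gene] = 1e-4
null_network = [f"gene{i}" for i in range(100, 110)]

p_signal = enrichment_pvalue(signal_network, toy_pvalues)
p_null = enrichment_pvalue(null_network, toy_pvalues)
```

With these toy values the network carrying signal receives a small empirical p-value, whereas the network made only of background genes does not stand out from random sets.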
In the third objective of the original application, I had planned to carry out targeted resequencing of the CREBBP interactome target genes to discover novel rare variants. However, this plan was altered, in part because of the results described in objective 2, but also because of the pace of accumulation of exome sequence data (which obviated the need for targeted resequencing of gene subsets). For example, the T2DGENES consortium has, in addition to the ~12,900 exomes described earlier, already completed an analysis of a further 17,000 exomes (from the ESP, SIGMA, LUCAMP, TODAY and SEARCH studies), and a further ~25,000 exomes will be sequenced by late 2015, for a total of ~55,000 exomes. These data will represent one of the biggest exome data sets available, and joint analysis will allow further testing of the overlap between coding sequence variation and protein-protein interaction enrichment (globally, and not just for CREBBP). This will address the fourth objective of my original proposal, since it will allow me to further evaluate the evidence that aggregation of rare variant association signals across PPI networks highlights key pathogenetic processes. I am also expanding these analyses to other types of network structure, such as those defined by co-expression (e.g. using data from GTEx) and annotated gene sets.
In addition to the objectives proposed originally, I have developed a more comprehensive PPI analysis framework, involving new methods capable of “predicting” novel networks associated with T2D. The methodology is applicable to other complex diseases. This effort includes the creation of tissue-specific PPI networks, the development of weighted (based on the confidence of protein interactions and on p-values from genomic data) and unweighted (based on network properties and p-values) schemes for gene prioritization, and the creation of co-function networks in which gene co-expression data from publicly available repositories (e.g. the GTEx consortium) are added to the PPI networks. I am testing these methods in a collaborative project to identify the most plausible candidate effector transcripts at known GWAS loci for T2D. The list of candidates is generated after integrating information from diverse sources (e.g. OMIM and MGI databases, GWAS credible sets, islet expression QTLs). Application of these PPI methods will then highlight the genes of greatest functional importance and capture signals (genes/proteins) not previously implicated in T2D pathogenesis. So far, this PPI pipeline has uncovered a subnetwork of 465 proteins. The proteins not previously related to T2D were significantly enriched for T2D association in HapMap GWAS data from the DIAGRAM consortium (p=0.001). Within this network, 3 central proteins, all transcription factors (YWHAZ, TRAF6, CALM3), represent compelling candidates for further follow-up in functional studies (Supplementary Figure 2 and Supplementary Table 1). YWHAZ has previously been reported as related to the secretory pathway in pancreatic beta cells.
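The difference between a weighted and an unweighted prioritization scheme can be sketched in Python as follows. The scoring formulas are assumptions chosen for illustration, not the schemes developed in the project: each gene's score is its own -log10(p) plus the average signal of its interaction partners, where the weighted variant averages with interaction confidence as the weight and the unweighted variant uses a plain mean. The gene names, confidences and p-values are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map: gene -> {neighbour: interaction confidence}."""
    adjacency = defaultdict(dict)
    for a, b, confidence in edges:
        adjacency[a][b] = confidence
        adjacency[b][a] = confidence
    return adjacency


def signal(pvalue):
    """Association signal of a gene, expressed as -log10(p)."""
    return -math.log10(pvalue)


def weighted_score(gene, adjacency, pvalues):
    """Own signal plus the confidence-weighted mean signal of the neighbours."""
    neighbours = adjacency.get(gene, {})
    total = sum(neighbours.values())
    if total == 0:
        return signal(pvalues[gene])
    support = sum(c * signal(pvalues[n]) for n, c in neighbours.items()) / total
    return signal(pvalues[gene]) + support


def unweighted_score(gene, adjacency, pvalues):
    """Own signal plus the plain mean signal of the neighbours."""
    neighbours = adjacency.get(gene, {})
    if not neighbours:
        return signal(pvalues[gene])
    support = sum(signal(pvalues[n]) for n in neighbours) / len(neighbours)
    return signal(pvalues[gene]) + support


def rank(score_fn, adjacency, pvalues):
    """Genes ordered from highest to lowest priority under a scoring function."""
    return sorted(pvalues, key=lambda g: score_fn(g, adjacency, pvalues), reverse=True)


# Hypothetical toy network: (gene, gene, interaction confidence).
toy_edges = [
    ("GENE_A", "GENE_B", 0.9),
    ("GENE_A", "GENE_C", 0.1),
    ("GENE_B", "GENE_C", 0.5),
]
toy_pvalues = {"GENE_A": 0.01, "GENE_B": 0.001, "GENE_C": 0.1}

toy_adjacency = build_adjacency(toy_edges)
weighted_ranking = rank(weighted_score, toy_adjacency, toy_pvalues)
unweighted_ranking = rank(unweighted_score, toy_adjacency, toy_pvalues)
```

In this toy network the two schemes disagree on the top gene: GENE_A ranks first under the weighted scheme because its strongest partner is also its most confident interaction, whereas GENE_B ranks first when confidence is ignored.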
Given the burden of T2D in the EU, the methods and results generated in this study contribute to the general understanding of T2D development and provide new methods for the analysis of large genomic data sets through their integration with the human interactome. Furthermore, the results obtained in this study highlight a series of proteins as plausible participants in the development of T2D, thereby increasing genetic knowledge of T2D aetiology.