
An integrated study on three novel regulatory hubs in megakaryocytes and platelets, discovered as risk genes for myocardial infarction by a genome-wide association and platelet systems biology study

Final Report Summary - NETSIM

Executive Summary:


The purpose of NETSIM was to provide first-rate training to young scientists by bringing the analysis of genetic markers, biomarkers and lifestyle risk factors together in one network, with a focus on Coronary Artery Disease (CAD) and the process of thrombus formation by platelets, which underlies all types of cardiovascular disease. Laboratory-based studies were accompanied by quantitative studies in population science, physical activity capture, phenotype capture in model organisms, bioinformatics, statistics, and mathematics.

The training activities of the network were in two clusters:

1. Integrated Systems Biology and Genomics (ISBG) – to define the mechanism by which common DNA sequence variation between individuals modifies the phenotype.
2. Genetic Epidemiology, Biomarkers and Biosignals (GEBB) – to better quantify CAD risk factors such as family history, genotype, biomarkers, lifestyle and physical activity.

1. ISBG Cluster

Genome-wide association (GWA) studies in tens of thousands of individuals have been very successful in identifying genes associated with complex traits, including diseases. In the ISBG NETSIM cluster a number of approaches have been taken to further identify and understand the role of genetic variation in the cause of heart attacks. These studies have focused on platelets, the smallest cells in the blood and the cells that make the blood clot. A clot in the arteries that supply the heart with blood is a key event leading to a heart attack. The number of platelets in the blood, their volume and their function are risk factors for heart attacks. The UCAM Haematology group completed a GWA study in nearly 70,000 individuals and discovered 68 DNA hallmarks that highlight regions of the genome important to platelet formation. The group then used zebrafish as a model system to identify how these new genes regulate the formation of thrombocytes. One of the novel genes, ARHGEF3, was studied in greater depth. It not only controls the formation of platelets but also regulates the formation of red cells. Detailed studies in human cells showed that the protein encoded by the ARHGEF3 gene acts as a novel regulator of iron uptake by the blood stem cells that produce red blood cells. The exciting results obtained by the NETSIM PhD fellow have laid the foundation for a far more ambitious project to silence all the genes identified in the GWA studies and to perform an in-depth analysis of the function of these novel regulators of platelet formation. Zebrafish are a powerful and affordable model organism for obtaining a comprehensive understanding of genes with hitherto unknown function.

Genes can be switched off in zebrafish by relatively simple means, making a high-throughput gene scan a reality. However, capturing video images of clot formation in zebrafish after deliberate damage of the blood vessel wall is labour intensive and repetitive. TUM has introduced novel methods for the analysis of video sequences showing low-contrast, highly dynamic conditions in the blood vessels of zebrafish, and in particular for the automatic detection of clot formation in low-contrast in vivo microscopic image sequences. The computer algorithm developed by TUM can now be used to reliably capture clot formation in zebrafish, allowing information to be obtained for a larger number of genes. Key to the method is to segment the clot by distinguishing different motion patterns in the image time-series rather than by performing a standard image segmentation task in each frame. TUM has modelled motion patterns by energies and regularised the segmentation with two prior energies: one on the topological relationship between the clot and the vessel wall and one on the shape of the aortic region.

More than ¾ of the 68 DNA variants that regulate the number and volume of platelets in man are localised in regions of the genome that do not code for protein. At the outset of the NETSIM project it was assumed that these so-called non-coding variants influence the number and volume of platelets through regulation of gene expression. To investigate this assumption WTSI used the FAIRE (formaldehyde-assisted isolation of regulatory elements) assay coupled with massively parallel sequencing to construct high-resolution maps of the genome in human megakaryocytes (the parent cell of platelets). These maps show which parts of the genome are “open for business” and which parts are “closed for business”. Maps were constructed not only for the parent cell of platelets but also for the parent cell of red blood cells (the erythroblast) and of macrophages (the monocyte). It was found that (i) variants that regulate the number and volume of platelets are often localised in regions that are “open for business” only in megakaryocytes and “closed for business” in erythroblasts and monocytes, that is, in cell type-specific regulatory hubs of gene transcription, and (ii) there is a statistically significant enrichment of the DNA variants identified in the GWA study in DNA regions that are “open for business”. Investigation of the molecular mechanism at the chromosome 7q22.3 platelet volume/number and function locus localised the variant identified by the GWA study to a megakaryocyte-specific regulatory hub. At the position of the variant, binding of the transcription factor EVI1 was observed, but the strength of binding differed substantially between the two nucleotides that can occur at this position. Increased binding of the EVI1 protein reduces the transcription of the PIK3CG gene in megakaryocytes. This gene encodes a kinase which is essential for the formation of inositol-triphosphate (IP3). Lower levels of the kinase transcript are assumed to be linked with reduced IP3 levels, and therefore with reduced release of calcium from the intracellular stores. In mice lacking the Pik3cg gene it was observed that the kinase PIK3CG is associated with pathways with established and critical roles in platelet function. The same functional FAIRE maps of the three blood cell types were used to aid the functional characterisation of the RBM8A gene, which was found to be causative of thrombocytopenia with absent radii (TAR), a rare congenital malformation syndrome.
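The enrichment of GWA variants in “open for business” regions can be illustrated with a one-sided hypergeometric test. The following is a minimal sketch and not the analysis WTSI performed: the function name, the way the genome is divided into candidate regions, and all counts other than the 68 variants are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total_regions, open_regions, n_variants, variants_in_open):
    """One-sided hypergeometric tail probability P(X >= variants_in_open).

    Models the variants as n_variants regions drawn without replacement from
    total_regions candidate regions, of which open_regions are open chromatin.
    """
    max_hits = min(n_variants, open_regions)
    denominator = comb(total_regions, n_variants)
    tail = sum(
        comb(open_regions, k) * comb(total_regions - open_regions, n_variants - k)
        for k in range(variants_in_open, max_hits + 1)
    )
    return tail / denominator


# Hypothetical counts for illustration only (not NETSIM results), except that
# 68 is the number of GWA variants quoted in the text.
p_value = hypergeom_enrichment_p(
    total_regions=10_000, open_regions=1_000, n_variants=68, variants_in_open=20
)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

A small p-value here would indicate that more variants fall in open regions than expected by chance under this simple model. A real analysis would probably also need a matched background set of variants, which this sketch does not attempt.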

Other studies in patients with rare platelet syndromes identified the transcription factor MEIS1 as possibly implicated in the formation of platelets. To gain further insight into the role of MEIS1, its orthologue meis1 was silenced in zebrafish and a profound effect on the formation of thrombocytes was observed. This result prompted studies at UCD using an -omics approach (specifically proteomics, transcriptomics, and chromatin immunoprecipitation followed by massively parallel sequencing, ChIP-Seq) to identify proteins in the nucleus of megakaryocyte-like cells that interact with MEIS1 and to determine at which positions in the genome MEIS1 binds. Intersection of the three -omics data sets identified 30 potential candidate genes regulated by MEIS1 and found to be involved in the endomitotic cell cycle in the platelet parent cell, the megakaryocyte. Bioinformatics analysis of these genes revealed “cell cycle” as the only significantly (P<0.05) enriched biological process when compared with the human genome. Numerous studies have shown that cell cycle regulating genes actively influence the unique endomitotic process, megakaryopoiesis and proplatelet formation. Cell cycle regulators such as ARHGEF3, CDC5L, ROCK2, CUL3, MACF1, MSH3, AATF and RBX1 were discovered in our analysis and most of them showed more than two MEIS1 binding sites. In-depth literature analysis of the rest of the candidate genes showed their involvement in megakaryoblastic leukaemia (NIPBL and EPS15) and vascular development (VEZF1, SMAD2 and SMAD3). Quantitative PCR array analysis in non-synchronised cells (BirA and BIOMEIS1) highlighted several cell cycle regulatory genes. Further, DNA content analysis on synchronised cells (BirA and BIOMEIS1) demonstrated that DNA synthesis was longer in BIOMEIS1 cells compared to BirA cells. In addition, EdU (a thymidine analogue) pulse-chase studies indicated that entry into S phase was accelerated by up to 12 h in MEIS1-overexpressing cells.
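The intersection of the three -omics data sets can be sketched as a plain set intersection over gene identifiers. This is a minimal illustration and not the UCD pipeline: the function name and the placeholder gene identifiers are assumptions, and real data would first need identifiers harmonised across platforms.

```python
def intersect_candidates(proteomics, transcriptomics, chipseq):
    """Return genes present in all three data sets, sorted for stable output."""
    return sorted(set(proteomics) & set(transcriptomics) & set(chipseq))


# Placeholder identifiers, purely hypothetical (not NETSIM data).
proteomics_hits = ["GENE_A", "GENE_B", "GENE_C"]
transcriptomics_hits = ["GENE_B", "GENE_C", "GENE_D"]
chipseq_hits = ["GENE_C", "GENE_B", "GENE_E"]

candidates = intersect_candidates(proteomics_hits, transcriptomics_hits, chipseq_hits)
print(candidates)
```

Requiring support from all three assays is a conservative choice: it reduces false positives at the cost of missing genes detected by only one or two methods.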

At AMC we focussed on a number of families with extremely early-onset, severe premature atherosclerosis (PAS) in the absence of familial hyperlipidaemia. We postulated that the severe PAS was caused by rare DNA variants in the coding fraction of genes that code for proteins important for maintaining vascular health. The coding fraction of the genome, the so-called exome, was sequenced in four pedigrees with a dominant form of PAS to elucidate the genetic cause of PAS in these families. So far, two possible PAS candidate genes have been identified: KERA and MCF2L. Functional studies have been performed to better understand the role of the proteins encoded by these two genes in vessel wall health.

MRC focused on using statistical methods to understand how the activity of human genes is controlled. The amount of data collected from the genomes of healthy and diseased individuals is often substantial, so computational tools that organise and manage the data and make it interpretable to health researchers are crucial. The analysis of genomics data from NETSIM and publicly accessible databases resulted in three key findings: (i) 14 genes that can be used to differentiate between myeloid and lymphoid blood cell types, (ii) 11 proteins that work together to control the development of blood cells, and (iii) an association between variations in the DNA sequence and gene activity. Advancements in the computational tools were also made, in particular (i) the development of statistical tests to identify proteins that work together to control gene activity, and (ii) a method to predict genes affected by genetic mutations.

2. GEBB Cluster

The key aim of UCAM-IPH was to investigate the association between various forms of tobacco use and the risk of heart attack and other cardiovascular diseases, both in developed countries and in a developing country with a large population, Pakistan. Firstly, to investigate cigarette, pipe and cigar smoking, the Emerging Risk Factors Collaboration database was analysed, which by April 2011 included a total of 135 prospective cohort studies set up in developed countries and 929,335 individuals with baseline information on smoking status. Secondly, to investigate the association between chewing and dipping forms of tobacco and cardiovascular risk, the Pakistani Risk of Myocardial Infarction Study was used, which by May 2011 had recruited a total of 7,905 first-ever myocardial infarction (MI) cases and 7,458 controls who were frequency-matched to cases by 5-year age bands and by sex. All forms of tobacco use were significantly associated with the risk of cardiovascular diseases. Cigarette smoking was the deadliest form of tobacco use, producing between a doubling and a tripling of the risk of heart attacks in both developed and developing countries. By contrast, the use of other smoking forms such as cigar or pipe was associated with increases of 35% and 84% in risk compared to a never smoker. Regarding South Asian smokeless forms of tobacco, both chewing and dipping were associated with strong increases in the risk of heart attacks, of between 40% and 70% additional risk. There was no safe level of tobacco use in relation to cardiovascular risk. However, quitting smoking or stopping the use of smokeless products was associated with a significantly reduced risk of disease. The risk in former smokers was no longer significantly different from that in never smokers 20 years after stopping smoking. In conclusion, these results emphasise the need to adopt consistent policies restricting all types of tobacco use, to prevent the very high burden of cardiovascular disease in developed and developing countries.
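For a case-control design such as the Pakistani study, an “additional risk” of 40% corresponds to a ratio of about 1.4 between exposed and unexposed groups. The sketch below shows how a crude odds ratio and its 95% confidence interval (Woolf's log method) are computed from a 2x2 table. It is an illustration only: the function name and the counts are hypothetical, and the report does not state which estimator or adjustments were actually used.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    z=1.96 gives an approximate 95% interval. All four counts must be > 0.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(odds ratio) from the four cell counts.
    se_log_or = sqrt(
        1 / exposed_cases + 1 / unexposed_cases + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not study data).
or_value, ci_low, ci_high = odds_ratio_ci(
    exposed_cases=300, unexposed_cases=700, exposed_controls=200, unexposed_controls=800
)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An interval that excludes 1 indicates a statistically significant association at roughly the 5% level. A frequency-matched study would normally also adjust for the matching factors (age band and sex), for example by logistic regression, which this crude calculation omits.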

Physical activity is assumed to play a major role in the outcome and treatment of many chronic diseases such as heart attacks, stroke, osteoporosis and multiple sclerosis, to name but a few. Well-powered studies are required to estimate the effect size of regular physical activity on different medical conditions. TRIUM and SLC have focused on the refinement and validation of mobile accelerometry (actibelt technology) as an objective tool a) to define and use new outcome measures, such as real-life walking speed, for clinical trials and observational studies and b) to measure the dose of treatment with physical activity in exercise trials. “Acceleromics”, as a counterpart to “Genomics”, was introduced in the lecture programme “Clinical Applications of Computational Medicine” at the TU and LMU Munich, and “The Human Motion Institute” was established. The research of Robert Bosch Healthcare focused on the detection and classification of motion and physical activity of elderly and frail people in the context of chronic illness and telehealth. The group studied physical activity in patients with chronic obstructive pulmonary disease or chronic heart failure, developed objective measurements for Parkinson’s disease, and used physical activity as a quality marker in hospital rehabilitation units. The results give a clear indication for the use of motion monitoring in chronically ill patients and will be evaluated further in the context of the Bosch TeleHealth system. Additional results from the measurement of rehabilitation quality with a motion monitor show a high patient-specific correlation in recovery over a two-week period.