Identification and screen of RNA Editing in the Human Genome

Final Report Summary - GENEDISCREEN (Identification and screen of RNA Editing in the Human Genome)

Fidelity of genomic information is a fundamental concept in biology. In general, it is believed that genomic information is identical in all the cells of an organism and that the processed content of a gene (RNA) has exactly the same sequence as its original DNA template. Nonconforming events that alter the original genomic content can be deleterious and are called mutations. However, two endogenous processes that can modify genomic content have been identified in humans and in many other organisms:
(i) Active retrotransposons replicate through a single-stranded RNA intermediate, which is reverse transcribed and integrated into the host genome.
(ii) Editing, of either RNA or DNA, involves alteration of particular nucleotides into different ones.
Both of these endogenous processes have the potential to radically alter our view of genomic integrity and complexity by creating huge genomic diversity within an organism, and even within a single cell, where editing can generate a large number of different transcripts. However, the magnitude and scope of editing and active retrotransposition events are poorly understood, mainly because of the difficulty of studying sequences that have large numbers of similar copies in a genome. We were the first to develop successful genomic-scale approaches to detect RNA editing events by ADAR enzymes and DNA editing events by APOBECs, and discovered a tight linkage between both types of editing and retroelements. We aim to leverage this linkage to study these paradigm-shifting mechanisms and their involvement in a wide spectrum of biological domains.

For the past five years (four of them under this grant) I have been a member of the Mina and Everard Goodman Faculty of Life Sciences at Bar-Ilan University (BIU). During this period I established a research group whose lab is both experimental and computational, and we were able to complete and publish several projects. My group's main achievements were in the field of editing: we developed and applied several global approaches to find the full human editosome, and carried out the first genomic screen for DNA editing by APOBEC3 in genomic retroelements:

• Alu Editing

In this line of work we provide the first report of the full scope of editing in the human transcriptome. Specifically, the main findings reported in our recent studies are as follows: 1) We show that virtually all adenosines in the hundreds of thousands of Alu elements that can form double-stranded RNA (the preferred target of the ADARs) are subject to A-to-I editing, resulting in a total of more than 100 million genomic sites, residing in most genes. 2) The ultra-high coverage we used allowed for detection of even very low levels of editing. 3) Editing in lincRNAs and antisense transcripts was detected for the first time. With these results, A-to-I editing becomes the most comprehensively characterized post-transcriptional modification in the human transcriptome. The massive number of Alu editing events dwarfs the few known editing events within coding regions. For decades, the editing community focused on editing sites in ion channels. We now estimate that these sites account for less than 0.001% of the editing activity. Thus, it is tempting to speculate that some advantageous effects have arisen from Alu editing in the course of primate evolution.

• Hyper editing

Traditionally, RNA editing is detected by comparing RNA sequences to their source DNA and searching for high-confidence A-to-G mismatches. Recently, a number of groups have scanned large RNA-seq datasets for evidence of RNA editing, discovering about a million human editing sites and thousands of sites in other species.
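The comparison described above can be illustrated with a minimal sketch. It assumes the RNA read is already aligned to its genomic template without gaps, and it treats every A-to-G mismatch as a candidate; a real screen would additionally have to filter out sequencing errors and genomic polymorphisms. The function name `find_a_to_g_mismatches` and the example sequences are hypothetical and do not come from the project.

```python
def find_a_to_g_mismatches(dna, rna):
    """Return 0-based positions where the DNA has A and the aligned RNA read has G.

    Assumption (illustrative only): both sequences have the same length and
    are aligned without gaps.
    """
    if len(dna) != len(rna):
        raise ValueError("sequences must be aligned to the same length")
    dna, rna = dna.upper(), rna.upper()
    return [i for i, (d, r) in enumerate(zip(dna, rna)) if d == "A" and r == "G"]


# Hypothetical genomic template and RNA read; an edited adenosine (inosine)
# appears as G in the sequenced read.
dna = "TTAGCAATCA"
rna = "TTGGCAGTCA"
candidates = find_a_to_g_mismatches(dna, rna)
print(candidates)
```

Each reported position is only a candidate site; confidence would come from seeing the same mismatch in many independent reads.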
However, in these two studies we show that previous screens have in fact missed the bulk of editing sites by overlooking the most heavily edited regions of the genome. Because heavily edited RNA molecules differ widely from their DNA, reads covering those molecules have usually been discarded. We devised a new pipeline that enables the mapping of even the most heavily edited reads and applied it to a number of RNA-seq datasets in human and other species. Remarkably, our screen detected an enormous number of previously unobserved sites. For example, in an RNA-seq dataset generated by the ENCODE project, 843 sites were reported in the original publication. On the same dataset, and with extremely high specificity, we discovered more than 125,000 sites, an almost 150-fold increase. Overall, we increased the known number of human editing sites by about 500,000 (≈50% increase), detected tens of thousands of new sites in mouse and fly, and discovered massive editing for the first time in rat and opossum. We analyzed the properties of the human ultra-edited regions we detected (such as sequence context, tissue of origin, and localization in repetitive elements) and experimentally validated a number of candidates. We show that careful alignment and examination of the unmapped reads in RNA-seq studies reveals numerous new sites, usually many more than originally discovered, and in precisely those regions that are most heavily edited. Our results establish that hyper-editing events account for the majority of editing sites.
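Why heavily edited reads tend to be discarded, and one way they could be recovered, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in these studies: it assumes that ordinary alignment rejects any read above a fixed mismatch cutoff, and it recovers reads by masking A/G differences (converting every A to G in both read and reference) before comparing them. The names `MAX_MISMATCHES`, `mismatches`, `collapse_a_to_g`, and `classify_read`, and the cutoff value, are all hypothetical.

```python
MAX_MISMATCHES = 2  # assumed cutoff of an ordinary aligner (illustrative value)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def collapse_a_to_g(seq):
    """Replace every A with G so that A-to-G differences become invisible."""
    return seq.upper().replace("A", "G")


def classify_read(reference, read):
    """Classify a gap-free, equal-length read against its reference window."""
    if len(reference) != len(read):
        raise ValueError("read and reference window must have equal length")
    reference, read = reference.upper(), read.upper()
    if mismatches(reference, read) <= MAX_MISMATCHES:
        return "mapped"
    collapsed = mismatches(collapse_a_to_g(reference), collapse_a_to_g(read))
    if collapsed <= MAX_MISMATCHES:
        # The masked differences may run in either direction; keep the read
        # only if A-to-G changes dominate over G-to-A changes.
        a_to_g = sum(1 for r, q in zip(reference, read) if r == "A" and q == "G")
        g_to_a = sum(1 for r, q in zip(reference, read) if r == "G" and q == "A")
        if a_to_g > g_to_a:
            return "hyper-edited candidate"
    return "unmapped"


reference = "ACGTAACGTAAC"
print(classify_read(reference, "ACGTAACGTAAC"))  # identical read
print(classify_read(reference, "GCGTGGCGTGGC"))  # every A read as G
print(classify_read(reference, "TTTTTTTTTTTT"))  # unrelated sequence
```

In this toy setting, the second read differs from the reference at five positions and would be dropped by the plain cutoff, yet it matches perfectly once A/G differences are masked.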

• Evolutionarily conserved A-to-I RNA editing sites

In addition to identifying the numerous Alu editing sites, we found that the number of editing sites conserved across mammals is surprisingly small and that these sites have characteristics distinct from those of the non-conserved sites. These few sites are considered to have an important function and are probably responsible for the severe phenotype of ADAR knock-outs. We found that this set has a unique genomic distribution, tends to be located in brain-related proteins, and has higher editing and expression levels. In addition, we were able to define the required RNA secondary structures for many of the RNA editing sites. We also found that the editing levels of this set are highly constant across mouse strains and between human and mouse, suggesting that editing levels are regulated and pointing to the functional importance of these sites. Our results highlight the dynamics of selection and formation of ADAR targets across mammalian evolution.
We show that despite the discovery of numerous editing targets, only a few of them are conserved across mammalian evolution. Those sites exhibit unique features and probably play a pivotal role in mammalian biology. We are now screening these sites in various diseases, looking for aberrations in editing levels that may be involved in these diseases.

• DNA editing by APOBEC3

Retrotransposons have thrived throughout evolution to occupy significant portions of vertebrate genomes. It has been a paradox how these selfish elements have so successfully proliferated in vertebrates despite their mutagenic potential. One emerging explanation is that the host's ability to exapt retrotransposons for novel or modified function makes their sustainment worthwhile. However, the mechanism by which the retrotransposons are tamed has been elusive.
DNA editing by APOBECs is an antiviral mechanism that restricts retroviruses and retrotransposons by G-to-A hyper-editing. Our recent finding that hypermutated retrotransposons can be inserted into the genome led us to hypothesize that APOBECs have a role in attenuating retrotransposon mutagenicity and enable the genome to utilize these transformed sequences for its benefit. To test this hypothesis, we devised a multi-genomic screen of 80 diverse species and found a multitude of APOBEC-mediated hypermutated retrotransposons. Most importantly, we establish that this hypermutation accelerates evolution, as the edited elements are preferentially exapted by the genome. This is evidenced by their enrichment in functional regions, such as genes, exons, promoters and transcription start sites.
Our results demonstrate that DNA editing has created a large pool of "raw" genetic material that led to the accelerated evolution of both retrotransposons and their hosts by parallel testing of multiple evolutionary tracks. This suggests a new mechanism of evolution in which genomic diversity is gained by bursts of correlated events. We demonstrate for the first time that an antiviral mechanism has accelerated mammalian evolution by insertion of numerous simultaneous mutations into their genomes.
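The G-to-A hyper-editing signature mentioned above can be illustrated with a small sketch that compares a retrotransposon copy with its presumed source sequence and asks whether G-to-A changes dominate the substitutions. This is not the screen used in the project: the gap-free pairwise comparison, the thresholds `min_g_to_a` and `min_fraction`, and the function names are assumptions made for illustration, and the sketch considers one strand only.

```python
from collections import Counter


def substitution_counts(source, copy):
    """Count (source_base, copy_base) pairs at mismatching aligned positions."""
    return Counter(
        (s, c) for s, c in zip(source.upper(), copy.upper()) if s != c
    )


def looks_g_to_a_hypermutated(source, copy, min_g_to_a=4, min_fraction=0.8):
    """Flag a copy whose differences from its source are mostly G-to-A.

    Both thresholds are hypothetical values chosen for this example.
    """
    counts = substitution_counts(source, copy)
    total = sum(counts.values())
    g_to_a = counts[("G", "A")]
    return total > 0 and g_to_a >= min_g_to_a and g_to_a / total >= min_fraction


# Hypothetical source element and two inserted copies.
source = "GGATCGGTACGGAT"
edited_copy = "AAATCAGTACAGAT"    # four G-to-A changes, nothing else
diverged_copy = "GGTTCGGTACGGAA"  # two unrelated substitutions

print(looks_g_to_a_hypermutated(source, edited_copy))
print(looks_g_to_a_hypermutated(source, diverged_copy))
```

Requiring both a minimum count and a minimum fraction is one simple way to separate a correlated burst of G-to-A changes from ordinary accumulated divergence.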

• Editing in cancer
Cancer is driven by alterations of the genomic information, mainly mutations in key genes that give the cancerous cell a selective advantage for clonal multiplication. However, mutations in the DNA are not the only source for modifying the genomic content. RNA editing, a site-specific modification, alters the mRNA sequence from its genomic blueprint. This alteration results in dynamic epi-mutations, changes in the mRNA transcripts, which could ultimately lead to outcomes similar to those of genomic mutations. Unlike a genomic mutation, RNA editing affects varying fractions of the copies of the targeted transcript, leading to much higher flexibility. To date, transcriptome-wide characterization of this modification across multiple cancer tissues has not been reported. We show, using global measurements of RNA editing levels in hundreds of cancer samples, that A-to-I editing and the enzymes mediating this modification are significantly altered in most cancer types screened, resulting in a sizable global effect on the transcriptome. In most tumor types, editing levels are elevated compared to their matched normal tissues, with the strongest signal detected in breast, head and neck, thyroid and lung cancers. Overall, the number of RNA nucleotides modified by editing events in cancerous tissues exceeds the genomic DNA mutation load. While the vast majority of editing events take place within Alu repeats, we have found several non-Alu sites in coding sequences that were altered significantly in cancer. Our results suggest that editing may supplement genomic DNA alterations as a means to drive tumorigenesis. As is the case with cancer-associated somatic mutations, most RNA editing events are likely 'passengers' and only a few may serve as 'drivers' in each patient. We hypothesize that classification of both DNA and RNA modifications is essential for determining a patient's profile and treatment. Identifying those driver RNA editing sites may provide novel candidates for therapeutic and diagnostic purposes. (submitted)

We utilized a cutting-edge approach based on microfluidics-based multiplex PCR (mmPCR) for the simultaneous amplification of 48 target regions, which contain the pre-selected RNA editing sites, across a 48-sample panel. These PCR products were then index-tagged and subjected to deep sequencing for precise, single-molecule-resolution recording of editing levels. We used this approach in many of our studies.
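How an editing level might be read off such deep-sequencing data can be sketched as follows. The sketch assumes the common convention that the editing level at a genomically encoded A is the fraction of A-or-G reads that carry G; the report itself does not spell out the formula. The function name `editing_level`, the sample and site labels, and all read counts are hypothetical.

```python
def editing_level(base_counts):
    """Fraction of A/G reads carrying G at a genomically encoded A.

    Returns None when the site has no informative (A or G) reads.
    """
    a = base_counts.get("A", 0)
    g = base_counts.get("G", 0)
    if a + g == 0:
        return None
    return g / (a + g)


# Hypothetical read counts: sample -> site -> per-base read counts.
panel = {
    "sample_01": {"site_1": {"A": 750, "G": 250}, "site_2": {"A": 90, "G": 10}},
    "sample_02": {"site_1": {"A": 500, "G": 500}, "site_2": {}},
}

# Build a sample-by-site table of editing levels.
levels = {
    sample: {site: editing_level(counts) for site, counts in sites.items()}
    for sample, sites in panel.items()
}
for sample, sites in levels.items():
    print(sample, sites)
```

With 48 regions amplified across 48 samples, the same loop would fill a 48-by-48 table, leaving an empty entry wherever a site received no informative reads in a sample.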

We believe that our project has led to a deeper understanding of RNA and DNA editing mechanisms. Our findings have an impact on a broad range of topics: they pave the way not only to a modified model of vertebrate genome evolution but also to a new type of biomarker and therapeutic target in cancer.