A great surprise in the wake of the Human Genome Project has been the discovery of vast numbers of RNAs that do not encode proteins. Alongside 19,000 protein-coding genes, our genome contains at least 20,000 long noncoding RNA (lncRNA) genes, and recent estimates push that number to 100,000. Extensive annotation efforts are presently discovering lncRNAs far faster than their functions can be elucidated, and thus fewer than 1% of lncRNAs have been experimentally characterised. These discoveries have created a grand challenge: understanding the biological significance of lncRNAs. To meet it, we must answer the pressing question of how lncRNA functions are encoded in their primary sequence. Because lncRNAs function as mature RNA molecules, subcellular localisation can serve as a proxy for their function. One hypothesis is that, similar to proteins, lncRNAs are modular molecules composed of separable functional domains. Previous studies, including my own, have used conventional methods to identify domains in a handful of lncRNAs by laborious deletion experiments. I propose to advance this field with a novel high-throughput technique, CRISPR-Locate, capable of detecting lncRNA domains and their functions in their natural endogenous context. I will delete ~1000 different lncRNA domains in parallel and simultaneously insert RNA tags in their place. Subsequently, I will fractionate cells into their subcellular compartments and purify the tagged lncRNAs. I will then sequence all the tagged lncRNAs and, by comparing subcellular localisation between mutated and wild-type cells, identify which domains are responsible for subcellular localisation. Finally, I will use CRISPR-Locate to create a map of lncRNA domains, an invaluable resource linking sequence to function, bringing us a step closer to unlocking the potential of 10^4 novel genes in medicine and biology.