
Discovery and characterization of functional disordered regions and the genes involved in their regulation through next generation sequencing

Periodic Reporting for period 3 - IdrSeq (Discovery and characterization of functional disordered regions and the genes involved in their regulation through next generation sequencing)

Reporting period: 2019-05-01 to 2020-10-31

If DNA is the blueprint of life, proteins are the building blocks. Understanding the molecular basis of life therefore requires a deep knowledge of the proteins that make up all organisms. A theory of proteins allows one to interpret a genome, and thus holds great promise for human health. The biochemical studies pioneered by Anfinsen in the 1960s and the development of powerful methods (e.g. crystallography) established the sequence-structure-function paradigm. With the availability of completely sequenced genomes, it has become clear that a large fraction of any eukaryotic genome (>40%) encodes protein segments that do not autonomously fold into a defined tertiary structure (although they may contain secondary structural elements) and thus do not directly follow Anfinsen’s postulate. These regions are commonly referred to as intrinsically disordered regions (IDRs). IDRs are enriched in critical functions such as transcription and signaling, and have been linked with numerous diseases including neurodegeneration and cancer. Despite their importance, and in contrast to structured regions, the molecular principles behind the sequence-function relationship of IDRs remain poorly understood. It is therefore critical to understand what makes certain disordered regions functional and why mutations in certain IDRs lead to disease.

The overall objective of this proposal is to identify and characterize functional IDRs in cells, and to discover genes involved in their regulation, using yeast as a cellular model. We proposed to develop and apply a targeted, high-throughput, multiplexed approach that we call IdrSeq (for Intrinsically disordered region Sequencing). This has now been achieved and published in an open access journal (Ravarani et al., MSB 2018). Specifically, using IdrSeq, we aim to discover and characterize IDRs that can
(Aim 1) function in transcriptional activation, and discover genes that modulate transcriptional activity
(Aim 2) influence protein stability, and discover genes involved in regulating half-life and
(Aim 3) form higher-order assemblies and discover genes that regulate assembly formation

The unique feature of this proposal is its integrative vision of synthetic & systems biology, (un)structural biology, cell biology, genetics, experiments and computation to establish a discovery platform to study IDRs in a cellular context. Since IdrSeq is modular and scalable, it can be readily extended to investigate a broad range of IDR functions, and adapted to other organisms. Elucidating the principles of sequence-function-gene relationship of IDRs holds enormous potential for synthetic biology. The discovery of genes that regulate IDR function has direct implications for human health by revealing novel therapeutic targets.

Aim 1: Identify and characterize IDRs that function as transcriptional activators, and discover genes that modulate IDR-mediated transcriptional activity.

Since the beginning of the project, we have developed and presented IDR-Screen, a framework to discover functional IDRs in a high-throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality-conferring patterns in their protein sequences were inferred through statistical learning. Using a yeast HSF1 transcription factor-based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we found that segments devoid of positively charged residues but containing redundant short sequence patterns of negatively charged and aromatic residues are a generic feature of TAD functionality. We were able to use this rule to design new sequences with increased transactivation strength. We also used this approach to assess the impact of polymorphisms in human transcription activation domains that are seen in the natural population as well as in cancer genomes. We anticipate that investigating defined sequence libraries using IDR-Screen for specific functions can facilitate the discovery of novel functional regions of the disordered proteome, as well as an understanding of the impact of natural and disease variants in disordered segments.
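To make the sequence rule above concrete, the following minimal Python sketch scores a peptide for the two properties described: absence of positively charged residues, and repeated occurrences of aromatic residues close to negatively charged ones. This is an illustration only and is not the statistical model used in the published work. The function names, the residue groupings (histidine is not counted as positively charged here), the window size and the threshold are all assumptions chosen for the example.

```python
# Illustrative heuristic only; NOT the model from Ravarani et al.
# Residue groupings, window size and threshold are assumptions.
POSITIVE = set("KR")   # positively charged residues (histidine excluded by assumption)
NEGATIVE = set("DE")   # negatively charged residues
AROMATIC = set("FWY")  # aromatic residues


def tad_like_features(seq, window=3):
    """Return simple composition features for a peptide sequence."""
    seq = seq.upper()
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    n_aro = sum(aa in AROMATIC for aa in seq)
    # Count aromatic residues that have at least one acidic residue
    # within `window` positions on either side.
    n_motifs = 0
    for i, aa in enumerate(seq):
        if aa in AROMATIC:
            neighbourhood = seq[max(0, i - window): i + window + 1]
            if any(b in NEGATIVE for b in neighbourhood):
                n_motifs += 1
    return {
        "positive": n_pos,
        "negative": n_neg,
        "aromatic": n_aro,
        "acidic_aromatic_motifs": n_motifs,
    }


def looks_tad_like(seq, min_motifs=2):
    """Hypothetical rule: no positive charge and repeated acidic/aromatic motifs."""
    f = tad_like_features(seq)
    return f["positive"] == 0 and f["acidic_aromatic_motifs"] >= min_motifs
```

For example, `looks_tad_like("DDFDEW")` returns `True` under these assumptions, whereas a sequence containing lysine or arginine, such as `"KKFDRW"`, returns `False`. A real analysis would learn such features from the screening data instead of fixing them by hand.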

The work and the dataset have been published as an Article in the open access journal Molecular Systems Biology (Ravarani et al., MSB 2018). The work was featured on the cover of the issue, accompanied by a News and Views article highlighting its importance. The paper was also rated Exceptional by F1000.
F1000: https://f1000.com/prime/733243496
News and Views: http://msb.embopress.org/content/14/5/e8377
Cover: http://msb.embopress.org/content/14/5.cover-expansion

Aim 2: Identify and characterize IDRs that influence protein stability, and discover genes that regulate IDR-mediated protein stability.

In the last couple of years, our team has assayed a viral proteome using the IDR-Screen approach and has discovered regions that act as strong degrons. More importantly, we are now performing follow-up screens to discover the ubiquitin ligases that regulate the degron activity. The work describing the project will be written up for publication next year.

Aim 3: Identify IDRs that can form higher-order assemblies/aggregates and discover genes/conditions involved in regulating assembly formation.

In the last year, our team has developed assays for screening peptides that form aggregates and has assayed a viral proteome using the IDR-Screen approach to discover regions that can form higher-order assemblies. We are exploring different sequences to be used as libraries for discovering such regions. However, the grant had to be terminated as the PI moved to the USA.

While we have a deep understanding of how structured domains carry out their function, the sequence-function relationship of IDRs remains poorly understood. IDRs are emerging as important for diverse cellular functions and are involved in a number of human diseases. Yet, until now, there was no reliable high-throughput approach for investigating IDRs in a cellular context.

Thanks to the generous funding through this ERC Consolidator Grant, we have now developed a targeted, high-throughput, multiplexed technology called IDR-Screen. The unique aspect of this approach is its integration of synthetic & systems biology, (un)structural biology, cell biology, genetics, experiments and computation to establish a discovery platform to identify and characterise functional disordered regions directly in a cellular context. Given the emerging importance of IDRs and a newfound understanding of their biomedical relevance, the approach we have developed in this project can be, and is being, readily extended to investigate a broad range of IDR functions in a cellular context. In the first reporting period, we already generated high-resolution data on the sequence-function relationship of IDRs that can function as transactivation domains, which is fuelling the development of methods for investigating protein function and for interpreting the genome sequence of disordered regions.

The grant had to be terminated on 30 June 2020 because the PI moved to St Jude Children's Research Hospital in the USA as an Endowed Chair in Biological Data Science and as Director of the Center for Data Driven Discovery.