Periodic Reporting for period 2 - CRISPRcombo (Interrogating native CRISPR arrays to achieve scalable combinatorial screens and dissect genetic redundancy)
Reporting period: 2021-12-01 to 2023-05-31
The goal of this ERC Consolidator project is to elucidate and apply the properties of CRISPR arrays toward untangling redundancy in biology. The proposed work was broken into three distinct objectives: elucidating design rules for CRISPR arrays, understanding why CRISPR arrays do not undergo rearrangements despite their highly repetitive nature, and applying the resulting insights to interrogate one example of redundancy within gene regulation by bacterial small RNAs. This project is expected to deepen our understanding of an important yet understudied aspect of CRISPR biology and lay a foundation to interrogate redundancy in biology. Given the breadth of examples of biological redundancy, those efforts could in turn have wide-ranging impacts, from cancer treatment to antibiotic development and devising new means to treat genetic disease.
Within the natural features of CRISPR arrays, we discovered that the region upstream of CRISPR arrays contributes to the production of the encoded guide RNAs. This region, called the leader, had been associated with other aspects of CRISPR biology but never with guide RNA production. We showed for some CRISPR-Cas systems that this region interacts with the front end of the CRISPR array, promoting subsequent processing steps. As a result, the guide RNA targeting the invader most recently encountered by the cells is prioritized for defense, ensuring that the systems are primed against invaders that might reappear or could still be lurking in the environment.
Exploring other aspects of CRISPR biology beyond the CRISPR arrays has also proven fruitful. For instance, we discovered a set of novel CRISPR nucleases unlike any other known nucleases. These nucleases, which we have dubbed Cas12a2, look for RNA targets and, upon finding their target, begin degrading virtually any nucleic acid they encounter. This activity extends to double-stranded DNA, the information storage material of cells and many invaders alike. This process shuts down the infected cell, preventing the invader from spreading to other cells in the population.
In a separate example from CRISPR biology, we exploited the discovery that the tracrRNA, a processing factor necessary to go from Cas9 CRISPR arrays to guide RNAs, could convert cellular RNAs into guide RNAs for use by Cas9. After engineering this process, we were able to achieve a technological first: recording selected cellular transcripts in single cells. This technology allows us to peer into a cell’s past while tying it to its present state.
Laying the foundation for CRISPR array design, we have been developing a tool to predict targeting activity based on the guide RNA sequence. While many such tools exist, few have focused on using CRISPR to silence genes in bacteria. We applied machine learning with published datasets to devise an algorithm for predicting “good” guide sequences and “bad” guide sequences. We also explored how to make CRISPR arrays used by Cas9 more compact, finding that arrays can be shortened. In some cases, shortening the array even improved performance.
Finally, we have been advancing a simple system for characterizing CRISPR biology and technologies: cell-free transcription-translation (TXTL). TXTL can be created with specially prepared innards of bacterial cells, allowing us to go from DNA to RNA to protein without working with live cells or going through time-consuming protein and RNA purifications. Using TXTL, we were able to establish new approaches for characterizing CRISPR-Cas systems involving multiple components. We also found that re-optimizing TXTL preparation allowed us to begin using linear DNA. This step makes it easier to go from designed DNA sequence to experimental testing, accelerating our ability to perform experiments.
The work on the leader upstream of CRISPR arrays opened new opportunities to explore how this region impacts other CRISPR-Cas systems. We expect this phenomenon to extend to more swaths of CRISPR biology while revealing new variations on the theme. We also expect to incorporate the leader into CRISPR array design, an element that has not previously been considered.
The work on Cas12a2 nucleases will next explore the natural diversity of these nucleases. Their diversity appears to be far greater than we initially reported, and we expect to reveal new biochemical properties that could expand the application space of these nucleases. These nucleases also co-occur with Cas12a nucleases next to individual CRISPR arrays, and we expect to uncover how these nucleases utilize individual arrays to combat targeted invaders. We will also begin exploring in vitro and cellular applications of these nucleases, such as their use for molecular diagnostics.
Building on our ability to predict “good” guides and create compact CRISPR arrays, we will continue working toward the predictive design of CRISPR arrays. Our expectation is that we will create design tools that account for not only the guide sequence but also where it appears in a CRISPR array. We will also apply these arrays to disentangle redundancy in small RNA networks in bacteria, with the expectation of identifying core sets of small RNAs that contribute to different cellular processes. This example will become a starting point for others to interrogate biological redundancy and extract fundamental principles.
We will also delve into the stability of CRISPR arrays. Our expectation is that we will identify cellular factors responsible for their stability, and that such factors could be expressed in other organisms to boost the overall performance of CRISPR arrays.
Finally, we will continue applying TXTL to interrogate CRISPR arrays and other aspects of CRISPR biology. Its use will aid many of the experimental efforts described above. Our expectation is that TXTL will become more commonly used by the research community to accelerate the pace of scientific discovery.