Periodic Reporting for period 3 - CRISPRcombo (Interrogating native CRISPR arrays to achieve scalable combinatorial screens and dissect genetic redundancy)
Reporting period: 2023-06-01 to 2024-11-30
The goal of this ERC Consolidator project is to elucidate the properties of CRISPR arrays and apply them toward untangling redundancy in biology. The proposed work was divided into three objectives: elucidating design rules for CRISPR arrays, understanding why CRISPR arrays do not undergo rearrangements despite being highly repetitive, and applying these insights to interrogate one example of redundancy in gene regulation, namely by bacterial small RNAs (sRNAs). This project is expected to deepen our understanding of an important yet understudied aspect of CRISPR biology and to lay a foundation for interrogating redundancy in biology. Given how widespread biological redundancy is, these efforts could in turn have wide-ranging impacts, from cancer treatment to antibiotic development and new means of treating genetic disease.
Among the natural features of CRISPR arrays, we discovered that the region upstream of the array contributes to the production of the encoded guide RNAs. This region, called the leader, had been associated with other aspects of CRISPR biology but never with guide RNA production. We showed for some CRISPR-Cas systems that the leader interacts with the front end of the CRISPR array, promoting subsequent processing steps. As a result, the guide RNA targeting the invader most recently encountered by the cell is prioritized for defense, ensuring that the system is primed against invaders that might reappear or could still be lurking in the environment.
Exploring aspects of CRISPR biology beyond the arrays themselves has also proven fruitful. For instance, we discovered a set of novel CRISPR nucleases, dubbed Cas12a2, that recognize RNA targets and, upon finding their target, begin degrading virtually any nucleic acid they encounter. This activity extends to double-stranded DNA, the information storage material of cells and many invaders alike. We also discovered two clades of nucleases most closely related to Cas12a2, one of which (Cas12a3) exhibits RNA-triggered cleavage of tRNA tails. These nucleases represent the first examples in the CRISPR family in which the target-dependent enzymatic activity of the nuclease is directed away from the target to enact the immune response.
In a separate example from CRISPR biology, we exploited the discovery that the tracrRNA, a processing factor required to convert Cas9 CRISPR arrays into guide RNAs, can also convert cellular RNAs into guide RNAs for use by Cas9. By engineering this process, we achieved a technological first: recording selected cellular transcripts in single cells. This technology allows us to peer into a cell's past while tying it to its present state. We also applied the concept to tracrRNA-dependent Cas12 nucleases that, upon recognizing target DNA, collaterally cleave single-stranded DNA. This approach allowed us to harness these DNA-targeting nucleases for direct RNA detection, relying on collateral cleavage for signal amplification.
To lay the foundation for CRISPR array design, we developed a tool that predicts targeting activity from the guide RNA sequence. While many such tools exist, few have focused on using CRISPR to silence genes in bacteria. We applied machine learning to published datasets to devise an algorithm that distinguishes "good" guide sequences from "bad" ones. We also explored how to make the CRISPR arrays used by Cas9 more compact and found that the arrays can be shortened; in some cases, shortening the array even improved performance. We also used the arrays in other contexts, such as the first sRNA screens in bacteria, using the gut microbe Bacteroides thetaiotaomicron as a model.
Finally, we advanced a simple system for characterizing CRISPR biology and technologies: cell-free transcription-translation (TXTL). Using TXTL, we established new approaches for characterizing CRISPR-Cas systems involving multiple components. We also found that re-optimizing TXTL preparation allowed us to use linear DNA. This step makes it easier to go from a designed DNA sequence to experimental testing, accelerating our ability to perform experiments.
The work on the leader upstream of CRISPR arrays opened new opportunities to explore how this region impacts other CRISPR-Cas systems. We expect this phenomenon to extend across broader swaths of CRISPR biology while revealing new variations on the theme.
The work on Cas12a2 nucleases will next explore the natural diversity of these nucleases. Their diversity appears to be far greater than we initially reported, and we expect to reveal new biochemical properties that could expand their application space. These nucleases also co-occur with Cas12a nucleases alongside individual CRISPR arrays, and we expect to uncover how the two nucleases use a shared array to combat targeted invaders. We will also begin exploring in vitro and cellular applications of these nucleases, such as molecular diagnostics and programmable cell killing; the latter became the basis of a funded ERC Proof of Concept grant.
We will also investigate the stability of CRISPR arrays. We expect to identify cellular factors responsible for this stability, which could then be expressed in other organisms to boost the overall performance of CRISPR arrays.
Finally, we will continue applying TXTL to interrogate CRISPR arrays and other aspects of CRISPR biology. Its use will aid many of the experimental efforts described above. Our expectation is that TXTL will become more commonly used by the research community to accelerate the pace of scientific discovery.