Periodic Reporting for period 1 - NovoFold (De novo protein discovery as a tool for understanding the folding conundrum)
Reporting period: 2019-01-03 to 2021-01-02
The NovoFold project aims to explore protein sequence space beyond the natural proteome. The natural proteome is a product of natural evolution and is therefore limited in diversity. As such, it does not encompass all possible folded amino acid sequences and protein folds. In fact, protein designers have successfully created new folds that are unprecedented in nature. However, designers and the algorithms they develop are biased towards recreating variations of existing structures. In contrast, the NovoFold technology enables broad random searches in sequence space, potentially discovering new folds that lie beyond designers’ imagination.
Rather than searching for particular bioactivities (e.g. binding or catalysis), the NovoFold assay experimentally identifies sequences whose primary property is folding. With a throughput of billions to trillions of sequences, the NovoFold platform has the potential to provide further data sets for data-driven approaches to solving the protein folding problem. The NovoFold technology interfaces mRNA display with a protein folding sensor based on the ribosome in conjunction with the arrest peptide SecM. The SecM arrest peptide sequence has previously been used to study the folding of various proteins, a field pioneered by von Heijne and coworkers. To date, arrest peptides have made it possible to identify folding intermediates and to analyse co-translational protein folding. In these assays, the ribosome arrests at the last position of the GIRAGP arrest sequence motif through perturbation of the peptidyl transfer centre. However, if a force is applied to the nascent peptide chain, the ribosome can resume protein synthesis. This force can be applied mechanically or by a protein sequence upstream of the arrest motif that folds within the exit tunnel of the ribosome and exerts a pulling force.
Typically, such arrest-peptide-based folding assays are performed one at a time in in vitro translation extracts. Combining the arrest peptide technology with mRNA display and sequencing would allow many proteins (and their mutants) to be investigated in parallel. In mRNA display, an RNA template that is covalently linked to puromycin at its 3’ end is ribosomally translated in vitro. Once a stop codon close to the end of the template is reached, puromycin is inserted, leading to a covalent linkage between peptide and RNA. The NovoFold technology takes advantage of this by linking only folded proteins to their cognate mRNA. This is achieved simply by placing a stop codon downstream of the arrest peptide, so that only ribosomes released from arrest by a folding sequence reach it.
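The selection logic described above can be illustrated with a minimal sketch. It is an idealised, deterministic toy model and not the project's analysis code: the `Construct` class, the `upstream_folds` flag and the placeholder sequences are hypothetical, and real arrest release is probabilistic rather than all-or-nothing. The treatment of the two controls (a construct without the arrest motif always reads through, a construct with a stop codon inside the arrest peptide never does) is likewise an assumption made for illustration.

```python
from dataclasses import dataclass

ARREST_MOTIF = "GIRAGP"  # arrest sequence motif named in the text


@dataclass
class Construct:
    name: str
    protein: str                  # placeholder sequence, not a real library member
    upstream_folds: bool          # hypothetical flag: does the upstream part fold?
    stop_in_arrest: bool = False  # abortive control: stop codon inside the arrest peptide


def reaches_downstream_stop(c: Construct) -> bool:
    """Idealised rule: puromycin links peptide and mRNA only if the
    ribosome reaches the stop codon placed downstream of the arrest peptide."""
    if c.stop_in_arrest:
        return False           # assumed: translation ends before the downstream stop
    if ARREST_MOTIF not in c.protein:
        return True            # assumed: nothing stalls the ribosome
    return c.upstream_folds    # folding exerts the pulling force that releases arrest


def select(library: list[Construct]) -> list[str]:
    """Names of constructs that would be displayed and recovered by panning."""
    return [c.name for c in library if reaches_downstream_stop(c)]


library = [
    Construct("folder", "MAAAA" + ARREST_MOTIF + "SSSS", upstream_folds=True),
    Construct("non_folder", "MGGGG" + ARREST_MOTIF + "SSSS", upstream_folds=False),
    Construct("no_arrest", "MGGGGSSSS", upstream_folds=False),
    Construct("abortive", "MAAAA" + ARREST_MOTIF, upstream_folds=True, stop_in_arrest=True),
]
print(select(library))
```

In this toy model only the folding construct and the non-arresting control are recovered, which mirrors the intended behaviour of the assay rather than any measured result.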
By expressing an N-terminal affinity tag, the displayed proteins can be panned, so that only the cDNA of folded proteins is recovered. Using mRNA display, it is possible to investigate up to a trillion different sequences. The assay is very direct and obviates the use of proteases, which have been used in similar assay formats but typically have a biased substrate scope. Apart from identifying de novo proteins and providing large data sets for studying protein folding, this assay could also be applied to the exploration of proteomes beyond the 20 canonical amino acids, towards the engineering of xenobiological systems. The objectives of the MSCA action were to implement the technology and show its applicability to random/naïve protein libraries.
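To put a library of up to a trillion sequences into perspective, the short calculation below compares it with the size of sequence space. It is a sketch under stated assumptions: the chain length of 100 residues is a hypothetical value chosen for illustration (the report does not state a library length), and the function names are invented here. Working in log10 avoids handling very large integers directly.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet_size)


def log10_fraction_sampled(library_size: float, length: int,
                           alphabet_size: int = 20) -> float:
    """log10 of the fraction of sequence space covered by a library,
    assuming every library member is unique."""
    return math.log10(library_size) - log10_sequence_space(length, alphabet_size)


length = 100          # hypothetical chain length, not taken from the report
library_size = 1e12   # "up to a trillion" sequences, as stated in the text

print(f"sequence space: 10^{log10_sequence_space(length):.1f}")
print(f"fraction sampled: 10^{log10_fraction_sampled(library_size, length):.1f}")
```

Raising `alphabet_size` above 20 gives a rough sense of how quickly the space grows when amino acids beyond the canonical set are considered.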
Subsequently, in vitro translation was coupled with mRNA display by using puromycin-linked mRNA as the template. Quantification of cDNA by real-time PCR revealed that only low display efficiencies could be achieved. However, the recovery of the folded protein was ca. 6-fold higher than that of its non-folding counterpart. In fact, comparison with the respective non-arresting and abortive (stop codon in arrest peptide) controls showed identical recoveries. This suggested that the screening assay is functional and could be applied in selection experiments.
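A fold difference in recovery of this kind is commonly derived from real-time PCR threshold cycles (Ct). The sketch below shows the standard relation, fold = E^(Ct_reference − Ct_sample), under the assumptions that both samples started from equal input and amplify with the same efficiency E (E = 2 for perfect doubling per cycle). The Ct values are hypothetical and chosen only so that the result lands near the ca. 6-fold figure quoted above; they are not data from the project.

```python
def fold_recovery(ct_sample: float, ct_reference: float,
                  efficiency: float = 2.0) -> float:
    """Relative cDNA recovery of a sample versus a reference.

    A lower Ct means more template, so a sample that crosses the
    threshold earlier than the reference gives a fold value above 1.
    """
    return efficiency ** (ct_reference - ct_sample)


# Hypothetical Ct values for illustration only (not measured data).
ct_folded = 22.000
ct_non_folding = 24.585

fold = fold_recovery(ct_folded, ct_non_folding)
print(f"recovery of folded vs non-folding construct: {fold:.1f}-fold")
```

A difference of roughly 2.6 cycles therefore corresponds to an approximately 6-fold difference in recovered cDNA when amplification is close to perfectly efficient.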