
De novo protein discovery as a tool for understanding the folding conundrum

Periodic Reporting for period 1 - NovoFold (De novo protein discovery as a tool for understanding the folding conundrum)

Reporting period: 2019-01-03 to 2021-01-02

The protein folding problem concerns how the amino acid sequence of a protein relates to its 3D structure. This conundrum is among the great challenges in biochemistry and has drawn the attention of scientists for decades. Solving the folding problem would revolutionise not only structure prediction but also protein design, which has vast scientific, technological, and medical implications. Most recently, AI and deep learning have revolutionised the prediction of protein structures from amino acid sequences, an approach pioneered by AlphaFold from Google's DeepMind. The success of this strategy is at least in part due to the availability of large amounts of structural and sequence data, which are essential for deep learning approaches. This highlights the increasing importance of data-driven approaches in biology and of large, high-quality data sets.
The NovoFold project aims to explore protein sequence space beyond the natural proteome. The natural proteome is a result of natural evolution and is therefore limited in diversity. As such, it does not comprise all possible folded amino acid sequences and protein folds. In fact, protein designers have successfully created new folds that are unprecedented in nature. However, designers and the algorithms they develop are biased towards recreating variations of existing structures. In contrast, the NovoFold technology enables broad random searches in sequence space, potentially discovering new folds that lie beyond designers’ imagination.
Rather than searching for particular bioactivities (e.g. binding or catalysis), the NovoFold assay will experimentally identify sequences whose primary property is folding. With a throughput of billions to trillions of sequences, the NovoFold platform has the potential to provide further data sets for data-driven approaches to solving the protein folding problem. The NovoFold technology interfaces mRNA display with a protein folding sensor based on the ribosome in conjunction with the arrest peptide SecM. The SecM arrest peptide sequence has previously been used to study the folding of various proteins, a field pioneered by von Heijne and coworkers. To date, arrest peptides have made it possible to identify folding intermediates and to analyse co-translational protein folding. In these assays, the ribosome arrests at the last position of the GIRAGP arrest sequence motif through perturbation of the peptidyl transferase centre. However, if a force is applied to the nascent peptide chain, the ribosome can resume protein synthesis. This force can be applied mechanically or by a protein sequence upstream of the arrest motif that folds within the exit tunnel of the ribosome and exerts a pulling force. Typically, such arrest-peptide-based folding assays are performed one at a time in in vitro translation extracts. Combining the arrest peptide technology with mRNA display and sequencing would allow many proteins (and their mutants) to be investigated in parallel. In mRNA display, an RNA template that is covalently linked to puromycin at its 3’ end is ribosomally translated in vitro. Once a stop codon close to the end of the template is reached, puromycin is inserted, leading to a covalent linkage between peptide and RNA. The NovoFold technology takes advantage of this by linking only folded proteins to their cognate mRNA. This is achieved simply by placing a stop codon further downstream of the arrest peptide.
By expressing an N-terminal affinity tag, proteins can be panned, so that only the cDNA of folded proteins is recovered. Using mRNA display, it is possible to investigate up to a trillion different sequences. This assay is very direct and obviates the use of proteases, which typically have a biased substrate scope and have previously been used in similar assay formats. Apart from identifying de novo proteins and providing large data sets for studying protein folding, this assay could also be applied to the exploration of proteomes beyond the 20 canonical amino acids, towards the engineering of xenobiological systems. The objectives of the MSCA action were to implement the technology and show its applicability to random/naïve protein libraries.
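To put the stated throughput in perspective, the following minimal sketch compares a library of a trillion sequences with the number of possible sequences of a given length over the 20 canonical amino acids. The chain lengths and function names are illustrative assumptions, not values or tools from the project.

```python
# Illustrative arithmetic only: the chain lengths below are assumptions,
# not values reported for NovoFold libraries.

CANONICAL_AMINO_ACIDS = 20


def sequence_space_size(length: int, alphabet: int = CANONICAL_AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet ** length


def max_coverage(library_size: int, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample.

    Assumes every library member is unique, which a real random library
    only approximates.
    """
    return min(1.0, library_size / sequence_space_size(length))


if __name__ == "__main__":
    library = 10 ** 12  # "up to a trillion" sequences, as stated for mRNA display
    for length in (9, 10, 30, 60):  # hypothetical chain lengths
        space = sequence_space_size(length)
        coverage = max_coverage(library, length)
        print(f"length {length:>2}: space = {space:.2e}, max coverage = {coverage:.2e}")
```

Under these assumptions, a trillion-member library could exhaustively cover only very short peptides (about nine residues). For protein-sized chains it samples a vanishing fraction of sequence space, which is consistent with describing the approach as a broad random search.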
The ambitious NovoFold project encompassed various stages. Initially, protocols for protein expression and purification, in vitro transcription, RNA purification, in vitro translation, protein labelling and panning, RNA ligation, reverse transcription, real-time PCR and mRNA display were established in the host laboratory. To this end, fluorescence-based assays for in vitro translation, including GFP- and SpyCatcher-based assays, were developed to replace radioactivity-based detection methods. Protein-arrest peptide conjugates were designed, cloned and translated in reconstituted in vitro translation systems. The location and type of detection tag were varied to achieve optimal performance. Translational arrest was investigated for various constructs, comprising folding and non-folding Zn-finger ADR1a proteins as well as folding and non-folding R16 spectrin proteins, in combination with SecM and modified SecM peptides (non-arresting, and abortive with a stop codon in place of Pro). Initially, only low yields of arrested protein were obtained for the non-folding variants. In vitro translation conditions were optimized by varying incubation temperature and time, salt concentrations (including magnesium and potassium cations), and additional protein factors such as PTH. Under optimized conditions, a significant improvement in ribosomal arrest could be achieved. However, the arrest showed a strong time dependence, which suggested that selection could only be performed within minutes after translation initiation. To increase the stringency and robustness of the assay, various modified arrest peptides were investigated. These included published arrest peptides from other prokaryotes as well as SecM hybrids. In addition, several arginine residues were included upstream of the arrest peptide.
It was hypothesized that a positively charged peptide would slow down translation and lead to stronger interactions with the ribosome exit tunnel. Indeed, one of these constructs, containing five arginine residues, prolonged the arrest to up to 1.5 hours.
Subsequently, in vitro translation was coupled with mRNA display by using puromycin-linked mRNA as the template. Quantification of cDNA by real-time PCR revealed that only low display efficiencies could be achieved. However, the recovery of the folded protein was ca. 6-fold higher than that of its non-folding counterpart. Comparison with the respective non-arresting and abortive (stop codon in the arrest peptide) controls showed identical recoveries. This suggested that the screening assay is functional and could be applied in selection experiments.
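A fold difference in recovery can be related to real-time PCR readouts with the standard delta-Ct relation. The sketch below is only an illustration: it assumes quantification by threshold cycle (Ct), equal input amounts for both constructs, and a fixed amplification efficiency. The report does not state how the ratio was computed, and the Ct values in the usage lines are hypothetical.

```python
import math


def fold_recovery(ct_folding: float, ct_nonfolding: float, efficiency: float = 1.0) -> float:
    """Relative cDNA recovery of the folding over the non-folding construct.

    efficiency = 1.0 corresponds to perfect doubling per PCR cycle (an assumption).
    A lower Ct means more recovered template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_nonfolding - ct_folding)


def delta_ct_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Ct difference that corresponds to a given fold difference in recovery."""
    if fold <= 0.0:
        raise ValueError("fold must be positive")
    return math.log(fold) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    # Hypothetical Ct values, chosen only to illustrate the calculation.
    print(f"fold recovery: {fold_recovery(ct_folding=20.0, ct_nonfolding=22.0):.1f}")
    print(f"delta Ct for a 6-fold difference: {delta_ct_for_fold(6.0):.2f}")
```

With perfect doubling per cycle, a 6-fold difference corresponds to a Ct shift of roughly 2.6 cycles, so small Ct differences between constructs already translate into substantial differences in recovery.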
Due to COVID-19-related delays, the NovoFold project could not progress as anticipated. However, it is expected that the results on arrest peptide engineering and assay development will be published. In addition, the results achieved provide a robust basis for further grant applications and continuation of the project.
The NovoFold selection assay combines the arrest peptide folding sensor with mRNA display.