Structure and Dynamics of Low-Complexity Regions in Proteins: The Huntingtin Case

Periodic Reporting for period 3 - chemREPEAT (Structure and Dynamics of Low-Complexity Regions in Proteins: The Huntingtin Case)

Reporting period: 2018-09-01 to 2021-02-28

The main aim of chemREPEAT is the structural characterization of the protein huntingtin (Htt), the causative agent of Huntington’s Disease (HD), in order to understand the structural bases of this pathology. The N-terminal region of Htt, the so-called exon1, contains a homorepeat (HR) region with a large number of consecutive glutamine residues. Individuals with more than 35 consecutive glutamines develop this deadly neurodegenerative pathology. The structural characterization of Htt is an enormous challenge because of its inherent flexibility, which precludes the use of X-ray crystallography, and its repetitive nature, which hampers the application of Nuclear Magnetic Resonance (NMR). Our developments aim at overcoming present limitations and shedding light on fundamental aspects of this disease. Our efforts in this period have centred on developing strategies that will enable us to tackle this challenge and to reach structural/dynamic models of Htt by combining experimental and computational approaches.
We can divide our work into two main aspects, experimental (points 1-4) and computational (points 5 and 6), which will merge in the future to derive an atomistic picture of the structural bases of the pathological threshold in HD.

1. Residue specific labelling within poly-Q regions: The proof-of-concept
Isotopic labelling, which is necessary to apply NMR, yields Htt samples in which the glutamine peaks appear in a narrow region of the spectrum, precluding the traditional frequency assignment that is necessary for subsequent structural studies. To disentangle this complexity, we have designed a strategy enabling isotopic labelling in a glutamine-specific manner. In this way, the NMR spectrum is reduced to a single peak that probes the structure and dynamics of an individual glutamine within the poly-Q homorepeat. The strategy combines cell-free protein expression with tRNA nonsense suppression. In this period we have developed both tools and, by combining them, proved the validity of the concept. Briefly:
(i) We have adapted a published protocol to produce efficient E. coli lysates and methods to optimize cell-free protein synthesis.
(ii) We can produce and purify properly folded tRNA in large quantities.
(iii) We overexpress and purify an active engineered yeast glutaminyl-tRNA synthetase, enabling efficient loading of glutamine onto the tRNA.
(iv) We have optimized an efficient way to introduce loaded tRNA into cell-free reactions to obtain the residue-specific isotopically labelled htt.
(v) We have adapted the protein construct (fused to a Green Fluorescent Protein and a His-tag) to monitor and efficiently purify the protein for subsequent NMR studies.
(vi) We have adapted NMR pulse sequences to record in moderate time 15N-H and 13C-H NMR spectra.
These developments are described in a recent publication in which 5 glutamines of a non-pathological version of Htt exon1 (H16) were studied [Urbanek et al. Angewandte Chemie, 2018]. Moreover, parts of this work have been presented at national and international conferences (see below).

2. Structural investigation of the non-pathological version of Htt exon1 H16.
Using the above-described methodology we have addressed the structural and dynamic characterization of a non-pathological version of Htt exon1 containing 16 consecutive glutamines (H16). We scanned the 22 glutamines of the construct (16 from the homorepeat and 6 from the flanking regions) by producing residue-specific labelled samples, for which we measured 15N-H and 13C-H NMR spectra to obtain precise backbone and side-chain chemical shifts. Moreover, using standard 3D NMR experiments we assigned the rest of the residues of H16 (with the exception of the prolines). The primary analysis of the data indicates the presence of an alpha-helix encompassing the N-terminal region of H16, the so-called N17, and the initial part of the poly-Q homorepeat. The helical propensity decreases smoothly along the homorepeat, which becomes fully disordered in its last residues. At present we are building an ensemble model driven by the experimental chemical shifts. In the coming weeks we will write and submit a manuscript describing these results.
This work has already been presented at a local conference, and it will be presented at two others in the following months (see below).
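
The link between chemical shifts and helical propensity mentioned above can be illustrated with a secondary chemical shift calculation: the observed Cα shift minus a random-coil reference, where persistently positive differences point to helical conformations. The following is a minimal sketch and not the analysis pipeline used in the project; the function names, the 0.5 ppm cut-off and all numerical values are hypothetical placeholders, not measured data.

```python
from typing import Dict, List


def secondary_shifts(observed: Dict[int, float],
                     random_coil: Dict[int, float]) -> Dict[int, float]:
    """Return observed minus random-coil shift (ppm) for each residue number.

    Residues missing from either table are skipped.
    """
    return {res: observed[res] - random_coil[res]
            for res in sorted(observed) if res in random_coil}


def helix_like_residues(delta_ca: Dict[int, float],
                        cutoff_ppm: float = 0.5) -> List[int]:
    """Residues whose C-alpha secondary shift exceeds a cut-off.

    The cut-off is an arbitrary illustrative value, not a published threshold.
    """
    return [res for res, delta in delta_ca.items() if delta > cutoff_ppm]


# Placeholder numbers chosen only to show the calculation (not experimental).
observed_ca = {18: 58.0, 25: 57.0, 33: 56.0}
reference_ca = {18: 56.0, 25: 56.0, 33: 56.0}

delta = secondary_shifts(observed_ca, reference_ca)
print(delta)
print(helix_like_residues(delta))
```

In this toy input the secondary shift decays along the sequence, which is the qualitative pattern described for the homorepeat.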

3. Application to pathological versions of Htt.
Unveiling the structural bases of the pathological threshold of HD depends on the capacity to perform the above-mentioned experiments on a pathological version of Htt. This is not straightforward, as long poly-Q tracts are prone to aggregation. We have optimized the production and purification of an Htt construct containing 46 consecutive glutamines fused to GFP and a His-tag (H46). These preliminary results have been reported in the previously mentioned article [Urbanek et al. Angewandte Chemie 2018]. NMR experiments show that the poly-Q flanking regions have very similar structural and dynamic properties in the pathological and non-pathological versions of Htt. In the following months we will scan some of the 52 glutamines present in H46 and explore the differences between the two versions of the protein.

4. Extension to poly-Proline (Poly-P) homorepeats.
In addition to the poly-Q homorepeat, Htt contains two poly-P homorepeats of 11 and 10 consecutive prolines placed at the C-terminus of exon1, in what is normally termed the Proline-Rich Region (PRR). We have started to explore the conformational properties of multiple prolines in that region, with special focus on the cis/trans equilibrium that this amino acid presents. To achieve this aim we isotopically label individual prolines and subsequently perform NMR experiments. Unfortunately, we were unable to produce an active prolyl-tRNA synthetase. To overcome this limitation we used the flexizyme strategy, a chemical biology approach that can load any amino acid onto a tRNA. Briefly:
(i) Synthesis and purification of active flexizyme, a catalytic RNA, by in vitro transcription.
(ii) Our collaborator Prof. Carlos Cativiela (Universidad de Zaragoza) has synthesized 3,5-dinitrobenzyl ester (DBE) protected 15N,13C-proline.
(iii) We have optimized a reaction protocol enabling the flexizyme-catalysed loading of isotopically labelled proline onto a tRNA. The final yield is similar to that reported in the literature (≈40%).
(iv) By adding the proline-loaded tRNA to the cell-free reaction we have produced several proline-specific labelled H16 samples, which we subjected to NMR measurements to record the side-chain frequencies reporting on the cis/trans equilibrium population in a position-dependent manner.
(v) We have identified the natural-abundance NMR signals as an obstacle to quantifying the relative populations of the cis and trans isomers of the studied prolines. We have overcome this problem by performing the cell-free reaction using deuterated d5-proline.
Therefore, at this point we have all the tools to explore for the first time the cis/trans equilibrium within poly-P homorepeats. Our first results suggest that prolines from the poly-P experience a reduced isomeric equilibrium.
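
As an illustration of how the relative cis and trans populations of a labelled proline can be quantified, the sketch below converts a pair of peak intensities into a cis fraction, assuming that both isomers give resolved peaks and that intensity is proportional to population. This is a simplified assumption and not necessarily the quantification procedure used in the project; the function name and the numbers are hypothetical.

```python
def cis_fraction(intensity_cis: float, intensity_trans: float) -> float:
    """Fraction of the cis isomer from two peak intensities (or volumes).

    Assumes both peaks belong to the same labelled proline and that
    intensity scales linearly with population.
    """
    if intensity_cis < 0 or intensity_trans < 0:
        raise ValueError("peak intensities must be non-negative")
    total = intensity_cis + intensity_trans
    if total == 0:
        raise ValueError("at least one peak must have non-zero intensity")
    return intensity_cis / total


# Placeholder intensities in arbitrary units (not experimental values).
print(cis_fraction(1.0, 3.0))
```

Residual natural-abundance signal under either peak would bias this ratio, which is why suppressing it, as described in step (v), matters for the quantification.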


5. Application of Hybrid parallelization of a multi-tree path search algorithm to highly-flexible biomolecules.
The study of the conformational energy landscape of biomolecules is essential for the understanding of their physicochemical properties. This requires the exploration of a continuous, high-dimensional space to identify the most probable conformations and the transition paths between the distinct co-existing conformations. The problem is computationally difficult, in particular for highly flexible biomolecules such as Intrinsically Disordered Proteins (IDPs). In recent years, a robotics-inspired algorithm called Transition-based Rapidly-exploring Random Tree (TRRT) has been proposed to solve this problem, and has been shown to provide good results with small and middle-sized biomolecules. Aiming at treating larger biological flexible systems, we propose a strategy for the efficient parallelization of a multi-tree variant of TRRT, called Multi-TRRT, enabling its execution in large computer clusters. The parallel algorithm uses OpenMP multi-threading for computation inside each multi-core processor and MPI to perform the communication between processors. Such a hybrid parallelization strategy clearly outperforms previous fully-distributed implementations of RRT-like algorithms, significantly reducing communication overhead and memory requirements. This implementation is also very flexible, since the algorithms can be run on a single multi-core processor (without communication requirements) or on a large computer cluster without any modification in the code. The adaptive space subdivision approach is a key component of the proposed parallelization strategy. It drastically reduces the computational cost associated with inter-processor communication and nearest neighbour search.
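
To make the TRRT idea more concrete, the sketch below shows a transition test of the kind used in published descriptions of T-RRT: downhill moves are always accepted, uphill moves are accepted with a Boltzmann-like probability, and the temperature is adapted according to recent successes and failures. It is a serial, simplified illustration only; it does not reproduce the Multi-TRRT implementation or its OpenMP/MPI parallelization, and the class name, parameter names and default values are assumptions.

```python
import math
import random
from typing import Optional


class TransitionTest:
    """Adaptive transition test in the spirit of T-RRT (illustrative only)."""

    def __init__(self, temperature: float = 1.0, alpha: float = 2.0,
                 max_failures: int = 10, k: float = 1.0,
                 rng: Optional[random.Random] = None) -> None:
        self.temperature = temperature    # adapted during the search
        self.alpha = alpha                # temperature scaling factor (> 1)
        self.max_failures = max_failures  # rejections tolerated before heating
        self.k = k                        # energy scale (hypothetical units)
        self.failures = 0
        self.rng = rng or random.Random()

    def accept(self, e_from: float, e_to: float) -> bool:
        """Decide whether a tree may be extended from e_from to e_to."""
        if e_to <= e_from:
            return True  # downhill moves are always accepted
        p = math.exp(-(e_to - e_from) / (self.k * self.temperature))
        if self.rng.random() < p:
            # Accepted uphill move: cool down to favour low-energy regions.
            self.temperature /= self.alpha
            self.failures = 0
            return True
        self.failures += 1
        if self.failures > self.max_failures:
            # Too many consecutive rejections: heat up to escape the basin.
            self.temperature *= self.alpha
            self.failures = 0
        return False


test = TransitionTest(rng=random.Random(0))
print(test.accept(5.0, 3.0))
```

In a multi-tree variant, each tree would apply such a test when extending towards a sampled conformation; how the trees share work across threads and processes is the part addressed by the hybrid parallelization and is not shown here.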
We developed the approach using several disordered proteins and peptides with sizes spanning from 5 to 80 residues. The analysis shows that the performance gain provided by the parallel algorithm depends on the size of the molecule. For relatively small systems such as peptides of up to 10-15 residues, using multiple threads in a single computer to collaboratively construct the exploration trees is probably the best choice. In this case, the performance gain provided by a larger computer cluster is not significant. However, for larger systems, the speed-up obtained by the hybrid approach increases almost linearly with the number of processors, even when this number is large. These encouraging results demonstrate the applicability of the parallelized Multi-TRRT algorithm to characterize the conformational landscape of large disordered proteins. These results are reported in a manuscript [Estaña et al.] that is currently under revision. Unfortunately, the journal has been extremely slow in processing the manuscript: we submitted it in 01/2017 and the revised version is only now being evaluated. Moreover, this study has been presented at several conferences (see below).
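
For readers less familiar with the terminology, "speed-up increasing almost linearly with the number of processors" can be expressed with two standard quantities: the speed-up (serial time divided by parallel time) and the parallel efficiency (speed-up divided by the number of processors). The short sketch below computes both; the function names are hypothetical and the timings are placeholders, not benchmark results from the project.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speed-up of a parallel run relative to the serial run."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_processors: int) -> float:
    """Parallel efficiency: 1.0 corresponds to perfectly linear speed-up."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return speedup(t_serial, t_parallel) / n_processors


# Placeholder timings in seconds (not measured values).
print(speedup(100.0, 25.0))
print(efficiency(100.0, 25.0, 8))
```

An almost linear speed-up corresponds to an efficiency that stays close to constant as processors are added.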

6. Accurate structural ensembles of Intrinsically Disordered Proteins (IDPs).
Generating conformational models of IDPs is challenging due to their inherent flexibility and the large conformational space that they sample. Moreover, the vast majority of IDPs contain partially structured regions that are inserted into a fully disordered chain and that act as partner recognition motifs. Present computational approaches have difficulty identifying these motifs and anticipating their structural class and relative population. Normally, this can be partially alleviated by integrating experimental data (NMR and/or SAXS) into these computational approaches. However, these data are not always easy to obtain.
We have proposed a strategy that exploits the structural information encoded in a large coil database of tripeptide fragments derived from high-resolution structures to build accurate models of IDPs. Using NMR Residual Dipolar Couplings and SAXS data for multiple proteins, we show that accounting only for the amino acid type is enough to describe the conformations of fully disordered regions of IDPs. Conversely, the structural features encoded in the tripeptide database are necessary to model partially structured regions (α-helices, β-strands and turns) inserted in IDPs. The combination of both strategies affords accurate conformational ensembles of IDPs that simultaneously describe regions with distinct secondary structures and their relative populations. The strategy that we have developed has the capacity to predict the structural features of disordered chains from the amino acid sequence. Therefore, it opens structural bioinformatics to disordered proteins, and offers the possibility to anticipate structure/function perturbations exerted by potentially pathological point mutations, insertions or deletions. Moreover, this predictive capacity paves the way to extend protein design to disordered chains. The results of this study, which has been done in collaboration with Dr. Juan Cortés (LAAS-CNRS, Toulouse), have been submitted to Structure and have been presented at several conferences (see below).
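
The combination of the two sampling strategies can be sketched as follows: backbone dihedral angles are drawn from a residue-type pool in fully disordered regions, and from a tripeptide-specific pool (the residue together with its two neighbours) in regions treated as partially structured. This is a toy illustration of the idea and not the published method; the function name, the two tiny "databases", the way structured positions are flagged and the fallback rule are all assumptions made for the example.

```python
import random
from typing import Dict, List, Set, Tuple

Dihedral = Tuple[float, float]  # (phi, psi) in degrees


def sample_dihedrals(sequence: str,
                     residue_db: Dict[str, List[Dihedral]],
                     tripeptide_db: Dict[str, List[Dihedral]],
                     structured: Set[int],
                     rng: random.Random) -> List[Dihedral]:
    """Draw one (phi, psi) pair per residue.

    Positions listed in `structured` use the tripeptide-specific pool when
    the tripeptide is present in the database; all other positions, and the
    chain termini, fall back to the residue-type pool.
    """
    angles = []
    for i, aa in enumerate(sequence):
        pool = residue_db[aa]
        if i in structured and 0 < i < len(sequence) - 1:
            pool = tripeptide_db.get(sequence[i - 1:i + 2], pool)
        angles.append(rng.choice(pool))
    return angles


# Toy databases with one entry per key, so the outcome is deterministic.
residue_db = {"A": [(-70.0, 140.0)], "Q": [(-80.0, 150.0)]}
tripeptide_db = {"AQQ": [(-60.0, -45.0)]}

conformer = sample_dihedrals("AQQA", residue_db, tripeptide_db,
                             structured={1, 2}, rng=random.Random(0))
print(conformer)
```

Repeating the draw many times, and discarding conformers with steric clashes, would yield an ensemble; those steps are omitted here for brevity.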
Low-complexity regions (LCRs) have remained out of the reach of structural biology methods. The methodologies that we are developing are overcoming this barrier and will enable a detailed understanding of the structural bases of biological and pathological phenomena involving LCRs. Huntingtin, the subject of our study, is arguably the most notable example of the connection between LCRs and pathology. These are the main achievements and how we will exploit them until the end of the project:

- Methods enabling site-specific isotopic labelling will provide, for the first time, clues to the structural/dynamic bases of Huntington’s disease. As initially planned, we will do this by systematically studying Htt versions below (already done), above (H46) and at the pathological threshold (H35).

- The developments performed to site-specifically label glutamine and proline have prompted us to expand the panel of natural amino acids amenable to this kind of incorporation. We expect these developments, which will be performed in the second part of the project, to be a breakthrough in structural biology, as multiple biologically relevant systems will become addressable for structural biologists.

- Over the course of the project we have developed efficient cell-free methods for the in vitro synthesis of Htt variants. Now we want to exploit this achievement to produce Htt versions with different deuteration schemes (Gln, Pro, Gln+Pro…) to obtain domain-specific structural information using Small-Angle Neutron Scattering (SANS). I have contacted experts at the Institut Laue-Langevin (ILL-Grenoble) to co-supervise a PhD student starting in 2019.

One of the main problems in the characterization of IDPs is the need for extensive experimental data to derive accurate structural models, and the inability of purely theoretical methods to achieve this aim. We have developed an approach based on a coil library that overcomes previous limitations and enables the construction of accurate ensemble models of IDPs embedding partially structured regions. In the future we will use this tool for several objectives:

- A server will be built and made available to the whole community for the generation of accurate ensembles of IDPs from protein sequences.

- The ensembles built with this approach will be used to structurally characterize Htt by integrating the experimental data that will be produced during the project.

- Structural models built with this approach will be used as starting points to build conformational transition pathways with the robotics-inspired algorithm (Multi-TRRT) that we are developing with Juan Cortés (LAAS-Toulouse).