Periodic Reporting for period 3 - HD-DittoGraph (HD-DittoGraph: a digital human Embryonic Stem Cell platform for Huntington's repeats)

Reporting period: 2021-03-01 to 2022-08-31

Summary of the context and overall objectives of the project

Huntington’s disease (HD) is an inherited autosomal dominant neurodegenerative disorder characterized by motor, psychiatric and cognitive dysfunction. HD symptoms appear in mid-life, with irreversible progression over 10-25 years. The disease is characterized by selective neuronal vulnerability, with predominant loss of striatal and cortical neurons. It is caused by an expansion of the CAG repeat in exon 1 of the huntingtin gene (HTT), which leads to the production of mutant HTT protein (muHTT) with an elongated polyglutamine (polyQ) stretch (Cattaneo et al., Nat Rev Neurosci. 2005; Zuccato et al., Physiol Rev. 2010). There is an inverse correlation between CAG repeat length and age at disease onset, i.e. the longer the CAG tract, the earlier the symptoms appear. Individuals with 35 or fewer repeats do not develop the disease, whereas those with 40 or more repeats (fully penetrant alleles) are invariably affected (Hauck B., 2003). In humans, the CAG tract in HTT is highly unstable and prone to expansion, especially during paternal transmission (Merritt, 1969). Pathological repeats can expand further during development to generate a mosaic of cells with differing repeat lengths. In fact, increases in the number of CAG repeats have been observed in mitotically dividing cells throughout the lifetime of HD individuals (McMurray, 2010). Analysis of post-mortem brain tissues from HD patients has revealed high mosaicism in CAG size and very large expansions also in non-proliferating tissues such as the striatum and cortex (Telenius, 1994; De Rooij, 1995; Wheeler, 1999). This phenomenon has also been described in HD models.
The expansion of the CAG tract has been well documented in brain tissue from HD mice (Mollersen, 2010; Gonitel, 2008; Larson, 2015). In particular, current evidence shows that striatal neurons are characterized by a high rate of CAG repeat instability compared to other neuronal types, suggesting that functional polymorphisms can be produced in adult neurons. This may also explain the age-of-onset variability observed between individuals carrying similar germline CAG lengths (Larson, 2015). Altogether, this evidence indicates that CAG expansions occur in post-mitotic neurons, may continue throughout the lifetime of the individual, and contribute to exacerbating neuronal toxicity and selective neuronal degeneration. The fact that significant CAG repeat length gains occur in non-replicating cells also argues that processes such as inappropriate mismatch repair, rather than DNA replication, are involved in generating somatic mutations in brain tissue (Shelbourne, 2007). Somatic repeat variation may also occur during the many billions of mitoses characterizing normal brain development (Nithianantharajah, 2007). CAG instability also occurs in long-term cultured fibroblasts from HD mice, already after 11 passages (Manley, 1999), in cultured human astrocytes (Farrell and Lahue, 2006), and in neurons derived from pluripotent stem cells (Niclis, 2009).

This project aims to identify the genetic factors implicated in HTT CAG instability, both during mitotic cell replication and in post-mitotic neurons. We will employ an unbiased discovery process that relies on a new human embryonic stem (hES) cell-based platform, barcoded DNA libraries, CRISPR technologies and long-read third-generation DNA sequencing. Factors modulating CAG elongation have been divided into cis-acting elements (i.e. DNA sequences in the proximity of the repeat, or the repeat itself) and trans-acting elements (i.e. other genes non-proximal to HTT) whose interaction with the repeat contributes to its instability (Richards, 1994). Identifying the genetic elements that contribute to CAG instability is of pivotal importance, considering the large effect they have on disease manifestation, and would expand the pool of available therapeutic targets with the aim of mitigating CAG expansion and, therefore, disease symptoms.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

The main achievement since the beginning of the project is the generation of the hESc platform, which is fundamental for the screening of novel cis and trans modifiers of CAG instability. Our platform is based on CRISPR/Cas9 technology combined with RMCE cassettes surrounding human HTT exon 1, where the CAG repeat region resides. This cassette allows easy modification of the entire HTT exon 1 and its surrounding regions, so that we can manipulate them and assess the effect on CAG instability. So far, we have obtained cell lines with different CAG lengths that can be used for the identification of new cis and trans modifiers. The power of our platform is that we can, in principle, exchange our RMCE cassette with an unlimited number of DNA variants containing all the modifications of HTT exon 1 that we want to investigate. Thanks to third-generation Nanopore sequencing, which we have implemented and validated for accurate CAG sizing, we can track and analyze all the variants of HTT exon 1 and their effect on CAG size. The flexibility of our hESc platform gives us the possibility to work either in the pluripotent state, addressing mitotic instability, or under differentiation conditions, in neurons or other cell types, where we can investigate DNA repair-related instability.
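
As an illustration of what CAG sizing from sequencing reads involves, the following is a minimal sketch and not the project's validated pipeline. It assumes each read already spans the repeat and has been basecalled without errors, which real Nanopore data would not guarantee. The flanking sequences, repeat counts and function names are hypothetical. It reports the longest uninterrupted CAG run per read, the resulting size distribution, and the share of reads longer than the modal allele as a simple proxy for somatic expansion.

```python
import re
from collections import Counter

# Matches one or more back-to-back CAG units (an uninterrupted run).
CAG_RUN = re.compile(r"(?:CAG)+")


def longest_cag_run(read):
    """Number of CAG units in the longest uninterrupted run of a read."""
    runs = CAG_RUN.findall(read.upper())
    return max((len(run) // 3 for run in runs), default=0)


def size_distribution(reads):
    """Map CAG size -> number of reads supporting that size."""
    return Counter(longest_cag_run(read) for read in reads)


def modal_size(distribution):
    """Most frequent CAG size; ties resolve to the shorter allele."""
    return max(distribution, key=lambda size: (distribution[size], -size))


def fraction_expanded(distribution):
    """Share of reads whose CAG size exceeds the modal size."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    mode = modal_size(distribution)
    expanded = sum(n for size, n in distribution.items() if size > mode)
    return expanded / total


if __name__ == "__main__":
    # Hypothetical flanks and repeat counts, for illustration only.
    FLANK_5, FLANK_3 = "TTCGA", "GGTCC"
    reads = [FLANK_5 + "CAG" * n + FLANK_3 for n in (45, 45, 45, 46, 48)]
    dist = size_distribution(reads)
    print(dict(dist), modal_size(dist), fraction_expanded(dist))
```

In practice, exact string matching would need to be replaced by an error-tolerant approach, since long-read data carry insertion and deletion errors within repetitive sequence.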
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

A growing body of evidence suggests a different interpretation of HD onset and progression. Recent works (Lee, 2019; Wright, 2019) have demonstrated that it is the length of the uninterrupted CAG tract in the genome, rather than the total number of glutamines in the HTT protein, that better correlates with motor onset of the disease. This may suggest that a DNA/RNA-based mechanism, rather than a protein function, contributes to disease manifestation, supporting the indications obtained in genetic screenings where genes involved in DNA mismatch repair have emerged as strong contributors to disease onset (GeM-HD Consortium, 2015). In this context, somatic instability of the HTT CAG tract may contribute to the observed tissue mosaicism, characterized by highly expanded CAG repeats in the brain regions that are also the most affected.

Our hESc platform will allow the investigation of the genetic elements contributing to this phenomenon, both during somatic expansion and in non-dividing mature neurons, anticipating the need for animal studies. To date, we have defined and established the genetic features necessary to achieve this, and we have successfully engineered hESc to carry these features. Moreover, we have defined a new sequencing approach based on an innovative long-read technology (Oxford Nanopore) that allows more precise counting of CAG units and, therefore, of their variations when they occur under controlled manipulation of the surrounding genetic elements.
This system will also allow the identification of transcripts that affect CAG size and stability. Therefore, by the end of the project we expect to establish a comprehensive cell platform capable of detecting and monitoring CAG size at the endogenous HTT locus in hESc, and to use it to understand which genetic elements affect its stability, thereby identifying new potential therapeutic targets for HD treatment.
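
The distinction drawn above between uninterrupted CAG length and total glutamine count can be made concrete with a short sketch. It assumes the repeat tract is supplied in frame and that an interrupted allele ends its pure CAG run with a CAA CAG pair (CAA also encodes glutamine). The example tracts and the function names are hypothetical and chosen only for illustration.

```python
GLUTAMINE_CODONS = {"CAG", "CAA"}


def longest_run(codons, allowed):
    """Length of the longest stretch of consecutive codons found in `allowed`."""
    best = current = 0
    for codon in codons:
        current = current + 1 if codon in allowed else 0
        best = max(best, current)
    return best


def tract_metrics(tract):
    """Return (uninterrupted CAG length, polyglutamine length) for an in-frame tract."""
    seq = tract.upper()
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    uninterrupted_cag = longest_run(codons, {"CAG"})
    polyq = longest_run(codons, GLUTAMINE_CODONS)
    return uninterrupted_cag, polyq


if __name__ == "__main__":
    # Hypothetical alleles encoding the same number of glutamines.
    interrupted = "CAG" * 42 + "CAA" + "CAG"   # CAA breaks the pure CAG run
    pure = "CAG" * 44                          # no CAA interruption
    print(tract_metrics(interrupted))
    print(tract_metrics(pure))
```

Both example alleles encode 44 glutamines, yet their uninterrupted CAG lengths differ (42 versus 44), which is the quantity the cited studies report as better correlated with motor onset.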