The biological role of tandem repeats as hypervariable modules in genomes

Final Report Summary - REPEATSASMUTATORS (The biological role of tandem repeats as hypervariable modules in genomes)

Tandem repeats are unstable stretches of DNA in which a short sequence is repeated head-to-tail. While they are often believed to be nonfunctional “junk DNA”, tandem repeats frequently occur within coding and regulatory regions, where their variation could have phenotypic consequences. Repeats are known to be highly unstable, with the number of repeated units changing at frequencies that are orders of magnitude higher than typical mutation rates in the genome. Moreover, extreme expansions of some repeats, especially repeats encoding glutamine-rich regions in regulatory proteins and repeats located in promoters of specific genes, have been linked to (human) diseases.
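
To make the definition concrete, the following is a minimal sketch of how perfect tandem repeats and their copy number could be located in a DNA string. It is an illustration only, not the method used in the project: the function names, the regular-expression approach and the thresholds (unit length of 1 to 6 bases, at least 3 copies) are assumptions chosen for the example, and real repeat-mapping tools also handle imperfect repeats.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit (e.g. 'CAG', not 'CAGCAG')."""
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Hypothetical helper: thresholds are illustrative assumptions. For each
    unit length the scan reports the leftmost reading phase of a repeat tract.
    """
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # skip e.g. 'CAGCAG', already reported as 'CAG'
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), unit, copies))
    return hits


# A glutamine-encoding CAG tract with 5 units, flanked by unrelated sequence.
short_allele = "TTA" + "CAG" * 5 + "TTAC"
print(find_tandem_repeats(short_allele))  # [(3, 'CAG', 5)]
```

Running the same function on two alleles of a locus and comparing the reported copy numbers gives the kind of unit-count variation described above.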

Despite their role in disease, the possible (beneficial) biological role of repeats remains unknown. In this project, we investigated the possible biological function of tandem repeats and tandem repeat variation. We mapped and analyzed all repeats in the human genome, as well as in the genomes of several model organisms (including a viral genome, yeast, fruit fly and a monkey genome). The results indicated that as many as 10 to 20% of all promoters and a similar percentage of all coding regions in these genomes contain repeats. Further analyses showed that repeats are mostly found in promoters regulating inducible genes, and in the coding regions of genes encoding extracellular and regulatory proteins. Experiments focusing on typical examples of such repeats in the model eukaryote Saccharomyces cerevisiae indicated that variation in repeats located within promoters affects the expression of the respective downstream gene, likely because the repeat helps to generate a nucleosome-free region that facilitates gene activation. As such, repeats form a previously underappreciated functional promoter element that both promotes expression and makes expression levels highly evolvable.

A second series of experiments focused on glutamine-rich repeats in the coding regions of regulatory genes. Our results indicate that variable glutamine-rich repeats form a functional domain without which the regulatory proteins do not function properly. Natural variation in repeat length causes variation in the activity of the regulatory proteins, which in turn leads to changes in the expression levels of the genes controlled by the repeat-containing regulator. Over-expansion of the repeat leads to aggregation of the proteins that harbor the repeat sequence, as well as to changes in the interactome of these proteins, which helps explain why such over-expansions can lead to disease.

Together, our results suggest that repeats are functional domains in promoters and proteins that may act as unstable “tuning knobs” that allow rapid evolution of the activity of promoters and regulators. Over-expansions of repeats may lead to disease because of sub-optimal expression levels of downstream genes, but also because the expanded repeats cause aggregation and improper protein interactions. These findings reveal a potential beneficial functional role for tandem repeats. Moreover, the expertise, techniques and mutants generated in this project may also open new avenues for setting up models of repeat-associated diseases, and potentially even for performing high-throughput screens to find new pharmaceuticals against these diseases.