Crucial timing of gene expression
In deciding whether a gene is expressed or not, transcription factors (TFs) are key. By binding to specific regions of DNA, they control the transcription of genetic information from DNA into RNA. TFs dock at non-coding DNA sequences called cis-regulatory modules (CRMs), which act as the starting point for regulating gene expression. One problem when working with CRMs is that their sequence composition tends to vary widely. Although this variation does not affect their function in the cell, it makes CRM sequences difficult to identify. Deciphering this regulatory code could help identify new drug targets for modulating gene expression.

The CISREGLOGIC project developed computational tools to identify orthologous CRMs, that is, sequences in different species that derive from a common ancestral sequence. The researchers applied phylogenetic footprinting to two distantly related species of sea squirt (class Ascidiacea). These species are far enough apart in evolutionary terms that most of their non-coding DNA has diverged, so the non-coding regions that still align are good CRM candidates. Software then identified candidate CRMs for particular genes in each species, together with their approximate position in the genome.

To identify the TF binding sites within these candidates, the researchers used a new affinity scoring system. They also devised a method to discard binding sites whose predicted TF binding was not significant. The tools then identified pairs of CRMs that could be considered orthologous. The researchers showed that clusters of similar binding sites are shared between the two species near early regulatory genes. Furthermore, clusters of similar binding sites are associated with genes of similar function.

Genomic data from the CISREGLOGIC project provide a strong knowledge base for further research. Translating the results into pharmaceutical products could eventually make it possible to modulate the gene regulation that predisposes people to a range of diseases.
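The article does not describe how the project's new affinity scoring system works. As a rough illustration only, the sketch below uses a standard position weight matrix (PWM) log-odds scan, a common generic way of scoring candidate TF binding sites, and then drops sites below a score cut-off, in the spirit of the filtering of weak binding sites described above. The motif counts, the uniform background, the pseudocount and the threshold are all hypothetical values chosen for the example; they are not the project's data or method, and real tools usually derive the cut-off from a statistical significance estimate rather than a fixed number.

```python
import math

# Hypothetical counts for a 4-bp motif (one dict per motif position).
# Illustrative only: NOT data from the CISREGLOGIC project.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background base composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# Assumed pseudocount, so unseen bases get a finite (negative) score.
PSEUDOCOUNT = 0.5


def log_odds_matrix(counts, background, pseudocount=PSEUDOCOUNT):
    """Convert per-position counts into log2(p(base) / background(base)) scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        matrix.append({
            base: math.log2(((n + pseudocount) / total) / background[base])
            for base, n in column.items()
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Slide the matrix along the sequence and keep windows scoring >= threshold.

    Returns (start, window, score) tuples; windows containing characters
    outside the matrix alphabet (e.g. 'N') are skipped.
    """
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in matrix[0] for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:  # hypothetical fixed cut-off for "significant" binding
            hits.append((start, window, round(score, 2)))
    return hits
```

For example, `scan("TTACGTAAGCATACGT", log_odds_matrix(COUNTS, BACKGROUND), threshold=4.0)` keeps only the two ACGT windows, at positions 2 and 12 (score about 6.49 each), and discards weaker partial matches such as GCAT (about 1.33). Scores are summed per position because the log-odds model treats motif positions as independent, which is a simplifying assumption.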