Final Report Summary - BIOSEQANALYSIS (Computational methods for biological sequence analysis with application to evolution of yeast mitochondrial genomes)
The project had three main scientific goals:
(a) probabilistic modeling of function evolution,
(b) comparative analysis of yeast mitochondrial genomes, and
(c) design of efficient bioinformatics algorithms.
Additionally, we aimed to establish a bioinformatics research group at Comenius University, build collaboration with life science researchers, and introduce students to interdisciplinary research in bioinformatics.
Advances in DNA sequencing allow scientists to sequence whole genomes at relatively low cost. One goal of such sequencing is to compare the genomes of different organisms, reveal their differences, and study the evolutionary and functional implications of these differences.
To address goal (a), we have concentrated on hidden Markov models (HMMs) and their extensions, which form the basis of many computational tools for comparative genomics. A recent trend is to explore alternative inference criteria for use with HMMs. We have proved that some new inference criteria lead to computationally hard problems (Nanasi et al. 2013), while others can be implemented efficiently (Nanasi et al. 2010). One outcome is an improved tool for aligning a sequence to a special form of profile HMM used for recombination detection in HIV. Our algorithm can predict individual recombinant regions with greater sensitivity and specificity than previous approaches.
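To make the idea of an "inference criterion" concrete, the sketch below implements posterior decoding, one standard criterion: each position receives its individually most probable state, computed with the forward-backward algorithm. It is not one of the new criteria from Nanasi et al.; the two-state model, its parameters and the function name are hypothetical, and plain probabilities are used without rescaling, so it only suits short toy inputs.

```python
def posterior_decode(obs, states, start, trans, emit):
    """Label each position with its most probable state (posterior decoding).

    Uses raw probabilities without scaling, so it is only suitable for short
    toy sequences; practical tools rescale or work in log space.
    """
    n = len(obs)
    # Forward pass: fwd[i][s] = P(obs[0..i], state at i = s)
    fwd = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for i in range(1, n):
        fwd.append({s: emit[s][obs[i]] * sum(fwd[i - 1][r] * trans[r][s] for r in states)
                    for s in states})
    # Backward pass: bwd[i][s] = P(obs[i+1..n-1] | state at i = s)
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for i in range(n - 2, -1, -1):
        bwd[i] = {s: sum(trans[s][r] * emit[r][obs[i + 1]] * bwd[i + 1][r] for r in states)
                  for s in states}
    total = sum(fwd[n - 1][s] for s in states)  # P(obs)
    posteriors = [{s: fwd[i][s] * bwd[i][s] / total for s in states} for i in range(n)]
    labels = [max(states, key=lambda s: p[s]) for p in posteriors]
    return labels, posteriors


# Toy two-state model (hypothetical parameters): each state prefers one symbol
states = ["X", "Y"]
start = {"X": 0.5, "Y": 0.5}
trans = {"X": {"X": 0.9, "Y": 0.1}, "Y": {"X": 0.1, "Y": 0.9}}
emit = {"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.1, "b": 0.9}}
labels, post = posterior_decode("aaabbb", states, start, trans, emit)
print("".join(labels))  # XXXYYY
```

Because each position is labelled independently, posterior decoding can in general return a label sequence containing a transition of probability zero, whereas the Viterbi criterion always returns a valid path; trade-offs of this kind are part of what motivates studying alternative criteria.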
We have also developed new tools for aligning sequences from different species, concentrating on challenging sequences with many duplicated or repeated regions, which existing methods struggle to align satisfactorily. Sequence alignment is traditionally the first step in almost any comparative analysis, and its quality strongly affects all further studies of function and evolution. One of our tools aligns sequences containing tandemly repeated motifs (Kovac et al. 2012) by incorporating a probabilistic model of the motif itself. For even more challenging regions, resulting from multiple overlapping duplications, deletions and rearrangements, we have proposed a method for segmenting a sequence into shorter atomic segments (Brejova et al. 2011). These segments contain no internal repeats and can subsequently be analyzed by conventional sequence analysis tools. We are currently extending our tools so that they work not only on preselected complex regions, but also on a whole-genome scale.
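The defining property of an atomic segment, the absence of internal repeats, can be illustrated with a crude check. The sketch below assumes that "repeat" means an exact k-mer occurring twice, with a k-mer and its reverse complement treated as the same word; this definition and the function names are hypothetical simplifications, and the segmentation method of Brejova et al. 2011 itself is not reproduced here.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[b] for b in reversed(seq))


def has_internal_repeat(seq, k):
    """Return True if some k-mer occurs twice in seq.

    A k-mer and its reverse complement count as the same word, so inverted
    copies are detected too. Exact matching is a simplifying assumption;
    real duplications are usually diverged.
    """
    seen = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        canonical = min(word, reverse_complement(word))
        if canonical in seen:
            return True
        seen.add(canonical)
    return False


print(has_internal_repeat("ACGGATCCTTACGGA", 5))  # True: ACGGA occurs twice
print(has_internal_repeat("ACGTTAGC", 4))         # False
```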
Within goal (b), we have also started a collaboration with Prof. Nosek from the Faculty of Natural Sciences at Comenius University. This collaboration led to a comparative analysis of eight newly sequenced mitochondrial genomes of Candida species (Valach et al. 2011). These genomes, although related, exhibit remarkable variability in their architecture. Our results suggest possible evolutionary scenarios leading to the linearisation of chromosomes in some of these species. Subsequently, we have collaborated on sequencing and assembling 38 additional yeast mitochondrial genomes and five nuclear genomes. The analysis of these data is ongoing; preliminary results were presented at conferences, and we expect several journal publications within the next two years. Our group performs various bioinformatics analyses using existing tools (genome assembly, alignment, phylogenetic analysis, gene finding, etc.), but we have also identified and solved new computational problems related to genome rearrangement that were not addressed by existing tools (Kovac et al. 2011).
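A classic starting point for comparing genome architectures is the breakpoint count between two gene orders. The sketch below is only that textbook measure, not the rearrangement problems solved in Kovac et al. 2011; it assumes linear chromosomes, ignores gene orientation, and uses a made-up gene order for illustration.

```python
def count_breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are absent from order_b.

    Gene orders are lists of gene names on a linear chromosome. Orientation
    is ignored, so the adjacencies (x, y) and (y, x) are treated as equal.
    """
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(1 for pair in zip(order_a, order_a[1:])
               if frozenset(pair) not in adjacent_in_b)


# Hypothetical gene orders in which atp6 and cob have swapped places
genome_a = ["cox1", "atp6", "cob", "nad1", "rnl"]
genome_b = ["cox1", "cob", "atp6", "nad1", "rnl"]
print(count_breakpoints(genome_a, genome_b))  # 2
```

Circular mitochondrial chromosomes would additionally need the adjacency between the last and first gene, and a signed model would distinguish inverted genes; both are omitted here for brevity.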
Finally, within goal (c), we have studied several other computational problems in bioinformatics. One area of our interest is gene finding, that is, the annotation of protein-coding genes in genomic DNA. The accuracy of gene finding can be increased by using additional, experimentally derived information. We have studied several formalisms for expressing such additional information and provided algorithms or computational complexity results for gene finding in those settings (Kovac et al. 2009, Kucharik et al. 2011). We have also extended our gene finding tool ExonHunter and applied it to the annotation of several genomes of pathogenic fungi, in collaboration with Dr. Zhou from Fudan University in Shanghai.
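One simple way to use experimental evidence in HMM-based gene finding is to force the decoded path to agree with "hints" at selected positions. The sketch below is a Viterbi decoder with such hard constraints; it is a generic illustration, not one of the formalisms from Kovac et al. 2009 or Kucharik et al. 2011, and the two-state model ("N" for non-coding, "E" for exon), its parameters and the hint format are hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_prob(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else NEG_INF


def constrained_viterbi(obs, states, start, trans, emit, hints=None):
    """Most probable state path in which position i is forced to state
    hints[i] whenever i is a key of the (hypothetical) hints dictionary."""
    hints = hints or {}

    def allowed(i, s):
        return hints.get(i, s) == s

    # score[i][s]: best log-probability of a path ending in state s at position i
    score = [{s: log_prob(start[s]) + log_prob(emit[s][obs[0]]) if allowed(0, s) else NEG_INF
              for s in states}]
    back = [{}]
    for i in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: score[i - 1][r] + log_prob(trans[r][s]))
            best = score[i - 1][prev] + log_prob(trans[prev][s]) + log_prob(emit[s][obs[i]])
            score[i][s] = best if allowed(i, s) else NEG_INF
            back[i][s] = prev
    # Trace back from the best-scoring final state
    path = [max(states, key=lambda s: score[-1][s])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return "".join(reversed(path))


# Toy model: "N" prefers A/T, "E" prefers G/C (hypothetical parameters)
states = ["N", "E"]
start = {"N": 0.5, "E": 0.5}
trans = {"N": {"N": 0.9, "E": 0.1}, "E": {"N": 0.1, "E": 0.9}}
emit = {"N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "E": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}
print(constrained_viterbi("ATATGCGC", states, start, trans, emit))            # NNNNEEEE
print(constrained_viterbi("ATATGCGC", states, start, trans, emit, {2: "E"}))  # NNEEEEEE
```

With the hint at position 2, the predicted exon is extended to cover that position even though its nucleotide alone favours the non-coding state. Hard constraints like these assume the evidence is reliable; if hints contradict each other so that no path is allowed, this sketch returns an arbitrary path, and softer formalisms that only reweight paths are one way to handle noisy evidence.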
Within goal (c), we have also implemented RNArobo, a practical tool for finding distant homologs of known RNA genes using descriptors that capture both sequence and structure constraints. Our algorithm adapts its search strategy during the search to the observed properties of the particular search task, and is therefore able to search efficiently even for complex patterns (Rampasek et al. 2013). The tool resulted from a collaboration with Prof. Luptak from the University of California, Irvine, who uses it in his research to find new ribozymes, RNA molecules with enzymatic functions.
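To illustrate what a descriptor combining sequence and structure constraints means, the sketch below searches an RNA string for the simplest possible structural element: a single hairpin with a stem of fixed length and a loop of bounded length, allowing Watson-Crick and G-U pairs. It is a brute-force scan with hypothetical names and parameters; RNArobo's actual descriptor language is much richer, and its adaptive search ordering is not reproduced.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(rna, stem_len, min_loop, max_loop, max_mismatches=0):
    """Return (start, loop_len) for every hairpin matching a simple descriptor:
    a stem of stem_len base pairs enclosing a loop of min_loop..max_loop bases,
    with at most max_mismatches non-pairing positions in the stem."""
    hits = []
    for start in range(len(rna)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # one past the 3' arm of the stem
            if end > len(rna):
                break
            left = rna[start:start + stem_len]
            right = rna[end - stem_len:end]
            # The arms are antiparallel: left[j] pairs with right[-1 - j]
            bad = sum((left[j], right[-1 - j]) not in PAIRS for j in range(stem_len))
            if bad <= max_mismatches:
                hits.append((start, loop))
    return hits


print(find_hairpins("AAGGGAAAACCCAA", stem_len=3, min_loop=3, max_loop=5))  # [(2, 4)]
```

A brute-force scan like this costs time proportional to the sequence length times the number of loop lengths, which grows quickly for descriptors with many elements; that cost is the reason a smarter search strategy matters for complex patterns.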
Overall, we have published 16 papers on our results in scientific journals and at refereed conferences. Our results provide new algorithmic techniques and software tools for the analysis of genomic data. We focus on core areas of genome analysis, including alignment, gene finding and comparative analysis. With genome sequencing progressing at an increasing speed, such tools are necessary for making effective use of the acquired data.
In addition to the research objectives, we have successfully established a bioinformatics research group at Comenius University, which currently comprises two PIs, seven PhD students, seven Master's students and one Bachelor's student. We teach bioinformatics courses and organise seminars and summer schools that allow students from biology and computer science to interact and gain experience with interdisciplinary topics. We collaborate with local life science researchers, providing them with the bioinformatics expertise necessary for contemporary biological research. We also maintain active international collaborations and contribute to large international genome sequencing consortia.
Project website: http://compbio.fmph.uniba.sk/