Objective
Mass sequencing of complete genomes has opened a new era in computational biology. The rate of sequencing greatly exceeds the capacity for experimental characterization of genomes. Existing tools allow the community to perform preliminary annotation of genomic sequences, but a large number of problems remain unsolved. The aim of the project is to improve existing methods for genome annotation and to develop new ones, to implement these methods in software, and to apply this software to the analysis of new genomic sequences, protein families and regulatory systems, producing testable predictions and adding to the body of biological knowledge. The reliability and specificity of predictions will be improved by integrating approaches developed in different areas of computational molecular biology.
The specific objectives of this proposal are:
To improve algorithms for gene recognition in prokaryotes, in particular to make the prediction of gene starts more reliable and to develop comparative procedures for recognition of short prokaryotic genes;
To develop algorithms for gene recognition in eukaryotes by comparison of syntenic genomic regions. To develop comparative algorithms for recognition of genes in sequences with errors by DNA-protein and DNA-EST spliced alignment;
To develop methods for identification of distant protein homologues based on integration of structural and similarity predictions and algorithms for identification of homologues without prior choice of gap weights. To apply the developed techniques for functional and structural annotation of genomes;
To improve fold recognition algorithms using self-consistent threading of multiple protein alignments. To develop and apply software for large-scale fold recognition in proteomes;
To continue development of techniques for prediction of regulatory patterns in complete genomes and to apply them to the analysis of interacting circuits and global regulatory systems;
To develop software for large-scale semi-automated analysis of regulation and to apply it to bacterial, archaeal, and fungal genomes;
To develop algorithms for compositional segmentation of chromosome-size DNA sequences and to apply them for analysis of prokaryote and protist chromosomes;
To develop and apply algorithms for analysis of the repeat structure of DNA;
To develop methods for analysis of contrast degenerate patterns and to apply them for prediction of new restriction-modification systems.
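As an illustration of the last objective, the following is a minimal sketch and not the project's method. It assumes that a "contrast" pattern is one whose observed count in a genome departs markedly from the count expected by chance, and that degenerate patterns are written with IUPAC nucleotide codes (for example GANTC). The function names `count_matches`, `expected_matches` and `contrast`, and the independent-bases background model, are assumptions made for this example only.

```python
from collections import Counter

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_matches(seq, pattern):
    """Count positions (overlaps allowed) where the degenerate pattern matches seq."""
    k = len(pattern)
    return sum(
        all(seq[i + j] in IUPAC[p] for j, p in enumerate(pattern))
        for i in range(len(seq) - k + 1)
    )


def expected_matches(seq, pattern):
    """Expected count if bases were independent, using seq's own base frequencies.

    This background model is an assumption of the sketch; a real analysis
    would probably use a higher-order model.
    """
    n = len(seq)
    freq = {base: c / n for base, c in Counter(seq).items()}
    prob = 1.0
    for p in pattern:
        prob *= sum(freq.get(base, 0.0) for base in IUPAC[p])
    return prob * (n - len(pattern) + 1)


def contrast(seq, pattern):
    """Observed/expected ratio; values far from 1 flag over- or under-represented patterns."""
    exp = expected_matches(seq, pattern)
    return count_matches(seq, pattern) / exp if exp > 0 else float("nan")
```

For example, `count_matches("GAATCGACTC", "GANTC")` returns 2, because both GAATC and GACTC match the pattern. On a real genome, a ratio from `contrast` well below 1 for a palindromic pattern could be treated as a hint, and no more than a hint, that the site is avoided and may be the target of a restriction-modification system.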
The following software products will be made available to the academic community as the outcome of the project:
New versions of Orpheus (gene recognition in prokaryotes).
Programs for similarity-based gene recognition in genomic sequences with errors; for prediction of exon-intron structure by comparison of genomic sequences; for gene recognition by comparison with protein profiles; for error correction by genome-EST comparisons.
Algorithm for construction of profile-sequence alignments with all reasonable numbers of gaps in the aligned sequences.
Robust procedure for extracting profiles from multiple alignments, taking into account the varying levels of evolutionary relationship between aligned sequences and position-specific evolution rates.
Software for detection of distant homologues in protein databases.
New version of SCF_THREADER (threading of multiple sequence alignments).
New version of GRAPPE (search for maximal repetitions and gapped alignments).
Topic(s)
Call for proposal
Data not available
Funding Scheme
Data not available
Coordinator
69012 Heidelberg
Germany