The objectives of the proposed project are to develop and provide tools for the interpretation of genomic DNA sequences with special emphasis on regulatory regions.
Sequencing whole genomes will yield a huge amount of data in the near future. Systematic functional analysis of these sequences with standard experimental methods is far beyond existing laboratory research capacities. Thus, we have to rely on in silico approaches, at least for planning targeted experiments. The assessment of the biological meaning of a gene depends not only on methods to predict the structure and function of its gene product, but also on the ability to derive its regulatory expression pattern from the base sequence. Gene control regions pose particular difficulties for in silico function prediction, which may partly explain the modest success of current algorithms addressing this problem. The elementary sequence signals are typically short (in the range of 5 to 30 base pairs) and highly variable, reflecting the biochemistry of the decoding mechanism. As a consequence of these properties, the physiological significance of an individual control module can only be assessed by a comprehensive search involving context analysis within the (putative) regulatory region, combined with gene identification methods based on coding region prediction. The technical approaches suggested in this proposal specifically address the difficulties outlined above.
Thus, there is an obvious need for programs that enable molecular biologists to efficiently annotate newly unravelled sequence data with tentative functional features.
To achieve this, we propose the following programme:
1. Databases that collect relevant experimental data, such as TRANSFAC and EPD, provide the basis for sequence analysis. They will be appropriately adapted and integrated.
2. The next step will be to develop algorithms that permit the identification of individual functional elements as precisely as possible.
3. The third step is a comprehensive context analysis of potential cis-regulatory sequence elements within promoters, enhancers or LCRs, and in conjunction with coding regions / open reading frames.
4. Finally, all algorithms to be developed have to be continuously verified in close collaboration with experimental researchers using appropriate model genomes.
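As an illustration of step 2, the sketch below scores every window of a sequence against a log-odds weight matrix, one common way to search for short, variable signals. It is only a minimal example under stated assumptions: the count matrix, the pseudocount, the uniform background frequency and the threshold are all hypothetical values chosen for illustration, and nothing here is taken from TRANSFAC, EPD or the algorithms the project proposes to develop.

```python
import math

# Hypothetical base counts for a 4-bp motif (illustrative only,
# not taken from TRANSFAC or EPD).
COUNTS = {
    "A": [8, 0, 0, 6],
    "C": [0, 0, 8, 0],
    "G": [0, 8, 0, 2],
    "T": [0, 0, 0, 0],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into per-position log2 odds scores.

    Assumes a uniform background frequency; the pseudocount avoids log(0).
    """
    width = len(next(iter(counts.values())))
    matrix = []
    for pos in range(width):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((counts[base][pos] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous symbols such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    matrix = log_odds_matrix(COUNTS)
    for start, window, score in scan("TTAGCATTAGCGTT", matrix, threshold=4.0):
        print(start, window, round(score, 2))
```

A scan of this kind reports isolated matches only. Because such short signals also occur by chance, the context analysis described in step 3 would still be needed to judge whether a hit is likely to be functional.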
Funding Scheme: CSC - Cost-sharing contracts