Characterization of regulatory genomic regions. Development of databases and sequence analysis tools

Objective



The objectives of the proposed project are to develop and provide tools for the interpretation of genomic DNA sequences with special emphasis on regulatory regions.
Sequencing whole genomes will yield a huge amount of data in the near future. Systematic functional analysis of these sequences with standard experimental methods is far beyond existing laboratory research capacities. Thus, we have to rely on in silico approaches, at least for planning targeted experiments. Assessing the biological meaning of a gene depends not only on methods to predict the structure and function of its gene product, but also on the ability to derive its regulatory expression pattern from the base sequence. Gene control regions pose particular difficulties for in silico function prediction, which may partly explain the modest success of current algorithms addressing this problem. The elementary sequence signals are typically short (in the range of 5 to 30 base pairs) and highly variable, reflecting the biochemistry of the decoding mechanism. As a consequence of these properties, the physiological significance of an individual control module can only be assessed by a comprehensive search that combines context analysis within the (putative) regulatory region with gene identification methods based on coding region prediction. The technical approaches suggested in this proposal specifically address the difficulties outlined above.
Thus, there is an obvious need for programs that enable molecular biologists to efficiently annotate newly unravelled sequence data with tentative functional features.
To achieve this, we propose the following programme:
1. Databases that collect experimentally relevant data, such as TRANSFAC and EPD, provide the basis for sequence analysis. They will be appropriately adapted and integrated.
2. The next step will be to develop algorithms that permit the identification of individual functional elements as precisely as possible.
3. The third step is a comprehensive context analysis of potential cis-regulatory sequence elements within promoters, enhancers or locus control regions (LCRs), and in conjunction with coding regions / open reading frames.
4. Finally, all algorithms to be developed have to be continuously verified in close collaboration with experimental researchers using appropriate model genomes.
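
Steps 2 and 3 of the programme can be illustrated with a minimal Python sketch. It assumes a position weight matrix approach, which the proposal itself does not specify. The count matrix, the threshold and the function names (log_odds_matrix, scan, paired_hits) are hypothetical and chosen for illustration only; the counts are invented and are not taken from TRANSFAC or EPD.

```python
import math

BASES = "ACGT"

# Hypothetical base counts for a 6-bp TATA-like element (invented numbers,
# 20 aligned sites per column; not real database content).
COUNTS = {
    "A": [0, 18, 0, 19, 12, 9],
    "C": [1, 0, 0, 0, 2, 3],
    "G": [0, 1, 0, 0, 3, 4],
    "T": [19, 1, 20, 1, 3, 4],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert base counts into log2-odds scores against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in BASES) + pseudocount * len(BASES)
        column = {
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in BASES
        }
        matrix.append(column)
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, site, score) for every window scoring at least threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def paired_hits(hits_a, hits_b, max_distance):
    """Crude context check: pairs of hits whose starts lie within max_distance."""
    return [
        (a, b)
        for a in hits_a
        for b in hits_b
        if abs(a[0] - b[0]) <= max_distance
    ]


matrix = log_odds_matrix(COUNTS)
hits = scan("GGCTATAAAGGC", matrix, threshold=6.0)
for start, site, score in hits:
    print(start, site, round(score, 2))
```

Because the signals are short and variable, a single matrix hit carries little weight on its own. The paired_hits helper stands in for the context analysis of step 3 by keeping only candidate sites that occur close to a second predicted element. A realistic analysis would also consider strand, spacing constraints and predicted coding regions.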

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

GBF - NATIONAL CENTRE FOR BIOTECHNOLOGY
Address
Mascheroder Weg 1
38124 Braunschweig
Germany

Participants (5)

GSF-RESEARCH CENTER FOR ENVIRONMENT AND HEALTH
Germany
Address
Ingolstaedter Landstrasse 1
85764 Neuherberg
INSTITUT SUISSE DE RECHERCHES EXPERIMENTALES SUR LE CANCER
Switzerland
Address
Ch. Des Boveresses 155
1066 Epalinges
MEDICAL RESEARCH COUNCIL
United Kingdom
Address
Babraham Bioincubator
CB2 4AT Cambridge
NATIONAL RESEARCH COUNCIL OF ITALY
Italy
Address
Via Fratelli Cervi 93
20090 Segrate
UNIVERSITY OF LAUSANNE
Switzerland
Address
Batiment De Biologie
1015 Lausanne