
Methods and software for evolutionary analysis of genome sequence data

Objective



Molecular evolutionary studies have traditionally concentrated on single genes or a single group of organisms. The advent of genome projects, and the rapid progress that has been made in sequencing the genomes of model organisms, has created a demand for changes in the way molecular evolutionary analysis is carried out.
First, there is a pressing need for increasingly automated methods of analysis to cope with increased volumes of data (for example, assignment of proteins to superfamilies and reconstruction of phylogenetic trees). Second, there is an opportunity to develop new methods of analysis that exploit the wealth of data on chromosome organization that has recently become available from genome projects (for example, the variation of base composition and gene density along chromosomes). We propose to address these issues by developing new analytical methods and writing new software to implement them. Specifically, we will develop methods for the six areas of analysis described below.
1. Ribosomal RNA Database: Development of a database for storing, retrieving, and aligning large sets of homologous rRNA sequences. This database will serve as a model for specialised gene family databases.
2. Gene Family Alignment: Development of algorithms for adding new sequences to existing alignments, with special reference to ribosomal RNA.
3. Chromosome Organization and Compartmentalization: Analysis of gene composition, gene density, and gene orientation as a function of gene location, particularly with respect to chromosome landmarks such as replication origins, centromeres and telomeres.
4. Gene Duplication and Genome Rearrangement: Identification of duplicate genes and duplicate chromosomal regions; use of molecular clocks to estimate the ages of duplication events; methods for converting observed extents of chromosomal rearrangement among species into quantitative measures of evolutionary distance.
5. Compositional Biases: Development of methods to study the effects of nucleotide and amino acid composition biases, and codon usage bias, on the accuracy of phylogenetic tree reconstruction and on the estimation of rates of molecular evolution. Analysis of the effects of bias on the efficiency of gene identification in genome sequence data.
6. Pattern Searching: Screening large datasets of functionally equivalent sequences to identify sequence motifs of functional importance, with special reference to promoter and other non-coding sequences.
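As an illustration of the kind of analysis named in area 3 (variation of base composition along chromosomes), the following is a minimal sketch only, not the network's software. The function name `gc_content_windows`, its parameters, and the toy sequence are all hypothetical and chosen for this example; it assumes a plain DNA string and non-overlapping or overlapping fixed-size windows.

```python
def gc_content_windows(sequence, window, step):
    """Return a list of (start, gc_fraction) for each full window.

    Hypothetical helper for illustration: `window` is the window length
    in bases and `step` is the offset between consecutive window starts.
    Trailing bases that do not fill a whole window are ignored.
    """
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # Count G and C bases in this window.
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / window))
    return results


if __name__ == "__main__":
    # Toy sequence (made up): an AT-rich block, a GC-rich block, a mixed block.
    toy = "ATATATATGCGCGCGCATATGCGC"
    for start, fraction in gc_content_windows(toy, window=8, step=8):
        print(start, fraction)
```

Plotting the resulting fractions against window position, alongside the coordinates of landmarks such as replication origins, centromeres or telomeres, is one simple way to look for compositional compartments of the sort described above.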
Our network comprises six European laboratories that have strong individual track records in in silico molecular evolutionary analysis and have complementary interests and experience. All software developed by the network will be written and implemented on UNIX workstations (with, where necessary, X-windows graphical interfaces), which are now the standard platform for the field, and will be made freely available through the European Bioinformatics Institute software distribution archives.

Coordinator

University of Dublin - Trinity College
Address
Trinity College
2 Dublin
Ireland

Participants (5)

UNIVERSITA DEGLI STUDI DI BARI
Italy
Address
Via E. Orabona 4
70125 Bari
UNIVERSITY COLLEGE CORK, NATIONAL UNIVERSITY OF IRELAND, CORK
Ireland
Address
Lee Maltings, Prospect Row
30 Cork
University of Nottingham
United Kingdom
Address
Queens Medical Centre
NG7 2UH Nottingham
Université Claude Bernard Lyon 1
France
Address
43, Boulevard Du 11 Novembre 1918
69622 Villeurbanne
Uppsala University
Sweden
Address
3, Husargatan
751 24 Uppsala