Final Report Summary - HOMEOREG (Genomes and Evolution: Systematic Comparative Analysis of Homeobox Genes and their Non-coding Regulators)
In this project, we carried out a systematic analysis of homeobox genes and their non-coding regulators across the major branches of animal evolution, covering human, mouse, chicken, frog, zebrafish, amphioxus, fruitfly, beetle, honeybee and nematode. The project met its expected outputs: one research paper and one resource paper were published, and a database, HomeoDB2, was set up to provide a resource for studying the dynamics of homeobox gene evolution. HomeoDB2 is freely accessible online (Figure 1).
Figure 1. Logo of HomeoDB2.
With recent advances in genome sequencing, we used these genome data to examine differences in homeobox genes between species and, furthermore, to trace the origins of homeobox genes. All data generated by the project are collected and presented in a web-based database, the Homeobox gene database (HomeoDB2). The database was continually updated from May 2008 to July 2011 and currently includes human, mouse, chicken, zebrafish, frog (Xenopus tropicalis), amphioxus (Branchiostoma floridae), fruitfly (Drosophila melanogaster), beetle (Tribolium castaneum), honeybee (Apis mellifera) and a nematode (Caenorhabditis elegans). HomeoDB2 now contains 1929 homeobox loci, comprising 1763 probable genes, 125 probable pseudogenes and 41 loci with undefined annotation, classified into 127 gene families in 11 classes, plus several unclassified groups. At the request of the HUGO Gene Nomenclature Committee (HGNC, http://www.genenames.org/), each human gene page has also been bidirectionally cross-linked to the corresponding HGNC gene symbol report.
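The locus totals above are internally consistent: 1763 probable genes + 125 probable pseudogenes + 41 loci with undefined annotation = 1929 loci. The sketch below shows one way such a summary could be tallied from per-locus annotation records. The record layout, the status labels and the example locus IDs are hypothetical and do not describe HomeoDB2's actual schema.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical annotation status labels (not HomeoDB2's actual vocabulary).
GENE = "probable gene"
PSEUDOGENE = "probable pseudogene"
UNDEFINED = "undefined annotation"
KNOWN_STATUSES = {GENE, PSEUDOGENE, UNDEFINED}


def summarise(loci: Iterable[Tuple[str, str]]) -> Counter:
    """Count loci by annotation status; each locus is a (locus_id, status) pair."""
    counts = Counter(status for _, status in loci)
    unknown = set(counts) - KNOWN_STATUSES
    if unknown:
        # Reject labels outside the assumed vocabulary rather than counting them silently.
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")
    return counts


# Totals reported for HomeoDB2: the three categories must sum to the locus count.
reported = {GENE: 1763, PSEUDOGENE: 125, UNDEFINED: 41}
assert sum(reported.values()) == 1929

# Toy records with made-up locus IDs, just to exercise the function.
toy = [("locus1", GENE), ("locus2", GENE), ("locus3", PSEUDOGENE), ("locus4", UNDEFINED)]
print(summarise(toy))
```

Validating the status labels up front means a typo in an annotation file surfaces as an error instead of as a silently mismatched total.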
To increase the functionality of HomeoDB2, we have incorporated several new features. First, we have integrated a standalone BLAST server. Users can input a nucleotide or amino acid sequence as a query and use blastx or blastp, respectively, to search the HomeoDB2 dataset. The advantages over searching complete genomic or RefSeq databases are speed and immediate access to gene family data. We recommend using BLAST only as an initial clue to a gene's identity and following it with both molecular phylogenetic analysis and analysis of synteny. Second, HomeoDB2 includes a 'Compare' function that allows users to select any species in the database for comparison, either across all homeobox genes or within subsets of the gene classification. This allows users to home in on gene duplications and gene losses for further study. Third, HomeoDB2 includes a 'Download' function so that users can download homeodomain sequences from any species, classes or families (or combinations of these) in FASTA format. Together, these functions make data access and reuse much easier for both the research community and the general public.
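The 'Download' function described above filters homeodomain sequences by species, class and family and returns them as FASTA. A minimal sketch of that filtering logic follows, assuming a hypothetical in-memory record type (`Homeodomain`) and a header layout (`>locus|species|class|family`) chosen here for illustration; it is not HomeoDB2's actual code or output format. The gene names, family labels and sequences in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Homeodomain:
    """Hypothetical record for one homeodomain sequence."""
    locus: str
    species: str
    gene_class: str
    family: str
    sequence: str  # amino acid sequence


def to_fasta(records: Iterable[Homeodomain],
             species: Optional[Set[str]] = None,
             classes: Optional[Set[str]] = None,
             families: Optional[Set[str]] = None,
             width: int = 60) -> str:
    """Return FASTA text for records matching every filter given (None means no filter)."""
    out = []
    for r in records:
        if species is not None and r.species not in species:
            continue
        if classes is not None and r.gene_class not in classes:
            continue
        if families is not None and r.family not in families:
            continue
        out.append(f">{r.locus}|{r.species}|{r.gene_class}|{r.family}")
        # Wrap the sequence into fixed-width lines, as is conventional for FASTA.
        out.extend(r.sequence[i:i + width] for i in range(0, len(r.sequence), width))
    return ("\n".join(out) + "\n") if out else ""


# Placeholder records: names, family labels and sequences are illustrative only.
records = [
    Homeodomain("geneA", "human", "ANTP", "familyX", "ACDEFGHIKL" * 6),
    Homeodomain("geneB", "mouse", "PRD", "familyY", "MNPQRSTVWY" * 7),
]
print(to_fasta(records, species={"human"}), end="")
```

Because each filter is optional and all given filters must match, combinations such as "mouse sequences in the PRD class" are expressed by passing both `species` and `classes`.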
The core role of HomeoDB2 remains the classification of coding genes, but we have also initiated HomeoReg as a sub-dataset of HomeoDB2. HomeoReg collects information on experimentally demonstrated regulatory interactions between non-coding RNA molecules (e.g. miRNAs) and homeobox genes. Only interactions involving homeobox genes in HomeoDB2 are included; at present these comprise 46 interactions. Each regulator is displayed with its target homeobox genes, together with information on how regulator and target hybridize.
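To make the idea of displaying hybridization information concrete, the sketch below aligns a regulatory RNA (written 5' to 3') against its target site read 3' to 5' and marks each position as a Watson-Crick pair, a G:U wobble pair or a mismatch. It also groups interactions by target gene for display. The `Interaction` record, its field names and the toy sequences are hypothetical and are not HomeoReg's actual schema or data; the alignment is ungapped, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List

# Canonical Watson-Crick pairs plus the G:U wobble pair that RNA duplexes tolerate.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass(frozen=True)
class Interaction:
    """Hypothetical HomeoReg-style record; field names are illustrative."""
    regulator: str      # e.g. a miRNA name
    target_gene: str    # homeobox gene symbol
    regulator_seq: str  # regulator RNA, written 5'->3'
    target_site: str    # target mRNA site, written 5'->3'


def pairing_line(regulator_seq: str, target_site: str) -> str:
    """Ungapped alignment of regulator (5'->3') against target site read 3'->5'.

    '|' marks a Watson-Crick pair, ':' a G:U wobble, ' ' a mismatch.
    Only the overlapping length of the two sequences is compared.
    """
    symbols = []
    for pair in zip(regulator_seq, reversed(target_site)):
        if pair in WATSON_CRICK:
            symbols.append("|")
        elif pair in WOBBLE:
            symbols.append(":")
        else:
            symbols.append(" ")
    return "".join(symbols)


def by_target(interactions: List[Interaction]) -> Dict[str, List[Interaction]]:
    """Group interactions by target homeobox gene, as a display page might."""
    grouped: Dict[str, List[Interaction]] = {}
    for it in interactions:
        grouped.setdefault(it.target_gene, []).append(it)
    return grouped


# Toy interactions with made-up names and sequences.
interactions = [
    Interaction("miR-X", "GENE1", "UAGCUUAG", "CAGAGCUA"),
    Interaction("miR-Y", "GENE1", "GGCAUUAC", "GUAAUGCC"),
    Interaction("miR-Z", "GENE2", "ACGUACGU", "ACGUACGU"),
]
for gene, items in by_target(interactions).items():
    for it in items:
        print(f"{gene} <- {it.regulator}")
        print(f"  5' {it.regulator_seq} 3'")
        print(f"     {pairing_line(it.regulator_seq, it.target_site)}")
        print(f"  3' {it.target_site[::-1]} 5'")
```

The target is printed reversed so that the two strands appear antiparallel, which is how miRNA:mRNA duplexes are conventionally drawn.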
Ultimately, it would be ideal to include in HomeoDB2 all species for which complete genome sequence data are available. However, the quality of sequencing and assembly is not uniform, which makes comprehensive identification and annotation of homeobox sequences impossible for many genomes. By focusing on animal models with well-assembled and annotated genomes, HomeoDB2 provides a resource suitable for investigating the dynamics of homeobox gene evolution along major evolutionary lineages. For example, we have used HomeoDB2 for a comprehensive comparison of the human and mouse genomes, which revealed far more homeobox gene loss in the rodent evolutionary lineage than in the primate lineage. While humans have lost only the Msx3 gene, mice have lost VENTX, ARGFX, LOC647589, DPRX, SHOX, RAX2, NANOGNB, LEUTX and TPRX1. This analysis provides insight into the patterns of homeobox gene evolution in mammals and is a step towards relating genomic evolution to phenotypic evolution (Figure 2).
Figure 2: A summary of homeobox gene dynamics in the mouse and human evolutionary lineages. The majority of homeobox genes are conserved between mouse and human lineages (grey squares), although some have undergone duplication to different extents (cascaded boxes). Humans have lost the Msx3 gene; mice have lost Leutx, Nanognb, Tprx1, LOC647589, Rax2, Shox, Dprx, Argfx and Ventx (dashed boxes). Three new homeobox loci (Gm7235, Gm5585 and Crxos1) and one new cluster (Obox) arose in the rodent lineage.
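The gain and loss calls summarised in Figure 2 rest on comparing which genes are present in each lineage. A minimal parsimony-style sketch of that reasoning follows, assuming an outgroup whose presence/absence indicates the ancestral state. A gene missing from one lineage but present in the other lineage and in the outgroup is scored as a loss; a gene found in only one lineage and absent from the outgroup is scored as a gain. This is an illustration of the logic, not necessarily the method used in the published study. The presence sets are toy data built to mirror the reported result; they are not real genome annotations.

```python
from typing import Dict, Set


def infer_changes(a: Set[str], b: Set[str], outgroup: Set[str]) -> Dict[str, Set[str]]:
    """Parsimony-style inference of lineage-specific gene gains and losses.

    A gene absent from one lineage but present in the other lineage and in the
    outgroup is scored as a loss in the lineage lacking it. A gene found in only
    one lineage and absent from the outgroup is scored as a gain in that lineage.
    """
    return {
        "lost_in_a": (b & outgroup) - a,
        "lost_in_b": (a & outgroup) - b,
        "gained_in_a": a - b - outgroup,
        "gained_in_b": b - a - outgroup,
    }


# Toy presence sets built to mirror the result in Figure 2 (not real data).
# Symbols are upper-cased so orthologues match by name, which is a simplification.
conserved = {"HOXA1", "PAX6", "OTX2"}  # a few example genes kept in both lineages
lost_in_mouse = {"VENTX", "ARGFX", "LOC647589", "DPRX", "SHOX",
                 "RAX2", "NANOGNB", "LEUTX", "TPRX1"}
new_in_rodents = {"GM7235", "GM5585", "CRXOS1", "OBOX"}

human = conserved | lost_in_mouse
mouse = conserved | {"MSX3"} | new_in_rodents
outgroup = conserved | lost_in_mouse | {"MSX3"}  # assumed ancestral presence

changes = infer_changes(human, mouse, outgroup)
for key, genes in changes.items():
    print(key, sorted(genes))
```

Genes absent from both lineages, or present in both, are not scored by this sketch. Matching genes by name also stands in for proper orthology assignment, which, as recommended for the BLAST search, should rest on molecular phylogenetics and synteny.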
Resolving the complete patterns of gene gain and loss across the animal kingdom is an important goal for comparative genomics, and is relevant to any attempt to relate genome evolution to phenotypic evolution. With regard to mammals, the project described here starts this line of enquiry for the homeobox genes and uncovers an unexpected difference in the extent of gene loss between two evolutionary lineages (human and mouse). As more genomes are sequenced to high coverage, assembled and annotated, it is hoped that further such studies will uncover the patterns and processes underlying genome evolution in this important and diverse taxon.
Publications:
Zhong, Y.F. and Holland, P.W.H. (2011) HomeoDB2: functional expansion of a comparative homeobox gene database for evolutionary developmental biology. Evolution and Development 13. In press.
Zhong, Y.F. and Holland, P.W.H. (2011) The dynamics of vertebrate homeobox gene evolution: gain and loss of genes in mouse and human lineages. BMC Evolutionary Biology 11: 169 & 11: 204