Final Report Summary - EFFICIENT-C-TREE (Efficient Phylogenetics and the Caecilian Tree of Life)
The EFFICIENT-C-TREE project dealt with three major tasks:
1) Building a complete supermatrix of caecilians, representing the full diversity of groups, from nuclear and mitochondrial genes as well as morphology.
2) Performing phylogenetic analyses of the supermatrix and its subsets, and reconstructing supertrees from those subsets.
3) Developing and evaluating a range of strategies that allow significantly more efficient sampling in supermatrix and/or supertree analyses, in order to achieve global phylogenetic results and, ultimately, the most efficient reconstruction of the caecilian tree of life.
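Task 1 amounts to concatenating per-marker alignments into one matrix in which taxa lacking a marker are scored as missing data. The sketch below illustrates only that step, under stated assumptions: each marker is a hypothetical Python dict mapping taxon names to already-aligned sequences, `?` is the missing-data symbol, and `build_supermatrix` is an illustrative name, not a tool used in the project. It also returns each marker's column range, the kind of information a partitioned likelihood or Bayesian analysis needs.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: {taxon name: aligned sequence} for one marker.
Alignment = Dict[str, str]


def build_supermatrix(
    loci: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-marker alignments into one supermatrix.

    Taxa absent from a marker are padded with the missing-data symbol.
    Also returns (marker, first column, last column), 1-based and inclusive,
    for use as partition boundaries.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for marker, aln in loci.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{marker}: expected a non-empty alignment of equal-length sequences")
        width = widths.pop()
        for taxon in taxa:
            # Taxa not sampled for this marker get a run of missing-data symbols.
            rows[taxon].append(aln.get(taxon, missing * width))
        partitions.append((marker, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


if __name__ == "__main__":
    # Toy data with hypothetical taxon names, not project data.
    loci = {
        "rag1": {"TaxonA": "ACGT", "TaxonB": "ACGA"},
        "cytb": {"TaxonA": "TTG", "TaxonC": "TTA"},
    }
    matrix, partitions = build_supermatrix(loci)
    for taxon, row in matrix.items():
        print(taxon, row)  # TaxonB is padded: ACGA???
    print(partitions)  # [('rag1', 1, 4), ('cytb', 5, 7)]
```

Padding absent taxa with `?` rather than gap characters keeps missing data distinct from inferred indels, and the proportion of `?` in the result gives a rough picture of how well the markers overlap.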
- Seventy caecilian taxa, representing the vast majority of the major lineages of Gymnophiona, were selected for analysis on the basis of previous and ongoing studies and the availability of suitable material for DNA sequencing. Some target taxa were collected during a fieldtrip to French Guiana. Twelve markers (mitochondrial atp6, cox1, cytb, nad1, 12S, 16S; nuclear bdnf, h3f3b, rag1, slc8a1, 28S, rho intron 1) were selected, and PCR primers were designed for amplification and sequencing where none were available from previous studies. Morphological data were collected from a diversity of anatomical systems to generate phylogenetically informative characters for analyses integrating morphology and molecules.
- Supermatrix analyses of the multilocus dataset were conducted using up-to-date phylogenetic inference methods and software (mainly maximum likelihood and Bayesian inference under a range of partitioning schemes). Phylogenetic trees were also inferred for each subset (marker) of the multilocus dataset. Supertrees were then constructed from these single-marker trees using a broad range of supertree methods, including matrix representation with parsimony (MRP), average consensus, split fit, quartet joining, PhySIC and SuperFine. The theoretical basis for alternative sampling strategies, and experimental protocols for evaluating them, were explored building on the study of Wilkinson (1995). Predictions of phylogenetic information based on Goldman-type experimental design were validated using newly generated mitogenomic and rag1 data.
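Of the supertree methods listed, MRP is the simplest to illustrate. Each input tree is recoded as binary characters, one per clade: taxa inside the clade score 1, other taxa in that tree score 0, and taxa absent from that tree score `?`; the combined matrix is then analysed with parsimony. The following is a minimal sketch of the coding step only, assuming rooted input trees written as nested Python tuples (a hypothetical representation chosen to avoid a Newick parser); `clusters` and `mrp_matrix` are illustrative names, and this is not the implementation used in the project.

```python
from typing import Dict, FrozenSet, List, Union

# Hypothetical tree representation: a leaf is a taxon name (str) and an
# internal node is a tuple of child subtrees. Trees are assumed rooted and
# to contain at least one internal node.
Tree = Union[str, tuple]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every internal node, root included, in post-order."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Binary MRP coding of a list of rooted input trees (no parsimony search)."""
    per_tree = [clusters(tree) for tree in trees]
    # In post-order the root cluster comes last: it holds all taxa of that tree.
    all_taxa = sorted(set().union(*(cl[-1] for cl in per_tree)))
    columns: List[Dict[str, str]] = []
    for tree_clusters in per_tree:
        tree_taxa = tree_clusters[-1]
        for cluster in tree_clusters:
            # Skip the root and any singleton: neither groups taxa informatively.
            if 1 < len(cluster) < len(tree_taxa):
                columns.append({
                    taxon: "?" if taxon not in tree_taxa
                    else ("1" if taxon in cluster else "0")
                    for taxon in all_taxa
                })
    return {taxon: "".join(col[taxon] for col in columns) for taxon in all_taxa}


if __name__ == "__main__":
    tree1 = (("A", "B"), ("C", "D"))
    tree2 = (("A", "C"), "E")
    for taxon, row in mrp_matrix([tree1, tree2]).items():
        print(taxon, row)
```

Many MRP implementations also add an all-zero outgroup row when coding rooted trees; it is omitted here for brevity.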
The most significant findings included:
1. This is the first time that supertree and supermatrix approaches have been compared in order to investigate experimental design strategies intended to maximise the efficiency of large phylogeny reconstruction. Both approaches appear adequate, but both are also strongly dependent on adequate (effective) overlap of the data (sequences or trees) to perform accurately.
2. The multilocus dataset recovers the phylogenetic relationships of caecilian amphibians with strong statistical support in the supermatrix approach. Individual subsets (genes) support some relationships to varying degrees, which compromises the accuracy of supertrees derived from them. In this respect, supermatrix approaches outperform supertree analyses.
3. A range of strategies for identifying areas of poor overlap in large phylogenies has been developed and tested. Perhaps unsurprisingly, random sampling performs poorly compared with phylogenetically informed subsampling.
4. Supermatrix analyses of the multilocus dataset have yielded the largest, most comprehensive and most robust phylogeny of caecilian amphibians to date: the first substantial 'Tree of life' of caecilians.
5. Goldman-type predictions of phylogenetic information have been validated, and our results confirm that Goldman's method, although still underexplored, offers a powerful tool for experimental design in molecular phylogenetic studies. However, several drawbacks remain to be overcome, and further assessment is needed to make the method better understood, more versatile and more accessible.
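Findings 1 and 3 both turn on the overlap between data subsets. A crude first check, assuming overlap is measured simply as the taxa shared by each pair of markers, is sketched below; the function names and the default threshold are hypothetical and are not the strategies developed in the project. The threshold of four shared taxa reflects the fact that an unrooted tree on three or fewer taxa has only one possible topology, so two unrooted marker trees sharing fewer than four taxa carry no topological information about each other. Grouping markers into connected components then flags sets of markers that are only weakly linked by shared taxa.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input: {marker name: set of taxa sampled for that marker}.
Coverage = Dict[str, Set[str]]


def pairwise_overlap(coverage: Coverage) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For each marker pair: (number of shared taxa, Jaccard index)."""
    result = {}
    for a, b in combinations(sorted(coverage), 2):
        shared = coverage[a] & coverage[b]
        union = coverage[a] | coverage[b]
        result[(a, b)] = (len(shared), len(shared) / len(union) if union else 0.0)
    return result


def poorly_overlapping(coverage: Coverage, min_shared: int = 4) -> List[Tuple[str, str]]:
    """Marker pairs sharing fewer than `min_shared` taxa (hypothetical threshold)."""
    return [pair for pair, (n, _) in pairwise_overlap(coverage).items() if n < min_shared]


def overlap_components(coverage: Coverage, min_shared: int = 4) -> List[List[str]]:
    """Connected components of the graph linking markers with enough shared taxa."""
    markers = sorted(coverage)
    adjacent: Dict[str, Set[str]] = {m: set() for m in markers}
    for a, b in combinations(markers, 2):
        if len(coverage[a] & coverage[b]) >= min_shared:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in markers:
        if start in seen:
            continue
        # Depth-first traversal collects every marker reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Toy coverage with hypothetical taxa, not project data.
    coverage = {
        "rag1": {"T1", "T2", "T3", "T4", "T5"},
        "cytb": {"T1", "T2", "T3", "T4"},
        "28S": {"T5", "T6", "T7"},
    }
    print(poorly_overlapping(coverage))
    print(overlap_components(coverage))
```

Shared-taxon counts ignore where in the tree the shared taxa sit, so a phylogenetically informed measure would be needed to locate poorly overlapping regions more precisely.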
- The strategies developed in this project will have broad applicability and will produce real savings in time and cost, speeding up the development of ever more complete large-scale phylogenies and providing better foundations for all branches of comparative biology. Beyond systematic and supertree theory, this research has also advanced the state of the art in caecilian systematics by providing the most comprehensive phylogeny of this order of vertebrates to date. The caecilian tree reconstructed here will provide a framework for many future comparative studies, ranging from anatomy and behaviour to molecular evolution and genomics.