Final Activity Report Summary - StatInfPopGen (Statistical inference in population genetics)
Although the above-mentioned theory was not new, experimental population geneticists, who can now gather large amounts of molecular marker data, needed methods to use these data efficiently when addressing questions about the history of populations. By efficient use, we mean that quantitative answers must be given, such as the relative probabilities of two or more possible histories or the time at which two populations diverged. Obtaining answers of this type required the development of a statistical approach.
In this project, we focussed on two specific approaches, each combining three models: a model of population history, a model of gene evolution (the mutation model), and a model of gene genealogy in populations (the coalescent model). As in many fields of science, the complexity of the problems prevented us from using exact solutions; hence both approaches relied on computer-intensive stochastic simulations. In the first approach, gene genealogies compatible with the data and the population history were simulated in order to estimate the likelihood of the data, i.e. the probability of the data given the models. In the second approach, the data themselves were simulated and inference was drawn from their degree of similarity to the observed data. Both approaches had so far been developed for simple population histories involving only one or two populations. Our main contribution in this project was to develop solutions and software for dealing with more complex, and hence more realistic, histories. As an example, our software was used to choose among seven possible histories of African Pygmies based on genetic data from 21 populations.
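The second approach described above (simulating data and keeping the parameter values whose simulations resemble the observations) can be illustrated with a minimal sketch. It is not the project's software: the toy model (a single constant-size population, the standard coalescent, an infinite-sites mutation model summarised by the number of segregating sites), the flat prior, and all function and parameter names (`simulate_segregating_sites`, `abc_rejection`, `tolerance`, `prior_max`) are assumptions made for illustration. Schemes of this kind are commonly called rejection sampling in approximate Bayesian computation.

```python
import random


def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(theta, n_genes, rng):
    """Simulate the number of segregating sites for n_genes sampled genes.

    Toy model (an assumption of this sketch): one constant-size population,
    standard coalescent, infinite-sites mutation with scaled rate theta.
    """
    tree_length = 0.0
    for k in range(n_genes, 1, -1):
        # While k lineages remain, the waiting time to the next coalescence
        # is exponential with rate k(k-1)/2 (in coalescent time units).
        waiting_time = rng.expovariate(k * (k - 1) / 2.0)
        tree_length += k * waiting_time
    # Mutations fall on the genealogy as a Poisson process of rate theta/2.
    return poisson(theta / 2.0 * tree_length, rng)


def abc_rejection(s_observed, n_genes, n_draws, tolerance, prior_max, rng):
    """Keep the prior draws whose simulated data resemble the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # draw from a flat prior
        s_simulated = simulate_segregating_sites(theta, n_genes, rng)
        if abs(s_simulated - s_observed) <= tolerance:
            accepted.append(theta)
    return accepted
```

Calling `abc_rejection(10, 10, 20000, 1, 20.0, random.Random(1))` returns the accepted values of `theta`, whose spread approximates the uncertainty about the parameter given the observed summary. Comparing several candidate histories would follow the same logic, with the history itself drawn at random and the relative frequencies of the accepted histories serving as estimates of their relative probabilities.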