This project explored the history of genomic science across three different species: the baker’s yeast S. cerevisiae, the pig S. scrofa and H. sapiens. Each species was the subject of a concerted project to determine the sequence of its DNA – its genome – over successive, partly overlapping periods: 1989 to 1996 in yeast, 1990 to 2003 in human, and 2006 to 2012 in pig. By tackling the antecedents, development and consequences of these projects, as well as their interactions and surrounding environments, we uncovered a variety of practices and modes of organisation, all of them constitutive of the scientific field of genomics.
Uncovering the diversity of genomics is important because the only widely known episode of its history is the Human Genome Project. Existing and amply publicised accounts of this project have mobilised a success narrative of rapid compilation and unrestricted release of the DNA sequence data characteristic of the human genome. Yet according to these same accounts, there has been an ongoing translational gap between this large amount of publicly available data and the medical and scientific goals that the human genome sequence was meant to foster.
Existing evidence suggested that the narrative of the Human Genome Project sidelined several aspects of human genomics, as well as genomics research conducted in other species. Our project reconstructed these overlooked lineages in the history of genomics through a mixed-methods approach that used quantitative data as an input for historical research. Our quantitative data comprised details of almost 13.5 million yeast, human and pig DNA sequences submitted to public databases and over 29,000 articles describing those sequences in the scientific literature. By interpreting this dataset alongside qualitative evidence, we drew the following conclusions:
1) Publicly available datasets enable historians to portray genomics beyond well-known episodes and amply publicised accounts, such as those gravitating around the Human Genome Project.
2) The comprehensiveness of those datasets – including all the results of genomics research and not only those arising from well-funded and high-profile initiatives – represents an opportunity to retrieve actors and institutions largely forgotten today but important in making the history of this field.
3) The ways these forgotten participants practised and organised genomics show that the notion of a gap between DNA sequence data and practical research goals is only part of the history of this field: in a substantial proportion of the evidence we interpreted, the production of DNA sequences was inseparably entangled with the use of the resulting data for medical, agricultural or industrial research.
4) This entanglement between sequence production and use suggests that the affordances and limitations of a given genome depend on the communities that produced the underlying DNA data, and in particular on the needs and motivations of the actors and institutions that make up these communities, which differ within and across species.
5) Our method of exploiting the historical potential of large datasets can be exported to other scientific fields or areas of human activity.