CORDIS - EU research results

Five Centuries of Marriages

Final Report Summary - 5COFM (Five Centuries of Marriages)

Five Centuries of Marriages (5CofM) has been a five-year ERC-AdG research initiative that started in May 2011, based on the data mining of the Llibres d'Esposalles (books of marriage licenses) conserved at the Archives of the Barcelona Cathedral. This extraordinary data source comprises information on the approximately 610,000 unions celebrated in the Diocese between 1451 and 1905. The project has been carried out at the Universitat Autònoma de Barcelona, the Centre for Demographic Studies (CED) and the Centre for Computer Vision (CVC), and has brought together scientists from the historical and social sciences and from the computer sciences. It has represented a step forward in the consolidation of common research synergies within the digital humanities. The two teams have complemented each other: the construction of the Barcelona Historical Marriage Database (BHMD), certainly the main achievement of 5CofM, has been made possible by computer-assisted tools, while the handwriting recognition methods have in turn been implemented by modelling and integrating the contextual knowledge provided by the social science scholars. These interdisciplinary developments have resulted in numerous joint publications and presentations at international conferences.

One particular achievement common to both branches is the creation of a crowdsourcing platform. In the first part of the project, we created a computer-assisted data entry tool for transcription, which made possible the simultaneous participation of more than 140 collaborative workers. In the second part of the project, expert researchers used the tool to correct, codify and harmonize the database, which greatly eased the extremely heavy harmonization and codification processes. Another common achievement of the two branches has been the creation of the record linkage software named "Buscadescendencias", used to reconstruct genealogies. For the specific research in handwriting recognition, a second platform has also been developed for the visual annotation of images, in order to generate the ground truth for the evaluation of the computer vision algorithms.

The BHMD has also been thoroughly validated, codified and harmonized. Secondary databases (HISCO-codified occupations, geo-references, dictionaries of standardized names and surnames) have been produced as by-products.

The results have gone well beyond the initial expectations thanks to the intensive application of innovative computerized tools based on computer vision and co-working paradigms. Thus, the second most important achievement of 5CofM is its novel methodology, which integrates not only the computational tools but also the knowledge accumulated over the five years, which has defined the usability of the system. These methods and procedures have been pioneering across Europe within the digital humanities community, and in historical demography in particular, and have generated several spin-off projects that apply them to other sources.

One remarkable result of our research is the demonstration that data on a single demographic variable, marriage, can be used to study a wide array of other subjects: population estimates, location and growth; survival, migration, fertility; social status, social mobility, inter-generational transmission; religious obedience, secularization, consanguinity; social, occupational and territorial change; the emergence of a metropolitan area. As the construction of the database advanced, ad hoc methodologies for approaching these subjects have been created or developed, often borrowed from other disciplines such as biology or onomastics.

On the other hand, advances in handwriting recognition have been remarkable, with notable developments in the fields of word spotting and contextual recognition. A prototype reading and extraction tool has been produced at the end of the project, ready to be used in future research.