Final Report Summary - 5COFM (Five Centuries of Marriages)
One particular achievement common to both branches is the creation of a crowdsourcing platform. In the first part of the project, we created a computer-assisted data entry tool for transcription that allowed more than 140 collaborators to work simultaneously. In the second part of the project, expert researchers used the tool to correct, codify and harmonize the database, which eased the extremely heavy harmonization and codification processes. Another achievement common to the two branches has been the creation of the record linkage software named "Buscadescendencias", used to reconstruct genealogies. For the specific research in handwriting recognition, a second platform has also been developed for the visual annotation of images, generating the ground truth needed to evaluate the computer vision algorithms.
The BHMD has also been thoroughly validated, codified and harmonized. Secondary databases (HISCO-codified occupations, geo-references, dictionaries of standardized names and surnames) have emerged as by-products.
The results have gone well beyond the initial expectations thanks to the intensive application of innovative computerized tools based on computer vision and co-working paradigms. Thus, the second most important achievement of 5CofM is its novel methodology, which integrates not only the computational tools but also the knowledge accumulated over the five years that have defined the usability of the system. These methods and procedures have been pioneering Europe-wide within the digital humanities community, and in historical demography in particular, and have generated several spin-off projects that apply them to other sources.
One remarkable result of our research is the demonstration that data on a single demographic variable, marriage, can be used to study a wide array of other subjects: population estimates, location and growth; survival, migration, fertility; social status, social mobility, inter-generational transmission; religious obedience, secularization, consanguinity; social, occupational and territorial change; the emergence of a metropolitan area. As the construction of the database advanced, ad hoc methodologies to approach these subjects were created or developed, often borrowed from other disciplines such as biology or onomastics.
On the other hand, advances in handwriting recognition have been astonishing, with notable developments in the fields of word spotting and contextual recognition. A prototype reading and extraction tool was produced at the end of the project and is ready to be used in future research.