Periodic Reporting for period 1 - PortADa (Port Arrivals Data. Automatic data collection for a large-scale comparative history of 19th century shipping: a Digital Humanities approach to maritime heritage)
Reporting period: 2023-11-01 to 2025-10-31
The project studies the ports of Barcelona, Marseilles, Havana, and Buenos Aires, which illustrate diverse maritime profiles: Barcelona, a Spanish metropolis oriented to America; Marseilles, a French metropolis facing east; Havana, a Spanish American colony; and Buenos Aires, a former Spanish colony. Each port played a distinct role in the international division of labour, and comparative analysis allows us to explore differences in connectivity, traffic patterns, and economic integration across regions.
The project has two main objectives: first, to develop a team of digital humanists specialised in computational methods for maritime and economic history; second, to create an open-access database of nineteenth-century trade records for other researchers. The database will cover roughly 1.5 million ship arrivals over six decades, offering unprecedented detail on port traffic, cargo, and the actors involved.
Thematically, the project addresses five areas: port traffic, technological change, commerce and business actors, career trajectories, and cultural dimensions. While aggregated statistics on global trade exist, they provide limited insight into regions outside the industrial powers, particularly Spanish America. Previous research often focuses on individual flows, leaving broader connections unexplored. Disaggregated port data are essential to understanding commercial networks, because they reveal the nodes and connections that structured the global economy. The selected ports stood at the top of their port hierarchies during nineteenth-century globalisation, and analysing the frequency, intensity, and evolution of their traffic over time provides insight into trade flows and port development.
The project's methodology is designed to scale, allowing additional ports and periods to be included in future and contributing to a richer understanding of nineteenth-century maritime networks. By integrating historical analysis and digital tools, the project offers a new perspective on global commerce, producing a dataset and framework for studying port cities, trade, and economic integration in unprecedented detail.
A major achievement has been securing digital images of the newspapers and developing software that automatically identifies, cleans, and processes images of arrival announcements. Researchers can now upload images in bulk; the software corrects quality issues, prepares the images for OCR, and processes the OCR output to create a digital archive containing all announcement data. Automated extraction is underway for roughly sixty-five years of newspapers per port. The next step is to organise the data in the project’s databases and to disambiguate terms (avoiding confusion between individuals, ships, or objects with the same name) by assigning unique identifiers that track ships, merchants, captains, cargo types, and other elements across six decades. Large auxiliary lists of controlled terms, such as geolocated historic ports, are also being created. Once the database structure for curation is complete, detailed historical analysis can begin. The resulting datasets will be valuable for researchers studying the economic development of each port and its hinterland, offering the most complete accounting to date of maritime traffic, cargo, and importation patterns.
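The disambiguation step described above can be illustrated with a minimal sketch. It is not the project's implementation: the names `normalise` and `EntityRegistry`, the identifier format, and the sample records are all hypothetical, chosen only to show how a unique identifier keeps a ship and a captain with the same name apart while merging spelling variants of one name.

```python
import unicodedata
from dataclasses import dataclass, field


def normalise(name: str) -> str:
    """Lower-case, strip accents, and collapse whitespace so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


@dataclass
class EntityRegistry:
    """Hands out one stable identifier per (entity type, normalised name) pair.

    Hypothetical helper: the identifier format "<type>-<counter>" is an assumption.
    """
    ids: dict = field(default_factory=dict)

    def identify(self, entity_type: str, name: str) -> str:
        key = (entity_type, normalise(name))
        if key not in self.ids:
            # First sighting of this entity: allocate the next identifier.
            self.ids[key] = f"{entity_type}-{len(self.ids) + 1:06d}"
        return self.ids[key]


registry = EntityRegistry()
ship_a = registry.identify("ship", "San José")       # first sighting of the ship
ship_b = registry.identify("ship", "  san  jose ")   # OCR/spelling variant of the same ship
captain = registry.identify("captain", "San José")   # same string, different kind of entity
```

Here `ship_a` and `ship_b` receive the same identifier, while `captain` receives a different one. A real curation workflow would need more context than the name alone (for example dates, flag, or tonnage), since two distinct ships can share a name; this sketch deliberately leaves that out.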
Some PortADa processes are also useful for other researchers and for repositories of historical newspapers. We believe the work can be replicated for other port-arrival records without major material costs, provided that high-quality images are available. This should enable research on other ports where usable newspaper images exist, contributing to broader datasets on ship arrivals and importations.
The project’s methods also support automated bulk processing of other newspaper sections—such as commodity prices, advertisements, theatre listings, or meteorological reports. Beyond newspapers, the Digitisation Protocol and Data Management Plan offer best practices applicable to other historic printed documents. With some adjustments, the open-source tools can be adapted for other historical investigations.