CORDIS - EU research results

An Application for leveraging large-scale historical textbases

Project description

Application for data mining in historical documents

While digitisation has made historical texts accessible, academics and students still encounter difficulties when working with the vast digital repositories that institutions house. The ERC-funded HistText project aims to develop an application for large-scale data mining in historical textual corpora. A collaboration between historians and computer scientists, it applies machine learning techniques to extensive text archives. The application is designed to handle databases containing billions of words across millions of multilingual documents, and it combines a user-friendly interface with advanced text analysis methods and data visualisation capabilities. HistText seeks to change how large-scale textual analysis is carried out and to offer a new approach to understanding historical documents.

Objective

HistText is an application developed to address the challenges of large-scale data mining in textual corpora, with a particular focus on historical documents. It was created in the context of the ERC-funded ENP-China project, which studies the evolution of Chinese elites from the 19th century to 1949, and results from a collaboration between historians and computer scientists exploring machine learning applications for extensive text archives.

Designed to manage databases containing billions of words across millions of multilingual documents, HistText offers a platform that streamlines the extraction and visualization of insights. The application features a user-friendly interface, advanced text analysis techniques, and data visualization capabilities. It gives novice users a simplified way to run complex data queries and analyses, and it offers an R library for more expert users.

The main challenge the proof of concept aims to tackle is making HistText a fully packageable and transferable tool that can cater to the specialized needs of scholars and of institutions holding vast digital repositories. With its focus on advanced text analysis and user accessibility, HistText is intended as a resource not only for academics in the digital humanities but also for students and the general public. In terms of broader applications, it has the potential to be integrated into a wide range of institutions (libraries, digital content providers, etc.). The platform is well suited to analyzing many text genres, including newspapers, periodicals, directories, and diaries. By offering a scalable, user-friendly, and methodologically rigorous tool, HistText aims to change how large-scale textual analysis is approached and to provide a new pathway for understanding historical documents.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

HORIZON-ERC-POC - HORIZON ERC Proof of Concept Grants

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

ERC-2024-POC

Host institution

UNIVERSITE D'AIX MARSEILLE
Net EU contribution

Net EU financial contribution. The sum of money that the participant receives, deducted by the EU contribution to its linked third party. It considers the distribution of the EU financial contribution between direct beneficiaries of the project and other types of participants, like third-party participants.

€ 150 000,00
Address
BOULEVARD CHARLES LIVON 58 LE PHARO
13284 Marseille
France

Region
Provence-Alpes-Côte d’Azur, Bouches-du-Rhône
Activity type
Higher or Secondary Education Establishments
Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

No data

Beneficiaries (1)
