Cultural Heritage Language Technologies

Objective

CHLT is an international collaboration which will address important questions, and offer practical solutions, regarding the creation of International Digital Library Technology (IDLT) that includes: (i) the development of an infrastructure for IDLT; (ii) powerful IT tools for end users that are designed to be responsive to the ways that different individuals and researchers use the system; (iii) integration of advanced tools for working with and visualising digital documents; (iv) establishment of a framework for sharing metadata, data, and tools across multiple digital libraries; (v) providing a stable, distributed archive allowing for long-term preservation of, and easy access to, digital data.

Objectives:
The objectives of this collaboration are to create an infrastructure for the creation of pioneering International Digital Library Technology (IDLT), and a range of IT applications for use within digital collections (with special emphasis on early modern Latin, classical Greek, and Old Norse texts), which will provide generic tools for multi-lingual information retrieval; concept identification and visualisation; vocabulary analysis and syntactic parsing.
The programs will be developed in an open-source environment so that data, metadata, and tools may be shared among full partners and affiliated digital libraries in order to test prototypes within the consortium, refine programs, and reflect user-studies at the research and development stage. The infrastructure will allow partner libraries to generate hypertexts that will link similar resources in different collections, unify search and retrieval facilities, and share resource-intensive programs. We will also create and integrate new corpora as testbeds for these applications. These corpora, many of which will be linked to pre-existing digital facsimile images of original manuscripts, include 300 MB (more than 60,000 pages) of literary and scientific early modern Latin texts and 12 MB of Old Norse literature.

We will draw upon the expertise of IC Innovations (an Imperial College Technology Innovation Centre) to advise us on the exploitation and dissemination of technologies that are developed in this consortium (both US and EC) with respect to commercial and non-commercial partners in the areas of (i) knowledge management, (ii) multi-lingual search and retrieval, and (iii) digital library infrastructure models; they have also agreed to assess and advise on the yearly reports that will be required from the Technology Transfer Office (TTO) of each partner institution.

Work description:
Our consortium involves 9 participants (4 US and 5 European) who are committed to taking a leading role in the development of international digital library technology (IDLT). Each of the participants brings to this project a digital collection that when linked together will create a mini-international digital library which will act as a testbed for the creation of competing 'infrastructure models' and IT applications. The workpackages undertaken by each of the partners are designed to contribute to the overall development of an 'IDLT Infrastructure Model' (which will be our first deliverable), upon which all further work will progress. A report on the justification of this chosen 'model' will be produced to show the logic of our decision-making procedures. Once the infrastructure model is in place and operational at each of the partner sites, then individual workpackages can proceed in tandem within this single working environment. The strategy and management policies of this project are consistently directed toward integration of technology and resources. Our workpackages are organised around a series of advanced digital library applications that will be developed by different workgroups in our consortium and integrated into a single Digital Library System (DLS). The methodology of tool creation relies upon the development of a robust indexing architecture that scales across systems in multiple languages. This will allow our programming teams to focus upon developing tools instead of confronting problems relating to disparate tagging systems. This will also allow us to apply these tools to every text in the Digital Library System (DLS) without custom programming for every new set of texts. The aim is to retain the independence of each digital collection while at the same time offering it a place in the overall DLS. Three workpackages involve the creation of corpora as testbeds for new applications.
Although many of our collaborators bring existing corpora to the project, the infrastructure that we are developing will allow us to integrate other texts easily and at a very low cost per megabyte. Ultimately the corpora that we create and integrate into our system will be a substantial contribution to European Cultural Heritage. At the end of three years, we will have added to our system 300 MB of Early Modern Latin texts (including many of Isaac Newton's papers) and 12 MB of Old Norse literature. We envisage the addition of a workpackage that will address the problem of 'integration' flagged by the EC evaluators. We also believe that end-user studies and conferences among US/EC partners would play a vital role in the integration, exploitation and dissemination of IDLT. At the end of the project we will be in a good position to advise other institutions and research groups on a substantial number of problems and possible solutions regarding International Digital Library Technology. Since all libraries in the US and Europe will soon have to address the problem of IDLT, we hope to take a pro-active lead in the development of relevant technologies.

Management procedures: We envisage the addition of a workpackage that will address the problem of 'integration' flagged by the EC evaluators.

Milestones:
At the end of Year 1 we will have produced an IDLT 'infrastructure model' that will be distributed and operational in each of the partner sites, as well as IDLT applications built around a common architecture. The milestones of Year 2 will concentrate on the fruits of sharing data, metadata and tools within the IDLT 'infrastructure model' to create a Digital Library System (DLS). In Year 3 we will be in a position to test out our IDLT and DLS in end-user studies between US and EU partners and refine our programmes to reflect those studies; the final product will be a set of thematically coherent digital library collections in a single Digital Library System based within a Digital Library Infrastructure Model that employs advanced IT applications.

Funding Scheme

CSC - Cost-sharing contracts

Coordinator

IMPERIAL COLLEGE OF SCIENCE, TECHNOLOGY AND MEDICINE
Address
South Kensington Campus
SW7 2AZ London
United Kingdom

Participants (3)

CONSIGLIO NAZIONALE DELLE RICERCHE
Italy
Address
Piazzale Aldo Moro 7
00185 Roma

KOEBENHAVNS UNIVERSITET
Denmark
Address
Noerregade 10
1017 Copenhagen

THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE
United Kingdom
Address
The Old Schools, Trinity Lane
CB2 1TN Cambridge