CORDIS - EU research results

Preparing for the construction of the Digital Research Infrastructure for the Arts and Humanities

Final Report Summary - PREPARINGDARIAH (Preparing for the construction of the digital research infrastructure for the arts and humanities)

Executive summary:

The digital research infrastructure for the arts and humanities (DARIAH) will facilitate long-term access to, and use of, all European arts and humanities (A&H) digital research data. DARIAH will support research practitioners at all levels, from beginners through to those employing advanced techniques and methodologies. The DARIAH infrastructure will connect people, information, tools and methodologies for investigating, exploring and supporting work across the broad spectrum of the digital humanities.

Researchers will use DARIAH to find and use a wide range of research data from across Europe; exchange knowledge, expertise, methodologies and practices across domains and disciplines; ensure that they work to accepted standards and follow best practice; and experiment and innovate in collaboration with other scholars.

The DARIAH infrastructure will be designed to strike the right balance between decentralisation and efficiency, empowering individual contributors to work with and within the DARIAH community and shape its features to their needs. Every contribution builds DARIAH, and all contributions are linked together in DARIAH's architecture of participation. The construction phase of DARIAH, beginning in 2012, will be organised around four virtual competency centres (VCCs): VCC1 e-infrastructure; VCC2 research and education; VCC3 content management; and VCC4 advocacy and outreach. Each VCC will have numerous contributors and each DARIAH partner will be able to contribute to multiple VCCs.

Throughout the Preparing DARIAH phase, DARIAH has actively explored its research-oriented objectives through a variety of means, including engagement with stakeholders, technical systems development and explorations of models for governance and financing. Preparing DARIAH established a consortium committed to the infrastructure's objectives, delivered an overall business plan and a legal document stating the rights and obligations of DARIAH partners, and secured sustainable financial support for the construction and initial operational phases of DARIAH. EUR 4,000,000 per year, considered to be the minimum financial support necessary for the construction phase, has been secured for the first three years.

DARIAH will establish itself as a European Research Infrastructure Consortium (ERIC) and has encouraged participating countries to sign a memorandum of understanding (MoU) underlining their desire to participate in the infrastructure, thereby further strengthening Europe's position as a centre of world class research. Germany, France, The Netherlands, Ireland, Greece and Croatia have already signed the MoU, with further signings from Denmark, Austria and Slovenia foreseen in the near future. Switzerland has committed itself to join the DARIAH ERIC as a cooperating partner, and talks are under way with Lithuania and Norway in the hope of securing their participation as well.

As DARIAH moves forward into the construction phase, it will continue to strive towards the provision of those components and services of a European research infrastructure in the A&H which will enhance scholarly research methods and which will facilitate the publication and reuse of research data on an international level. Ultimately, DARIAH strives to be much more than the sum of each single national or individual contribution; it represents the next generation of research potential in the A&H.

Project context and objectives:

The general objective of the Preparing DARIAH project is to tackle any and all remaining bottlenecks and be ready for the actual construction of DARIAH at the beginning of 2011. Crucial elements to that end were identified by each work package (WP) shortly after the kick-off meeting, while preparing their operational plans, and once again in December 2009, when the amendment for a six-month extension was officially accepted by the European Commission (EC). Activities and internal objectives for the second reporting period were defined and clearly scheduled during the regular management board meetings and the yearly consortium and general assembly meetings. When needed, ad hoc committees were set up.

Work on the strategy had one main action line: to provide a clear and comprehensive overview of the state of the art, including:

1. an overview and analysis of research and infrastructure activities in the digital humanities across Europe, with emphasis on the partner countries, making recommendations about the strategic direction for DARIAH in light of this analysis
2. products and services provided by data organisations in all EU member states and the available technologies, tools and standards worldwide
3. scoping the primary functions of DARIAH (primitives), and scoping and prototyping a collective intelligence resource map
4. strategic recommendations for the next phases, policies for the construction of DARIAH, and a policies and standards framework for the operation of DARIAH.

On the financial side, the objective was twofold: first, to develop an initial version of a cost model to see what the operation of a European research infrastructure would cost (minimum and maximum); and secondly, to organise roundtable symposia with relevant funding agencies to inform them about the possible funding scheme of DARIAH and receive their feedback on how they could contribute to the financial balance of DARIAH. The first cost model has been updated, a second funding agencies roundtable organised, and the 'White paper for European funding scheme' as well as a 'Plan of action for a European implementation of funding scheme' analysed.

In terms of governance, the objectives were to investigate governance models and identify the best governance and management structure for DARIAH in the construction phase, thereby defining the roles and responsibilities of the various components. The governance model for the construction phase (structured around the DARIAH ERIC) has been designed, as has the operational plan. With input from all Preparatory Phase WPs, the DARIAH business plan has also been set up.

As part of the preparation of legal documents, the legal structure chosen during the first period, the ERIC, has been further developed. Drafts of the statutes were sent to the Ministries of the countries interested in joining the construction phase and their feedback has informed the statutes' final version. The first steps to prepare an application to the European Commission for the establishment of the DARIAH ERIC have been taken. The second goal of the legal WP for this period was to draft a 'User licence agreement and draft product and services contracts' and an 'Accession form for future partners and consortium agreement'. Both tasks have been finalised.

The technical WPs, WP7 'Technical reference architecture' and WP8 'Conceptual modelling', finalised the technical planning in preparation for the DARIAH construction phase. This includes the completion of the technical demonstrators and prototypes, as well as the compilation of the technical roadmap, which consists of a reference architecture, functional specifications and an initial timeline with an operational plan for the construction phase.

Development of several demonstrators and prototypes informed the planning of the technical infrastructure. This included the Arena and text encoding initiative (TEI) demonstrators, which were already specified in the proposal, started during the first project year and completed in the second. In addition, several further prototypes were identified as necessary as the work in WP7 and WP8 unfolded, and these were also completed during the second project year.

The work on the demonstrators and infrastructure prototypes was shared between WP7 and WP8. While each team analysed different aspects of technical services and conceptual modelling, their findings eventually had to be combined into integrated systems. The roadmap document provides a reference blueprint of key infrastructure services for the forthcoming DARIAH construction phase. It is the nature of digital infrastructure that it evolves continuously as its technological and organisational context changes. The infrastructure was designed to sustain such an evolution, and the planning documents reflect this openness by expecting more input even as the construction phase is ongoing.

In addition, an overall objective was to create a well-developed internal communication channel (via the internal website, the wiki, the weeklies, the newsletters, etc.) and to raise awareness of DARIAH in the outside world by disseminating the project results at relevant conferences and events and via the general website and the newsletters' wide mailing lists. The joint supporting digital humanities (SDH) and networking event for research infrastructure (NEERI) 2010 conferences, organised with CLARIN in autumn 2010 in Vienna, also furthered that goal.

Next to this, a general wish was to expand the geographical coverage of DARIAH. To this end, potential new partners in countries where DARIAH was not yet represented have been actively approached.

Project results:

VCCs and service packets

Internally, DARIAH is organised into four VCCs and a coordination office staffed and coordinated by its EU partners. While centred on a specific area of expertise, the VCCs are at the same time cross-disciplinary, multi-institutional and multi-national.

In practice, it is not intended that individual A&H researchers, repositories, research centres or other stakeholders will need to discover on their own which of the various VCCs (or combination of different VCCs) should be consulted to provide them with a technical solution or expert consultation on a particular problem. Instead, DARIAH will provide its stakeholders with a single point of contact or portal. By this means the wide range of infrastructure and support services that the VCCs can offer the DARIAH community will be bundled into a smaller set of service packages targeting familiar and commonly requested activities. Overall, the individual service packages will be kept as generic as necessary in order to highlight key DARIAH services to the largest possible number of A&H stakeholders. However, it is also intended to keep this structure flexible enough to address the evolving and growing needs of A&H stakeholders over time and the more particular needs of user groups within a specific research domain. For example, an A&H centre will be able to use the 'curation' service package to obtain consultation, training and support services, archive-in-a-box reference software, and storage and other technical services from the DARIAH e-infrastructure. In summary, it is intended that the service packages will function as easily comprehensible, single points of access for A&H stakeholders while hiding the technical and organisational details of the national VCC structure underneath.

VCC1, e-infrastructure: '... to establish a shared technology platform for A&H research'. Primary target group: other VCCs, innovators and adopters of technical infrastructure.

The VCC1 e-Infrastructure establishes the technological basis for DARIAH as a trusted intermediary, in which community-developed data, tools and services can be preserved, shared and integrated with the larger A&H community.

The infrastructure and interoperability services provided by the VCC can be broadly divided into core infrastructure services, reference software packages and data federation services. In support of these efforts the VCC will provide the DARIAH developers with a centrally hosted service and user registry and a full-featured developer portal. In addition, the VCC will offer a range of technical consulting services and research demonstrators in cooperation and consultation with the other VCCs as well as to individual A&H centres, institutions and researchers.

VCC2, research and education liaison: '... to expose and share digitally-enabled A&H research methods, training, expertise and tools'. Primary target group: individual A&H researchers and research networks.

The goal of VCC2 is to understand A&H research practices and processes in the context of the services provided by DARIAH and to promote the use and application of digitally-enabled methods and tools, with a particular emphasis on promoting the interdisciplinary exchange of research data. It aims to address a range of people and interests, from established researchers to post-graduates to undergraduate students, as well as different disciplines and domains inside and outside of higher education. The VCC will contribute a knowledge base, which captures and links A&H methods, tools and projects and references current digital humanities curricula. In addition the VCC will offer a variety of programs and activities to engage with the A&H community via training and education programs, publications, workshops and seminars.

VCC3, scholarly content management: '... to expose and share scholarly content'. Primary target group: A&H data centres.

VCC3 will deal with the various stages of the scholarly content life cycle, from creation, curation and dissemination, through to the pooling of scholarly digital resources and results for publication and reuse. The VCC will offer services and resources for the representation and management of scholarly data, as well as for the management of associated legal and organisational issues to a diverse target community including A&H data centres and research networks, libraries, publishers, digital humanities centres and individual researchers.

The VCC will facilitate the identification and dissemination of existing digital assets by defining channels for reuse and exchange across communities and research infrastructures, providing reference data registries for the description of scholarly data, collaborating with VCC1 e-Infrastructure to deploy necessary tools and registries and identifying relevant open standards, reference licenses and best practice guidelines.

VCC4, advocacy, impact and outreach: '... to interface to key influencers in/for A&H'. Primary target group: funders, policy makers, industry and others.

This VCC will focus on high level advocacy with key influencers in disciplines/industry who are in a position to assist DARIAH; assessing the impact of DARIAH and measuring the 'added value' that it brings by facilitating the transfer of knowledge within and across disciplines; outreach to wide groups of stakeholders outside the A&H community including industry, cultural tourism and publishing; ensuring the consistent and coordinated promotion and growth of the DARIAH partner networks.

The DARIAH coordination office (DCO): '...to assume the overall responsibility and to ensure adequate operations across all DARIAH organisational units and partners'.

The DCO supports and integrates all levels of DARIAH, including the representation of all DARIAH partners. In its role as a coordinator, the DCO oversees the interactions with all DARIAH partners and boards and takes on a variety of vertical tasks and horizontal tasks. The initial legal seat of the DCO will be in Paris, France (CNRS), with a leadership arrangement between France, Germany and The Netherlands.

Technical infrastructure

The DARIAH research infrastructure is an open, collaborative environment that enables research in the A&H by linking data, functionalities and people. Its 'architecture of participation' accommodates A&H data centres, research networks and researchers that are largely independent, stem from multiple backgrounds, interact with DARIAH in pursuit of diverse goals and employ various entry-points into DARIAH. Linking this diversity, DARIAH aims for a very lightweight and decentralised infrastructure that can be fitted to each stakeholder's situation. Rather than a single technical solution, DARIAH may be many, according to community activities and willingness to collaborate.

The DARIAH technical architecture is built of three horizontal tiers, as well as vertical interoperability frameworks for both data and services. In each of those aspects and for every component, DARIAH seeks a broad interest base and collaborations. In particular, core infrastructure services may be created in close interaction with affiliated initiatives.

Three-tier architecture model

The DARIAH technical infrastructure is built as a loosely coupled service-oriented architecture with three structural tiers in its architecture model, namely the user-facing framework, the infrastructure service environment and the core infrastructure. The model also describes how services can move up and down these horizontal tiers, to enable an architecture of participation that is open to contributions and evolves over time.

'In-a-box' services

These are currently two special DARIAH-created solutions aimed at A&H institutions who wish to create their own new digital archives or wish to build a digital research environment for their institution's research community. Both 'in-a-box' solutions combine software that is installed and administered at the institution and 'connects' to the DARIAH central infrastructure services.

Interoperability frameworks for data and services

Linking diversity is at the core of DARIAH's philosophy. Disciplines in the humanities differ greatly with regard to their resources - their data, tools and methodologies. Moreover, innovation is sometimes associated with introducing variations into their data, tools, or methodologies, thereby reinforcing heterogeneity even within a single discipline. Through linking this diversity DARIAH aims to build bridges and enable researchers from different disciplines or cultural backgrounds to collaborate on the same material and to share their diverse perspectives and methodologies. A prerequisite to benefit from this opportunity, however, is interoperability between the diverse resources in DARIAH without enforcing specific formats. In other words, DARIAH aims to mediate between heterogeneous resources and even though interoperability guidelines are optional, their implementation opens up additional opportunities such as increased visibility, collaboration and the applicability of advanced techniques. Among the interoperability channels in DARIAH are digital objects and the data sources that contain them, as well as services and research environments.

Service catalogue and roadmap

This section provides an inventory and a timeline for the technical infrastructure services to be created during the DARIAH construction phase. It presents the core building blocks of DARIAH and their role in the infrastructure, although not all of them are currently covered or fully specified. With the core infrastructure in place, further services will be added and adapted as new countries join DARIAH and novel technology developments emerge. For example, partner projects like the European Holocaust Research Infrastructure (EHRI) add new requirements and opportunities for constructing DARIAH. More such dedicated research environments that build upon DARIAH are expected.

A detailed look at the technical architecture of all the services is not feasible in the available space. We therefore drill down into a few selected services only:

The persistent identifier (PID) service supports citability of research objects. It is not a single technical service, but rather links various system components with relevant policies. While there is considerable experience in establishing PID services, DARIAH faces two specific challenges: the huge amount of scientific data for which PIDs need to be minted, and the need to weave together the diverse PID schemes that are currently in place at arts and humanities data archives.

To tackle the diverse requirements of DARIAH partners and users, the technical architecture will build on a PID service; a meta-resolver to map different PID schemes; usage licences for data objects; as well as a model for pidgin metadata that comes along with the object and informs both humans as well as machines about the object.
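The idea of a meta-resolver that maps between PID schemes can be illustrated with a minimal sketch. It is not the DARIAH implementation: the `MetaResolver` class, the `scheme:local-id` convention, the scheme names and the example.org URLs are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A resolver turns the local part of an identifier into a location.
Resolver = Callable[[str], str]


@dataclass
class MetaResolver:
    """Hypothetical meta-resolver: dispatches a PID to a per-scheme resolver."""

    resolvers: Dict[str, Resolver] = field(default_factory=dict)

    def register(self, scheme: str, resolver: Resolver) -> None:
        # Scheme names are compared case-insensitively.
        self.resolvers[scheme.lower()] = resolver

    def resolve(self, pid: str) -> str:
        # Assumed convention: everything before the first colon is the scheme.
        scheme, sep, local_id = pid.partition(":")
        if not sep or not local_id:
            raise ValueError(f"not a scheme-qualified PID: {pid!r}")
        try:
            resolver = self.resolvers[scheme.lower()]
        except KeyError:
            raise LookupError(f"no resolver for scheme {scheme!r}") from None
        return resolver(local_id)


# Placeholder schemes and URLs, not real resolver endpoints.
meta = MetaResolver()
meta.register("hdl", lambda i: f"https://handle.example.org/{i}")
meta.register("urn", lambda i: f"https://urn.example.org/resolve?urn={i}")

print(meta.resolve("hdl:1234/abc"))  # https://handle.example.org/1234/abc
```

An archive that already uses its own identifier scheme would, under this assumption, only need to register one resolver function rather than re-mint its identifiers.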

The authority mediation service (AMS) builds a network of reference data services, including library authority lists, online dictionaries and thesauri and geo-referencing and place names databases. Reference data services can e.g. assist the description of digital objects through auto-completion and ensure the semantic description of digital objects. Building on the AMS, researchers can enrich references with research-relevant data.
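As an illustration of how a reference data service might assist description through auto-completion, the following sketch performs a prefix lookup over a small authority list. The `AuthorityList` class, the place names and the `place:N` identifiers are hypothetical and do not reflect the actual AMS interface.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


class AuthorityList:
    """Hypothetical authority list supporting case-insensitive prefix lookup."""

    def __init__(self, entries: Iterable[Tuple[str, str]]) -> None:
        # Store (folded label, label, identifier) sorted by folded label.
        self._entries = sorted(
            (label.casefold(), label, ident) for label, ident in entries
        )
        self._keys = [entry[0] for entry in self._entries]

    def complete(self, prefix: str, limit: int = 5) -> List[Tuple[str, str]]:
        """Return up to `limit` (label, identifier) pairs matching the prefix."""
        key = prefix.casefold()
        start = bisect_left(self._keys, key)  # first candidate position
        matches: List[Tuple[str, str]] = []
        for folded, label, ident in self._entries[start:]:
            if not folded.startswith(key) or len(matches) >= limit:
                break
            matches.append((label, ident))
        return matches


# Invented sample data for illustration.
places = AuthorityList(
    [
        ("Vienna", "place:1"),
        ("Paris", "place:2"),
        ("Vilnius", "place:3"),
        ("Parma", "place:4"),
    ]
)

print(places.complete("pa"))  # [('Paris', 'place:2'), ('Parma', 'place:4')]
```

Returning the identifier together with the label is what would allow a description to carry a stable reference rather than a free-text string.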

DARIAH at work

In the recent period, DARIAH partners have been involved in various activities and projects that have already shown their capacity to deliver support or services for various user scenarios and user communities. Such activities include experimenting with advanced demonstrator technologies, partnering in scholarly digital humanities projects, developing tools and software and participating in major European cultural initiatives with very large research communities.

Tools, infrastructure services and demonstrators

TextGrid is one of the first grid-based community projects in the humanities creating an infrastructure for the collaborative editing, annotation, analysis and publication of specialist text resources. TextGrid represents the humanities in the German national grid initiative D-Grid and provides a digital infrastructure, a collective network and a comprehensive and extensible toolset for text scholars.

Technologically, its combination of grid and repositories, as well as services and tools with graphical interfaces establish an open environment that can be adapted to many use cases. In its core functionality, TextGrid focuses on text as a data type since there is considerable demand in the community for processing text data.

TextGrid is particularly interesting for its openness. It avoids swamping the user with rules and requirements, yet still fosters participation and collaboration. For example, interoperability can be achieved in a stepwise process that follows an incentive system:

1. any data format can be uploaded, TextGrid ensures bit-preservation
2. metadata facilitates data management and retrieval
3. by uploading XML-based texts, a series of services can be used on the data including streaming tools, an XML-editor and other functionalities
4. for TEI encoded documents, TextGrid offers graphical editing, metadata extraction and other functionalities
5. defining a mapping to the TextGrid recommendation for a TEI core encoding allows interoperability on a semantic level.

One of the core goals of TextGrid lies in enhancing the re-usability of existing scholarly texts and services. For both areas (data and services) TextGrid offers various levels of integration: the lowest level offering a minimum barrier to participate and the highest level offering maximum interoperability on a semantic level. Getting from lowest to highest is a stepwise process and users are motivated and assisted in taking each step. In other words, interoperability is not given by design, but it is encouraged.
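The stepwise levels listed above can be sketched as a simple classification function. This is an illustration only and not TextGrid code: it assumes that the levels are cumulative, that TEI documents are recognised by the TEI P5 namespace on the root element, and that the mapping to the core encoding is signalled by a hypothetical `has_core_mapping` flag.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace
TEI_ROOTS = {f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus"}


def interoperability_level(
    data: bytes,
    metadata: Optional[dict] = None,
    has_core_mapping: bool = False,
) -> int:
    """Return the highest level (1 to 5) reached by a resource.

    Assumption: each level presupposes the ones before it.
    """
    level = 1  # 1: any format can be stored (bit preservation)
    if not metadata:
        return level
    level = 2  # 2: metadata enables management and retrieval
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return level
    level = 3  # 3: well-formed XML enables XML-based services
    if root.tag not in TEI_ROOTS:
        return level
    level = 4  # 4: TEI enables graphical editing and extraction
    # 5: a mapping to the core encoding enables semantic interoperability
    return 5 if has_core_mapping else level


tei = b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><text/></TEI>'
print(interoperability_level(tei, {"title": "Sample"}))  # 4
```

The function never rejects a resource; it only reports how far up the ladder the resource currently sits, which mirrors the incentive-based approach described in the text.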

TextGrid is an example of a technology that supports the core digital humanities activity of XML annotation in standard formats. It allows for collaboration on and sharing of TEI resources as well as their dissemination.

EHRI: Involving the community

In October 2010 the EHRI was launched, aiming to support the European Holocaust research community by initiating new levels of collaborative research through the development of innovative methodologies and transnational access to research infrastructures and services. To this end, EHRI proposes to design and implement a virtual research environment offering online access to a wide variety of disparate and dispersed key Holocaust archival materials and to a number of online tools to work with them. Building on integrating activities undertaken over the past decades by the 17 partners in the consortium and a large network of associate partners, EHRI sets out to transform the data available for Holocaust research around Europe and elsewhere into a cohesive corpus of resources.

The DARIAH expertise has been instrumental in planning and organising the technical service work for EHRI, in particular the requirements, data integration and virtual research environment work. DARIAH's involvement in all these key development fields will ensure that access to integrated archival material will follow best practices and European standards. In the long term, DARIAH offers to EHRI its expertise in ensuring data quality and access to its registry services. In terms of long-term remote data access, DARIAH will provide assistance with persistent identifier and single-sign-on services to EHRI.

MIXED: Preserving content in the long run

MIXED is a digital preservation project. Its strategy is to convert data, specifically tabular data, to an intermediate XML format. One of the obstacles to preserving data is software obsolescence. This stumbling block to preservation is usually tackled by either continuously migrating the data or by emulating the software tools. MIXED achieves preservation by migrating data to standard formats.

The strategy used with MIXED converts all datasets, upon ingest to the archive, into an intermediate, generic format. Upon dissemination of a dataset, it is converted from this generic format into a current vendor format of choice. It is likely that the intermediate format will also change, but at a much slower rate. The optimisation is that conversions are split into many contemporary conversions and a few time-bridging conversions. This is a much more manageable situation and the complexity of bridging time can be dealt with by means of one well-defined format.

MIXED concentrates on tabular data because the lack of standardisation is most keenly felt there. An XML schema, called M-XML, is used by MIXED in such a way that databases and spreadsheets are expressed as valid M-XML documents. This format is a non-proprietary representation of the data. MIXED is an open source framework, which accepts converters as plug-ins. The converters allow conversion from existing vendor application formats to M-XML and vice versa.

On ingest of tabular material into an archive, the data is converted to M-XML and on dissemination, the data from M-XML is converted to any spreadsheet or database format the end user requires. Therefore it is possible to convert from one application format to another.
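The ingest and dissemination flow described above can be sketched as a small plug-in hub. This is a minimal sketch under stated assumptions: the `Hub` class is hypothetical, a plain list of rows stands in for the intermediate format (the real M-XML schema is not reproduced here), and CSV and TSV stand in for vendor formats.

```python
import csv
import io
from typing import Callable, Dict, List

# Stand-in for the intermediate format: a list of rows of strings.
Table = List[List[str]]


class Hub:
    """Hypothetical converter hub: every format converts via one intermediate."""

    def __init__(self) -> None:
        self.readers: Dict[str, Callable[[str], Table]] = {}
        self.writers: Dict[str, Callable[[Table], str]] = {}

    def register(self, fmt, reader, writer) -> None:
        # One reader and one writer plug-in per format.
        self.readers[fmt] = reader
        self.writers[fmt] = writer

    def ingest(self, fmt: str, payload: str) -> Table:
        return self.readers[fmt](payload)

    def disseminate(self, fmt: str, table: Table) -> str:
        return self.writers[fmt](table)

    def convert(self, src: str, dst: str, payload: str) -> str:
        # Format-to-format conversion is ingest followed by dissemination.
        return self.disseminate(dst, self.ingest(src, payload))


def delimited_reader(delimiter: str) -> Callable[[str], Table]:
    def read(text: str) -> Table:
        return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]
    return read


def delimited_writer(delimiter: str) -> Callable[[Table], str]:
    def write(table: Table) -> str:
        buffer = io.StringIO()
        writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
        writer.writerows(table)
        return buffer.getvalue()
    return write


hub = Hub()
hub.register("csv", delimited_reader(","), delimited_writer(","))
hub.register("tsv", delimited_reader("\t"), delimited_writer("\t"))

print(repr(hub.convert("csv", "tsv", "name,year\nArena,2010\n")))
# 'name\tyear\nArena\t2010\n'
```

In such a design, supporting a new format means adding one reader and one writer, and no existing converter has to change.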

eSciDoc: A platform for digital scholarly resources

In the humanities, the preparation and enrichment of a digital source is part of the scholarly work. In particular many funded projects generate, as one of their results or simply as a by-product, digital editions of the reference documents that have been used in the course of the research activity. However, most of the research teams do not benefit from an in-house infrastructure for providing access to these digital documents and even less for offering additional services allowing the rest of their research community to visualize them and express fine-grained searches. This is the reason why DARIAH is integrating in its technical service a concept for managing digital documents which is articulated along two main lines:

1. taking-up of a generic repository platform, eSciDoc, which is developed and maintained within the Max-Planck Society
2. offering a dedicated service based on the eSciDoc platform and allowing the management of documents encoded in conformance with the TEI guidelines.

eSciDoc is an e-research environment developed specifically for use by scientific and scholarly communities to collaborate globally and across disciplines. It is an infrastructure encapsulating a Fedora Commons repository and implementing a broad range of services. Its service-oriented architecture fosters the creation of autonomous services, which can be re-used independently from the rest of the infrastructure. eSciDoc provides a generic infrastructure and specialised solutions within the context of research questions. It integrates existing solutions and implements new ones.

The target audience of eSciDoc are research organisations, universities, institutes and companies interested in e-Science-aware knowledge and information management. eSciDoc enables the user to publish, visualise, manage and work with data artefacts or objects. Objects include both publication data and research data across disciplines. eSciDoc addresses aspects of data reliability, data quality, data curation and long-term preservation. It covers the whole lifecycle of objects and supports semantic relations between objects.

The eSciDoc system is designed as a service-oriented architecture (SOA) implementing a scalable, reusable and extensible service infrastructure. Application- and discipline-specific solutions can then be built on top of this infrastructure.

TEI repository demonstrator

The purpose of the DARIAH TEI repository is to demonstrate the practical benefits of using TEI for the representation of digital resources of all kinds, but primarily of original source collections within the arts and humanities. As a community-focussed project, the TEI repository also aims to make it easy for humanities researchers to share TEI-encoded texts with others and to compare their encoding practice with that of others in the TEI community. The functions it provides are aimed primarily at humanities researchers with the following requirements.

The initial implementation of the TEI repository uses the eSciDoc platform. It also uses the full spectrum of eSciDoc services to support service development and deployment. As everything is open source, new services can be easily added and existing ones can be amended.

Arena2 demonstrator

The objective of the Arena2 demonstrator is to migrate Arena into a sustainable environment by adding service logic and exposing its resources as autonomous services in a service-oriented architecture (SOA) over selected partner data centres. This demonstrator will exhibit added value in terms of the ability to sustain applications from cultural heritage and arts and humanities research beyond the lifespan of this particular project.

The basic architecture adopted follows the 'publish-find-bind' approach. Services publish themselves to a registry as being in accordance with a web service specification. These services are then found and bound to by a client. Key to creating an SOA implementation of the Arena2 service is the specification of how services should communicate. This required the creation of the Arena gateway service specification (AGSS) document. The Arena gateway service specification itself consists of a web services description language (WSDL) document.

In this case the specification is the Arena gateway service specification, the client is the Arena2 portal, the services are either compliant monument inventory services or 'wrapped' services based on legacy protocols and the registry is an instance of a universal description discovery and integration (UDDI) registry.
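The publish-find-bind pattern can be sketched with an in-memory registry. This is an illustration only: the real demonstrator uses WSDL and a UDDI registry, whereas the `Registry` class, the `SPEC` identifier, the service functions and the sample records here are hypothetical.

```python
from typing import Callable, Dict, List

# A service accepts a query dictionary and returns matching record labels.
Service = Callable[[dict], List[str]]

SPEC = "arena-gateway"  # hypothetical identifier for the shared specification


class Registry:
    """Hypothetical in-memory stand-in for a service registry."""

    def __init__(self) -> None:
        self._by_spec: Dict[str, Dict[str, Service]] = {}

    def publish(self, spec: str, name: str, service: Service) -> None:
        self._by_spec.setdefault(spec, {})[name] = service

    def find(self, spec: str) -> Dict[str, Service]:
        return dict(self._by_spec.get(spec, {}))


def native_inventory(query: dict) -> List[str]:
    # A service that implements the specification directly (invented data).
    records = [
        {"what": "barrow", "where": "Denmark"},
        {"what": "hillfort", "where": "Ireland"},
    ]
    return [r["what"] for r in records if r["where"] == query.get("where")]


def legacy_search(place: str) -> List[str]:
    # A legacy interface with a different call signature (invented data).
    return ["church"] if place == "Denmark" else []


def wrap_legacy(search: Callable[[str], List[str]]) -> Service:
    # Adapt the legacy signature to the shared specification.
    def service(query: dict) -> List[str]:
        return search(query.get("where", ""))
    return service


registry = Registry()
registry.publish(SPEC, "native", native_inventory)       # publish
registry.publish(SPEC, "legacy", wrap_legacy(legacy_search))


def portal_search(reg: Registry, query: dict) -> List[str]:
    results: List[str] = []
    for _name, service in sorted(reg.find(SPEC).items()):  # find
        results.extend(service(query))                      # bind and call
    return results


print(portal_search(registry, {"where": "Denmark"}))  # ['church', 'barrow']
```

The portal never refers to a concrete service by name; it depends only on the registry and the shared specification, which is what allows wrapped legacy services to sit alongside compliant ones.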

The existing Arena2 prototype uses a very simple query building interface allowing the user to set values for the 'Where, what, when' elements of the service specification to build up a query. The service version of the Arena2 portal integrates a number of design features that allow more intuitive and meaningful searching in comparison to the existing online prototype. The interface styling itself has benefited from extensive user testing of a similar design paradigm, taking place in relation to a separate project undertaken at the ADS.

General purpose VRE demonstrator

A VRE is considered a collaborative digital environment that facilitates the integration of information resources and tools for supporting research activities. The TextVre project, based at the Centre for e-Research and using the TextGrid environment, is concerned with the institutional integration of VREs in the specialised domain of digital humanities, specifically the creation of XML-based resources. This VRE demonstrator aims to find new ways of integrating and organising the heterogeneous and often unstructured digital resources used in humanities research, including advanced search and browse services. Standardisation is unlikely to solve all issues raised in linking up humanities data, for several reasons:

1. there is a great deal of legacy data in diverse and often obsolete formats
2. training users in the application of a standard may incur a significant investment of time and money, which is not always available
3. standards are generally developed within particular disciplines or domains, such as inscriptions, whereas research is often inter-disciplinary, making use of varied materials and incorporating data conforming to different standards.

This demonstrator is based on use cases that were identified during the earlier research activities of the JISC Engage project LaQuAT. LaQuAT investigated how to integrate scattered, heterogeneous and autonomous data resources relating to ancient texts, mainly databases but also including XML documents.

The starting point for this demonstrator was D4Science, a production-level infrastructure serving mainly scientific communities, which is not biased towards any particular discipline. gCube, on which the infrastructure is based, is a distributed, extensible system designed to support the full life-cycle of modern research, with particular emphasis on application-level requirements for information and knowledge.

Experimenting technology

During the preparatory phase, the technical partners in DARIAH identified and assessed some of the core technologies that have to be deployed to offer optimal services. Two of these, which are central to most e-infrastructures, were the subject of experiments:

1. mechanisms for uniquely referencing digital assets, so that scientists can make a precise reference to a digital source or a component thereof and be confident that this reference will be supported in the long term
2. implementing a model for the management of heterogeneous sources so that they can be pooled together and used in the most seamless way.

Experimenting PID solutions

Permanent storage of and access to digital material requires a more durable referencing method than is currently employed by the Internet. One mechanism that is widely used to deal with the problem of changing resource locations is persistent identifiers (PIDs). In short, PIDs are given to any resource or object that needs to be permanently identifiable. Once a PID is minted for a resource, it is tied to this resource for an indefinite period and any reference using this PID will always refer to the resource it is tied to.

When a researcher cites an article or dataset in his (hardcopy) thesis, he needs to be assured that the citation itself will always lead to the original resource he has used. The PID experiment has been carried out to investigate in what way PIDs can help DARIAH design and construct its research infrastructure. To this end, a prototype PID system has been implemented. The architecture of the system accounts for the fact that, although this implementation will be based on hypertext transfer protocol (HTTP), it should also be capable of working with other protocols. In other words, the architecture has been made flexible enough to adapt to different protocols and delivery mechanisms.
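The idea of a PID that stays fixed while the resource location changes, served through interchangeable protocol adapters, can be sketched as follows. This is a minimal sketch under stated assumptions and not the DARIAH prototype: the `pid:` prefix, the UUID-based minting, the class and function names and the two adapters are all hypothetical.

```python
import uuid
from typing import Callable, Dict


class PidRegistry:
    """Maps each PID to the current location of its resource."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def mint(self, location: str) -> str:
        """Create a new PID and tie it to the resource at `location`."""
        pid = f"pid:{uuid.uuid4()}"  # hypothetical identifier syntax
        self._locations[pid] = location
        return pid

    def relocate(self, pid: str, new_location: str) -> None:
        """Record that the resource moved; the PID itself never changes."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


def http_adapter(location: str) -> dict:
    """Deliver a resolution result as an HTTP-style redirect description."""
    return {"status": 302, "Location": location}


def plain_adapter(location: str) -> str:
    """Deliver the bare location, e.g. for a non-HTTP protocol."""
    return location


# Resolution logic is protocol-independent; only the adapter differs.
ADAPTERS: Dict[str, Callable[[str], object]] = {
    "http": http_adapter,
    "plain": plain_adapter,
}


def deliver(registry: PidRegistry, pid: str, protocol: str = "http") -> object:
    return ADAPTERS[protocol](registry.resolve(pid))
```

Keeping resolution separate from delivery is one way to obtain the flexibility described above: supporting a further protocol would mean adding an adapter, while citations that use the PID remain valid.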

Integrating heterogeneous archives using the open archives initiative (OAI) object reuse and exchange (ORE)

In this infrastructure experiment, which directly targets the interoperability layer, we explored building a highly flexible repository federation for research data on the basis of loosely-coupled services and formats. We examined a prototype for the federation of the grid-based TextGrid repository with an iRODS/Fedora repository, which caters for data analysis (i.e. XQuery capabilities on XML/TEI objects) across repositories and other conceivable applications.

Cloud application programming interface (API) to the grid

This experiment analysed patterns for mixing and merging infrastructures. In particular, it looked at 'repositories' as they overarch scientific infrastructure and interactive applications. In an analysis covering a series of experiments, we find an optimal setup in the combination of grid and web technologies through a REST-based interface, which opens up a variety of novel architectural patterns. This combines two contexts and usage patterns: infrastructure for large-scale scientific applications on the one hand and open environments for interactivity and user-generated content and services on the other.

We connected grid and web environments and developed an abstraction upon the storage resource manager (SRM). SRM is a versatile, pivotal grid standard and it is highly interlinked with its environment on an operating system level. As an interface between a grid node and a web server, we hence looked for a lightweight interface that is capable of translating between the two worlds. Existing cloud services are a premier model for translating between infrastructure and the web. Despite their simplicity, REST-based protocols satisfy all the needs of the web community.

Our experimental implementation therefore re-engineered the REST API of the Amazon S3 storage service using Python web server gateway interface (WSGI). The advantages of such a loosely-coupled, HTTP/REST-based architecture are manifold: The interface between the repository and the cloud-like service is obviously very light-weight. Due to the loosely-coupled architectural paradigm, the interdependencies between infrastructure and application (in this case: the repository) are minimised and the two can evolve separately.
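To give a flavour of what such a lightweight interface can look like, the sketch below implements a tiny S3-style object store as a Python WSGI application. It is an illustration only and not the experimental implementation: it covers just PUT, GET and DELETE on `/bucket/key` paths, omits authentication and the XML responses of the real S3 API, and uses an in-memory dictionary (`STORE`, a hypothetical name) where the experiment used SRM-managed grid storage.

```python
import io
from wsgiref.util import setup_testing_defaults

# In-memory stand-in for the storage back end (SRM in the experiment).
STORE = {}


def app(environ, start_response):
    """Minimal S3-style object store: PUT/GET/DELETE on /bucket/key."""
    method = environ["REQUEST_METHOD"]
    parts = environ.get("PATH_INFO", "").strip("/").split("/", 1)
    if len(parts) != 2 or not all(parts):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"expected /bucket/key"]
    key = (parts[0], parts[1])

    if method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        STORE[key] = environ["wsgi.input"].read(length)
        start_response("200 OK", [("Content-Length", "0")])
        return [b""]

    if method == "GET":
        if key not in STORE:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"no such key"]
        body = STORE[key]
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    if method == "DELETE":
        STORE.pop(key, None)
        start_response("204 No Content", [])
        return [b""]

    start_response("405 Method Not Allowed", [("Allow", "GET, PUT, DELETE")])
    return [b""]


def call(method, path, body=b""):
    """Invoke the application in-process, without starting a server."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Because the application depends only on the WSGI calling convention, the dictionary could in principle be replaced by calls to a grid storage layer without changing the HTTP-facing side, which is the kind of decoupling the experiment aimed for.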

A generic repository storage API and a decoupled architecture pattern like this enable other services to tie into the system environment. Multiple repositories can build on a single storage and even specialised services, e.g. for format conversion or other administrative tasks, could conceivably work directly at the level of the S3 API. Administrative workflows triggered by the repository, yet executed on the storage level, may boost the overall scalability of the system environment considerably. Moreover, this loosely-coupled approach may trigger the creation of low-level repository services and hence a variety of agents interacting in an open repository ecosystem.

Digital humanities use cases

Philology and cultural studies

Ulrike Czeitschner, Karlheinz Mörth and Claudia Resch are investigating texts that belong to the so-called 'Dance of Death' and memento mori genres. The Dance of Death project is based at the Austrian Academy of Sciences' Institute of Corpus Linguistics and Text Technology (ILCTT). These texts date back to the Baroque era, in particular the years from 1650 until 1750, and were written to admonish readers to live a life of virtue in order to be prepared for death at any time. By employing a variety of methodologies and contexts from corpus linguistics, art history and literary studies, the project is able to provide insights into the way of life and the perspectives on death of the period, and the texts serve as a rich source for the study of popular imagery.

By offering a coherent, shared research environment for images and texts, DARIAH allows researchers from various disciplines to create a collaborative edition of this corpus. In addition DARIAH can offer means to ensure standards based, long-term digital preservation of the data, while at the same time embedding it in a publication environment that permits ease of access, on-going collaboration and reuse.

Musicology, Joachim Veit (Detmold), Gerhard Allroggen (Hamburg), Raffaele Viglianti (London) and Frank Ziegler (Berlin)

A team of music researchers from across Germany is building a comprehensive corpus of Carl Maria von Weber's works, including music, letters, diary entries and other resources. This will allow research into the artistic networks and influences on von Weber's work, as well as his impact beyond his time and historical context.

The team has already invested much effort into creating music-oriented manuscript description standards, known as the Music Encoding Initiative (MEI), and is actively using and contributing to the virtual research environment TextGrid to create a truly collaborative edition of the corpus. In addition, the team is developing dedicated tools and methodologies to address their new research questions.

DARIAH will support these efforts and provide the researchers with a broad audience for the standardisation and dissemination of their work. In addition, since TextGrid is already part of the DARIAH network, the researchers can be assured that similar initiatives will benefit from their work by reusing their tools and data and by linking the von Weber corpus to other related corpora.

Philology and onomastics

Prof. Peter Gammeltoft, Institute of Name Research, University of Copenhagen, Denmark, researches Danish place names, general onomastic theory (the study of names) and, in particular, the historical record of place names in the Scandinavian Viking Age colonies. One project of recent focus is DigDag, a digital atlas of the Danish administrative boundary units that addresses their history, archaeology, place-names, statistics and geography in a uniform research infrastructure. DigDag is both an online database for individual researchers and a geographic information system (GIS) platform linked to an API, allowing other tools and services to access and query the database remotely.

Even well-established digital database projects such as DigDag can benefit greatly from working closely with DARIAH. By ensuring that its technical web interfaces (APIs) conform to DARIAH recommended standards, the DigDag project can augment its geographic maps with additional layers of information (such as language distribution data) from cooperating DARIAH collections. This in turn can allow researchers to carry out more complex and sophisticated geospatial analyses in the DigDag research environment than were previously possible.

Implementation and governance

The selection procedure

DARIAH potential partners will be asked to submit a proposal requesting membership in the DARIAH construction phase as either a VCC head or a VCC contributor. Potential partners may request membership as a contributor in more than one VCC, but may not propose to head more than one VCC. The proposals must be submitted to the DCO of DARIAH and will be reviewed by external experts and members of the DARIAH scientific board, who together will form a selection committee to make recommendations based on an assessment as to what degree the proposal meets the criteria outlined below. The proposals and the recommendations of the selection committee will be presented to the DARIAH general assembly who will make the final decision about each individual proposal, documenting it and providing feedback to the proposers.

General and additional criteria

A set of criteria is outlined here to enable a judgement to be made on the suitability of a partner to contribute to or head a VCC. A common set of criteria will apply to all those wishing to contribute to a VCC, with an additional set of criteria for those who wish to head a VCC. We propose that each application for membership be assessed against its ability to meet the criteria with a range from one to five (one being the lowest score and five the highest).

Legal structure ERIC

The strategic vision of the European Union (EU) is to make Europe the best place for researchers and in this way create a competitive advantage. In the process of realising this goal, the EC carried out a study on the suitability of the existing national, European and international legal forms for the creation and operation of research infrastructures on a pan-European level. Based on this study, the EC proposed to the council the creation of a new legal form, that of the ERIC, in order to facilitate the goals of the European Research Area (ERA). The council accepted this proposal.

An ERIC is a legal person under European law, which is recognised in all EU member states without the need to adhere to formalities in each and every one. It has been modelled after the paradigm of an international organisation, utilising only its most advantageous aspects. This means that states and other intergovernmental organisations can participate directly in the bodies of the ERIC and that the ERIC can take advantage of tax exemptions as well as a separate and simpler procurement procedure.

Its establishment is not subject to an international treaty or to ratification procedures in the national parliaments. It is only necessary to make an application to the EC to initiate an ERIC. The ERIC will be established immediately following the publication of the affirmative decision of the EC in the EU official journal.

What are the advantages of an ERIC?

The main advantage of an ERIC is its separate legal personality in the entire territory of the EU without the need for separate formalities in different countries. This is especially crucial for distributed infrastructures, such as DARIAH. The ERIC legal form provides the opportunity to operate in different locations under a simple, single form and organisation.

Another advantage of the ERIC legal form is the enjoyment of tax exemptions, namely value added tax (VAT) and excise tax exemption, along with exemption from any other tax included in the agreement between its members. This means that the cost for the procurement of the necessary material and equipment for the construction and operation of the ERIC will be significantly lower than in any other legal form. In the same framework, each ERIC is entitled to create its own procurement policy, which need not abide by the provisions of the Public Procurement Directive. This means that procurement of goods and services can be very easy and fast.

DARIAH ERIC

DARIAH proposes to create the DARIAH ERIC at the end of the Preparatory Phase. The DARIAH ERIC will undertake the construction and operation of a distributed infrastructure. The proposed membership scheme and organisational structure is relatively lightweight while still encompassing the major, obligatory bodies and the extra bodies necessary for the optimal operation of DARIAH in close contact with the community and in strict adherence to scientific standards.

DARIAH ERIC membership scheme

All interested parties for participation in DARIAH will have the option of applying for membership as one of the following types:

1. members
2. observers
3. cooperating partners.

The first two options are reserved for states, which may delegate their membership and representation to research organisations and universities, while the last option is open to institutions from non-participating states.

Members

Full membership will allow for the participation in DARIAH to the fullest extent possible. Each member will have the right to use all tools and services offered by DARIAH as described above in the VCC section of this document. In the case of chargeable services, members will be entitled to reduced fees. Moreover, members will have the ability to influence the development of the infrastructure since the majority of the personnel will come from the members themselves. Most importantly, members are entitled to participate and vote in the general assembly, which is the supreme body of the ERIC and takes all the important decisions regarding the operation and the future of the ERIC. Furthermore, members will have the option to head one VCC. Naturally, full participation means full financial obligations toward the budget for the construction and operation of DARIAH.

Observers

An observer will be fully integrated into the construction and operation of the DARIAH ERIC. An observer will have the right to use all tools and services. An observer will also have the option to participate in the development of these tools. However, such influence will be limited, since an observer will not be able to head a VCC but only participate in its operations. Furthermore, an observer will only be allowed to sit as a guest in the general assembly.

Cooperating partners

Only institutions from countries not participating in DARIAH as a member or observer can be accepted as a cooperating partner. The level of participation of cooperating partners will be relatively low. The cooperating partners will work together with one or more VCCs with regard to specific tasks agreed to by the relevant VCC(s) and approved by the general assembly. A cooperating partner will not be required to contribute to the budget, but will have to bear its own costs for the cooperation with DARIAH.

DARIAH bodies

DARIAH, as a legal person, is required to have its own bodies that will express the will of the organisation and represent it in its external relations and before any court or other authorities.

The general assembly

The general assembly provides the forum where members are able to participate in the decision-making procedure. The general assembly is the competent organ to decide on, among other issues, the budget and the composition of the other bodies. All other bodies and administrative units are obliged to report to the general assembly, the supreme body of the ERIC.

The main powers of the general assembly will be to approve the budget; approve the financial reports and the annual report of the activities; elect or dismiss the members of the board of directors; accept new members, observers, cooperating partners; elect or dismiss members of the scientific board; approve the creation, amendment or dissolution of VCCs; amend the statute; decide its internal rules and procedures; expel members; dissolve the DARIAH ERIC; and, act on any other issue that no other body is explicitly authorised to decide upon.

Each member will be entitled to one vote in the general assembly. The nature of DARIAH and the proposed funding scheme, as described further below, is in line with all major research organisations in Europe. This justifies the one vote per member scheme.

The board of directors

The board of directors will be the executive body of the organisation, entrusted with the tasks of managing the day-to-day operations of the DARIAH ERIC and of legally representing the DARIAH ERIC in its external relations and before national authorities. It will comprise three members with equal powers, with a three-year term each. In this way, a mixture of experienced directors with fresh ideas will be ensured. The main task of the board will be to implement the decisions of the general assembly.

It is proposed that the elected directors will be active researchers, employed as members of the DARIAH board of directors at 50 % full time equivalent. The half position will secure the employment of active researchers, since they will not be requested to abandon their research activities. The employment of active researchers as members of the board of directors will ensure a continuing, active and close connection of DARIAH management with the academic community it is going to serve.

The scientific board

The scientific board will be entrusted with the scientific overview of the DARIAH ERIC. The board will consist of qualified individuals such as scholars, software developers, IT experts and experts in other disciplines. They will have the task of evaluating the work and the operations of DARIAH and each VCC. The scientific board will submit an annual report to the general assembly in which its findings on the scientific evaluation of the infrastructure will be included. The scientific board will propose to the board of directors and the general assembly any action it deems necessary for the development of DARIAH, including the creation of new VCCs, the amendment of the scope and tasks of each VCC and the dissolution of a VCC if necessary. Naturally, it will have a significant role in the selection of the VCC head institutions.

DARIAH policies

Access and data policy

All DARIAH tools and services will, in principle, be offered for free. This does not mean that some services, such as helpdesks, custom software development or summer courses cannot be provided for a fee. However, this fee will be reduced for participating countries and institutions, although not necessarily at the same rate for all levels of participation. Any software, data or publication created in the framework of DARIAH, either by DARIAH personnel or its users should be made freely available under an open access/open source licence or its equivalent.

Employment policy

DARIAH should be an equal opportunity employer according to the applicable EU and national legislation. Any personnel employed by partner institutions and seconded to DARIAH will be subject to the employment policies of the employing organisation.

Procurement policy

The DARIAH ERIC should adopt a simplified, transparent and competitive procurement policy, which adheres to the principle of 'best value for money'. It goes without saying that if the procurement is going to be done by a partner institution, a member on behalf of DARIAH, or with the purpose of offering the procured goods and/or services to DARIAH as an in-kind contribution, such procurement shall be subject to the public procurement legislation of the member state.

Funding model

In general, humanities and social sciences are very inexpensive in comparison with science disciplines. To take the example of the eight ESFRI projects dealing with physical sciences and engineering, the estimated costs per year amount to EUR 6 billion. Energy project budgets approach EUR 2 billion annually.

During the construction phase, DARIAH will cost about six million Euro per year, including an allocation of funds specifically for engaging in research projects. One-third of the cost will go to community engagement projects to promote innovative research practices.

For its operation, and also for its funding, DARIAH will rely heavily on the national roadmaps. DARIAH's aim is to utilise and integrate already existing programmes and projects of the national digital humanities roadmap, if a member already has one, and to encourage the creation of a national roadmap where there is not one. This means that DARIAH will not render existing investment in projects and infrastructure redundant.

DARIAH will create an infrastructure based on the co-ordinated use, integration and development of the already existing roadmaps, thus creating an infrastructure for the digital humanities which is in fact more than just the sum of the existing projects.

Public funding model

DARIAH prefers a mixed funding model, which is a combination of contributions both in cash and in-kind. This model strikes the correct balance between the necessary (cash) funds for the independent operation of DARIAH and the need to not overburden the member states and respond to their sensitivity regarding the expenditure of taxpayers' money abroad (in-kind).

In this model, a percentage of the contribution to the budget will be required in cash. This amount has been calculated at a minimum of 10 %. This amount is needed in order to enable DARIAH to operate the DCO. The DCO will function independently of any influence of the member states so that its employees and officers will not have any conflicts of interest.

The in-kind contribution represents an amount of money invested nationally, albeit in a co-ordinated way, based on the needs of DARIAH and the decisions of the general assembly, where the national funding agencies participate. In this way the money invested nationally helps the member to fulfil its obligations towards DARIAH and at the same time promotes the advancement of the national infrastructure and serves the national roadmap for the arts and humanities. These national investments represent the funding for the continued operation of national projects and partners and any additional costs for the necessary actions for their scalability into nodes and components of the DARIAH European Infrastructure.

Observers will have a limited contribution to the budget. Their contribution will also be calculated based on the gross domestic product (GDP) of the observer and will be half of what the observer would have to pay if they were a full member.
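The arithmetic of this funding model can be illustrated with a short calculation. The 10 % minimum cash share and the halved observer contribution are taken from the text above; the rule that a contribution is proportional to a participant's share of total GDP is an assumption made purely for this sketch, as are the function names and the GDP figures used in the example.

```python
def member_contribution(budget: float, gdp: float, total_gdp: float) -> float:
    """Full member contribution, ASSUMED proportional to GDP share."""
    return budget * gdp / total_gdp


def observer_contribution(budget: float, gdp: float, total_gdp: float) -> float:
    """An observer pays half of what it would pay as a full member."""
    return member_contribution(budget, gdp, total_gdp) / 2


def split_cash_in_kind(contribution: float, cash_percent: int = 10):
    """Split a contribution into its cash part (minimum 10 %) and in-kind part."""
    if not 10 <= cash_percent <= 100:
        raise ValueError("cash share must be between 10 and 100 percent")
    cash = contribution * cash_percent / 100
    return cash, contribution - cash


# Illustrative figures only: a EUR 6 million annual budget and a participant
# whose GDP is 200 out of a hypothetical total of 1000 (arbitrary units).
full = member_contribution(6_000_000, 200, 1000)
cash, in_kind = split_cash_in_kind(full)
half = observer_contribution(6_000_000, 200, 1000)
```

With these illustrative inputs, a full member would owe one fifth of the budget, of which at least one tenth would be due in cash and the remainder could be provided in kind, while an observer with the same GDP would owe half the member amount.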

Budgetary procedure

The competent organ to decide the budget will be the general assembly, which will convene at least once per year for budget-related discussions. The general assembly will have the duty to examine the budgetary proposal of the board of directors, which shall describe in detail all envisaged expenditures. The head of each VCC, together with the DARIAH financial officer, will aid the board of directors in the preparation of the budgetary proposal. The heads of each VCC will be in the best position to document the needs of their respective VCC.

In the budgetary proposal, the board of directors will provide a calculation of the fees of each member and observer (if applicable). However, the members of the general assembly will have the right to adjust the fees in order to accommodate political criteria and decisions.

Potential impact:

The work undertaken during the preparatory phase has provided a comprehensive understanding of the landscape of the digital humanities/e-humanities. DARIAH has built and expanded upon its relationships with key research centres and digital humanities projects which share, in whole or in part, the DARIAH goal of bringing information and communication technologies (ICT) methods and tools to its community of researchers. Foremost, DARIAH has striven to create a sense of European consciousness around research infrastructures for the arts and humanities.

For example, DARIAH and several other infrastructure initiatives including CLARIN, CentreNet, Project Bamboo, the Alliance of Digital Humanities Organisations and TextGrid, among others, came together to form CHAIN, the Coalition of Humanities and Arts Infrastructures and Networks. In addition to creating ample opportunities for dialogue among its partner projects, DARIAH's engagement with CHAIN will help to ensure that present, proposed and future activities are interdependent, complementary and oriented towards working together to overcome barriers and to create a shared environment where technology services can interoperate and be sustained, thus enabling new forms of research. CHAIN is just one example of many in which DARIAH has shown that it believes strongly in collaborative engagement with its cohort projects and initiatives. With the European perspective at the centre of these initiatives, DARIAH helps to cultivate and catalyse the socio-economic advantages for the continent.

DARIAH has cultivated relationships with a wide range of projects and initiatives having similar or overlapping goals and targets. Some of these relationships have led to the establishment of more formal cooperation in the form of key proposals. For example, DARIAH will work closely with the EHRI to share its knowledge of research infrastructure development, architecture and usage. Some EHRI partners are also integrally involved in DARIAH and there are expected to be significant overlaps between the two projects in terms of technological and strategic approaches to research infrastructure development. The aim is to create a cohesive body of integrated research materials that will be made available online to the public.

Similarly, DARIAH will also work closely with the Collaborative European Digital and Archival Resource Infrastructure (Cendari) project in its efforts to create a powerful platform for delivering and manipulating historical data in a transnational fashion, overcoming the national and institutional data silos that now exist and raising the level of access for scholars to archives that may already have advanced digitisation programmes as well as to those that do not.

The work undertaken in DARIAH's technical WPs (WP7 and WP8) has provided a comprehensive and solid understanding of the technological approaches, which will guide DARIAH during the construction phase and beyond. One of the foremost aspects of this work comes from WP8's work on process modelling of scholarly activity, based on a combination of a survey of the literature on scholarly information behaviour with qualitative research on how arts and humanities researchers interact with information. This work contributes to the development of a process model of scholarly information work and also feeds into the specification of an object model for information resources and scholarly objects pertinent to humanities research, an important factor in determining digital infrastructure requirements for scholarship. DARIAH recognises the vital role of the scholarly community in informing DARIAH's development work. This is the community that DARIAH will serve when operational and therefore a comprehensive understanding of the community's scholarly processes is vital to ensuring that DARIAH delivers value and usefulness to its users. This will in turn have a positive impact on the European arts and humanities research area in general.

Through its advocacy and dissemination work, DARIAH has inspired the inclusion of humanities research infrastructure development on national roadmaps in several European countries, among them Austria, Germany, the Netherlands and France. Furthermore, DARIAH has encouraged the funding and development of national infrastructure initiatives that will feed into and support DARIAH during the construction and operational phases.

All of these actions demonstrate the positive impact of DARIAH on the landscape of digitally-oriented arts and humanities research in Europe and show the way towards continued collaboration and development work with stakeholders from across the broad spectrum of arts and humanities research. DARIAH will continue to leverage its broad mandate in the support of further progress in this area.

Project website: http://www.dariah.eu

E-mail: info@dariah.eu