Living Web Archives

Project description

LiWA developed Web archiving tools that capture content from a wide variety of sources, improve archive fidelity and authenticity, and ensure the long-term interpretability of Web content.

Interest in Web content preservation is growing strongly, not only among traditional library and archival organisations but also in sectors such as industry and services. However, the typical characteristics of Web content (variety of formats, high dynamics, volatility, interactivity and context dependency) make adequate Web archiving particularly challenging. With the LiWA project, Web archiving has been established as a new topic for scientific research and development within the digital preservation domain.

At the centre of the project was the concept of 'Living Web Archives', as opposed to the current practice of producing periodic snapshots of pages. 'Living' here refers to:

long-term interpretability as the archive evolves and adapts over time,
improved archive fidelity and authenticity through filtering out irrelevant information, and
the capture of content from a wide variety of sources.

To enhance archive fidelity and authenticity, LiWA developed and tested new methods based on content interpretation and intelligent pattern detection of crawler traps and Web spam. These methods reduce the amount of fake content captured and help prioritise crawls by automatically detecting valuable content.

To improve the integrity and the temporal, structural and semantic coherence of Web archives, part of the work was dedicated to temporal Web archive construction. The aim is to significantly improve the positioning of content in time and in (topic) space, laying the foundations for fast and effective access to evolving Web content.

To facilitate archive interpretability, LiWA applied methods for semantic and terminology extraction that can detect and handle the evolution of semantics, of the interpretation of domain concepts, and of terminology. This contributes to preserving the usefulness, quality and accessibility of Web archives over time.

To validate the LiWA approach, two demonstrator applications were built on top of the LiWA services. They focus on the social Web and on the particular challenge of archiving audio-visual content.

The potential benefit of this research is twofold: archiving institutions will be able to automatically archive higher volumes of dynamic and volatile digital content, significantly increasing the amount of preserved digital content, and archive users will benefit from higher-quality archive content and improved search services.

Web content plays an increasingly important role in the knowledge-based society, and the preservation and long-term accessibility of Web history is of high value (e.g. for scholarly studies, market analyses and intellectual property disputes). Interest in its preservation is growing strongly among library and archival organizations as well as emerging industrial services. The characteristics of Web content (high dynamics, volatility, variety of contributors and formats) make adequate Web archiving a challenge.

LiWA will look beyond the pure "freezing" of Web content snapshots over long periods, transforming pure snapshot storage into a "Living" Web Archive. "Living" refers to a) long-term interpretability as archives evolve, b) improved archive fidelity through filtering out irrelevant noise, and c) consideration of a wide variety of content.

LiWA will extend the current state of the art and develop the next generation of Web content capture, preservation, analysis and enrichment services to improve the fidelity, coherence and interpretability of Web archives. By developing methods that improve archive fidelity, the project will contribute to the adequate preservation of complete, high-quality content. By developing methods for improved archive coherence and interpretability, the project will help ensure the long-term usability of that content.

LiWA RTD (research and technological development) will focus on innovative methods for content capture, filtering out spam and other noise, improving temporal archive coherence, and dealing with semantic and terminology evolution. Two exemplary LiWA applications, focusing on audiovisual streams and social Web content respectively, will demonstrate the benefits of advanced Web archiving to interested stakeholders.

To ensure demand-driven RTD and broad, sustained project impact, the LiWA consortium will work closely with the International Internet Preservation Consortium (IIPC) as well as important library and archiving organizations, two of which are members of LiWA.

Programme(s)

FP7-ICT - Specific Programme "Cooperation": Information and communication technologies

Topic(s)

ICT-2007.4.1 - Digital libraries and technology-enhanced learning

Call for proposal

FP7-ICT-2007-1

Funding Scheme

CP - Collaborative project (generic)

Coordinator

GOTTFRIED WILHELM LEIBNIZ UNIVERSITAET HANNOVER

EU contribution

€ 655 623,00

Address

WELFENGARTEN 1
30167 Hannover
Germany

Region

Niedersachsen, Hannover, Region Hannover

Activity type

Higher or Secondary Education Establishments

Total cost

No data

Participants (7)

NARODNI KNIHOVNA CESKE REPUBLIKY

Czechia

EU contribution

€ 53 738,00

MORAVSKA ZEMSKA KNIHOVNA V BRNE

Czechia

EU contribution

€ 52 120,00

MAX-PLANCK-GESELLSCHAFT ZUR FORDERUNG DER WISSENSCHAFTEN EV

Germany

EU contribution

€ 415 350,00

STICHTING INTERNET MEMORY FOUNDATION

Netherlands

EU contribution

€ 629 000,00

HUN-REN SZAMITASTECHNIKAI ES AUTOMATIZALASI KUTATOINTEZET

Hungary

EU contribution

€ 298 400,00

STICHTING NEDERLANDS INSTITUUT VOORBEELD EN GELUID

Netherlands

EU contribution

€ 152 860,00

HANZO ARCHIVES LIMITED

United Kingdom

EU contribution

€ 425 280,00