Tools for understanding data from obsolete systems will also help to integrate data from newly developed systems rapidly.
A software infrastructure is being built that will allow scientific data from long-completed projects to be accessed and understood, while also making current digital research easier to share and use worldwide.
Since the 1970s, spacecraft have made vast numbers of readings on their travels and sent that data back to Earth. But what happens to the data when the mission ends and the software and tacit knowledge needed to interpret it are no longer available?
Generally, little money is set aside for long-term data preservation. Magnetic tapes full of valuable information have ended up sitting on shelves. A huge amount of data from the arts and humanities, as well as from scientific research, is becoming inaccessible or unusable ever more quickly.
Researchers from across Europe on the CASPAR project set out to find secure, reliable and cost-effective ways to ensure that digitally encoded information remains usable indefinitely. The methodologies and tools developed during the project are important not only because they will give us access to data from the past, suggests David Giaretta, CASPAR project coordinator and a researcher at the STFC’s Rutherford Appleton Laboratory.
“The techniques that you need to preserve old digital objects – techniques that make unfamiliar digital objects usable – are exactly the same techniques you need to make newly created digital objects accessible and understandable,” he says.
If the e-science concept of research facilities sharing computational processing and data collections across the internet is to be fully realised, it will require a CASPAR-style infrastructure.
CASPAR’s infrastructure puts data into context so that it can be interpreted and understood by ‘designated communities’, which are defined by those responsible for the data. For example, the infrastructure may tell us that a long list of numbers is actually a record of calls made from a telephone over a certain period. For most of us, learning this would provide no useful information. However, for designated communities such as police investigating a crime or the telephone company’s invoicing department, knowing that the numbers are telephone calls may be very valuable indeed.
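The telephone example maps onto the OAIS idea that raw data only becomes information when it is paired with "representation information" describing its structure and meaning, written for a designated community. The short Python sketch below illustrates that pairing. It is not CASPAR’s software or API: the class names, fields and the simple ASCII parsing rule are hypothetical, chosen only to make the concept concrete.

```python
from dataclasses import dataclass, field


@dataclass
class RepresentationInformation:
    """Context that turns raw bytes into meaningful information (OAIS concept)."""
    structure: str  # how the bytes are laid out (documentation only in this sketch)
    semantics: str  # what the parsed values mean
    designated_communities: set = field(default_factory=set)


@dataclass
class InformationObject:
    """A data object paired with the representation information needed to read it."""
    data: bytes
    rep_info: RepresentationInformation

    def parse(self) -> list:
        # Assumption: this sketch only handles ASCII text with
        # whitespace-separated values, whatever `structure` says.
        return self.data.decode("ascii").split()

    def interpret(self) -> str:
        # Attach the semantic context to the parsed values.
        return f"{self.rep_info.semantics}: {', '.join(self.parse())}"

    def serves(self, community: str) -> bool:
        # The context is written for, and judged useful by, these communities.
        return community in self.rep_info.designated_communities


# Hypothetical example data: fictional numbers, invented community names.
calls = InformationObject(
    data=b"01632960001 01632960002 01632960003",
    rep_info=RepresentationInformation(
        structure="ASCII, one dialled number per whitespace-separated token",
        semantics="Numbers dialled from one telephone during a billing period",
        designated_communities={"police investigators", "billing department"},
    ),
)

print(calls.interpret())
print(calls.serves("billing department"))  # True
print(calls.serves("general public"))      # False
```

Without the representation information, the bytes are just three strings of digits; with it, anyone can read them as dialled numbers, but only the listed communities are likely to find that useful.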
Driving industry standardisation and change
Because the infrastructure developed by this EU-funded project is a pioneering implementation of the Open Archival Information System (OAIS, ISO 14721), an ISO-standard reference model for digital preservation, its influence will be felt across the digital preservation industry. OAIS aims to increase awareness and understanding of the concepts involved in archiving digital objects, especially among non-archival institutions. It defines the terminology and concepts used to describe and compare data models and archival architectures.
In fact, CASPAR’s implementation of OAIS defines a methodology and infrastructure for digital preservation across Europe. It guarantees not only understandability but also the protection of digital rights and the authenticity of the preserved information.
CASPAR produced eleven reusable infrastructure components and toolkits to support digital preservation: registry, knowledge management, orchestration, representation information, preservation datastore, data access and security, digital rights management, finding aids, virtualisation, packaging, and authenticity.
All components are independent of one another and offer web-based (and other) services. This makes the system robust, because there is no single point of failure. CASPAR is an open system that can interoperate with the many commercial digital preservation solutions on the market.
Building the e-science infrastructure
“Over the next five years or so we expect to see those CASPAR components … integrated into the broader e-science infrastructure that is being created in Europe,” says Giaretta.
“That is why it was so important that CASPAR tools could cope with all types of data, and were tested using cultural and performing arts as well as science data. There are a number of tools and toolkits within CASPAR that are closely tied to specific domains, but there are also elements that are discipline-independent, as you would expect with infrastructure.
“We expect an evolution in the use of the domain-specific tools while other parts will be made even more robust and scalable as they move over into the broader infrastructure across Europe,” he concludes.
The CASPAR project received funding from the ICT strand of the Sixth Framework Programme for research.
Check out CASPAR's software releases, which can also be found in the digital preservation services section of SourceForge.net.
Media note: This feature can be republished without charge provided ICT Results is acknowledged as the source at the top or the bottom of the story. You must request permission before you use any of the photographs on the site. If you do republish, we would be grateful if you could link back to the ICT Results site (http://cordis.europa.eu/ictresults). Let us know if you republish so as to help us provide you with a better service. If you want further contact information on any of the projects cited in this story please contact us.