Information and Communication Technologies (ICT) already provide the technical infrastructure needed to transmit large amounts of data. A distributed system whose ultimate mission is to transform statistical data into statistical information must accommodate the fundamental requirements inherent to the collection and integration of data from distributed sources.
Among these, data validation is of utmost importance, and it has to take place as early in the information life cycle as possible. The project aims at the design and development of a generic, distributed and flexible data validation system, integrated into the various phases of the statistical data process, in order to ensure and monitor data quality. The main concept underlying our approach is the treatment of validation rules as metadata, as opposed to the usual approach of implementing case-specific validation programs.
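To make the idea of rules as metadata concrete, here is a minimal sketch in Java. It is not the project's actual design: the class name `RuleMetadataSketch`, the two rule types, the field names and the sample values are all hypothetical and chosen only for illustration. It shows a rule held as plain data and a single generic routine that interprets any list of such rules, so a new check becomes a new entry in the rule list rather than a new program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RuleMetadataSketch {

    /** Hypothetical rule types; the project's real classification scheme is not shown here. */
    enum RuleType { NOT_NULL, RANGE }

    /** A validation rule stored as data (metadata), not as a dedicated program. */
    static final class Rule {
        final String id;
        final String field;
        final RuleType type;
        final double min; // used by RANGE rules only
        final double max; // used by RANGE rules only

        Rule(String id, String field, RuleType type, double min, double max) {
            this.id = id;
            this.field = field;
            this.type = type;
            this.min = min;
            this.max = max;
        }
    }

    /** Generic check: interprets whatever rules it is given and returns the ids of violated rules. */
    static List<String> validate(Map<String, Object> row, List<Rule> rules) {
        List<String> violations = new ArrayList<>();
        for (Rule rule : rules) {
            Object value = row.get(rule.field);
            boolean ok;
            switch (rule.type) {
                case NOT_NULL:
                    ok = value != null;
                    break;
                case RANGE:
                    // A missing or non-numeric value fails a RANGE rule.
                    ok = value instanceof Number
                            && ((Number) value).doubleValue() >= rule.min
                            && ((Number) value).doubleValue() <= rule.max;
                    break;
                default:
                    ok = false;
            }
            if (!ok) {
                violations.add(rule.id);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Illustrative rules and data only; none of these come from the project.
        List<Rule> rules = List.of(
                new Rule("R1", "region", RuleType.NOT_NULL, 0, 0),
                new Rule("R2", "unemploymentRate", RuleType.RANGE, 0.0, 100.0));
        Map<String, Object> row = Map.of("region", "EL30", "unemploymentRate", 104.2);
        System.out.println(validate(row, rules)); // prints [R2]
    }
}
```

In this sketch the rule list could equally be loaded from a shared store and applied at several points in the data process, which is the property the metadata approach is meant to provide.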
The main objective of the project is the design and development of a generic, distributed and flexible data validation system that can be seamlessly integrated into current (or future) processes of statistical data collection, in order to ensure and monitor data quality. The system will be accessible in a distributed way so that large statistical data sets can be validated before their transmission, or even throughout their production, ensuring homogeneity and consistency, which are critical quality parameters of the validation process and of data quality in general.
The critical project objectives include the development of a formal framework for the classification and semantic notation of validation rules; the design, development and population of the rules repository; and the development of the validation engine and of a Validation Client implemented as a Java application.
The project consists of three phases, each containing several work packages and tasks:
- Phase I covers the methodological work that provides the background on which the development of the Validation System will be based. The analysis of information quality requirements and the derivation of an object model of the system belong to WP1; the classification, algebraic definition and specification of the required validation rules, together with the refinement of the object model towards a detailed metadata repository design, belong to WP2.
- Phase II implements the derived modules, following the main building blocks of the system architecture. A specific work package is dedicated to each system module: the Validation Rules Repository in WP3, the Validation Engine in WP4 and the Validation Client in WP5.
- In the final stage, Phase III, the main system modules are integrated, put into trial operation, tested and validated, providing feedback and further input for the final technological implementation and commercial exploitation. Pilot Operation and Evaluation is in WP6, and the Technological Implementation Plan and Dissemination Activities are in WP7, while WP0 provides the required administrative and project monitoring infrastructure.
All major phases of the project produce concrete results and milestones:
- In Phase I the detailed validation rules classification scheme and the metadata model are developed;
- In Phase II, the repository of validation rules and metadata, the validation engine and the client are implemented;
- Phase III leads to the pilot testing and evaluation of the results, along with the exploitation and dissemination activities.
Funding Scheme: CSC - Cost-sharing contracts