CORDIS
EU research results

CONFIDENTIALITY OF DATA AGAINST DATA MINING METHODS

Objective

Securing data against intruders who attack implicit sensitive information is an open research problem. To make a publicly available system secure, we must ensure not only that private sensitive data have been trimmed out, but also that certain inference channels have been blocked. Moreover, the need to make a system as open as possible, to the degree that data sensitivity is not jeopardised, calls for techniques that account for the disclosure control of sensitive data. We aim to investigate various data mining methods as a threat to data security. We plan to evaluate the initial work on data mining against data security and then investigate possible techniques to ensure data confidentiality against a wide spectrum of data mining methodologies and novel information types.

OBJECTIVES
(a) Investigation of new techniques for secure data mining that will cover the main aspects of data mining (association rules, classification, clustering);
(b) Investigation of new techniques for secure distributed data mining, which receives special attention because data is often spatially distributed;
(c) The existing and newly investigated techniques for secure data mining will be implemented and tested thoroughly against real data sets for their effectiveness and against synthetic data sets for their performance;
(d) Specification of an evaluation framework for comparing all the techniques on a common platform, which will be the basis for determining the appropriate technique for a given type of application;
(e) Building know-how on the threats that data mining tools can pose to data security and on how those threats can be overcome.

DESCRIPTION OF WORK
There can be various disclosure methodologies depending on the data mining technique in use. Information disclosure control techniques that we are going to investigate can be summarised as follows:
(a) A data hiding technique which is suitable for association rule hiding and prevention of the prediction of confidential data via decision trees;
(b) Inserting unknown values, which can be used when perturbing the data or inserting wrong values may cause serious problems, as in the case of medical data;
(c) Data perturbation techniques, which modify the data to preserve confidentiality while still allowing an approximately correct data mining model to be extracted;
(d) Data Swapping techniques shuffle the data values in the same column and can be used in cases where data removal can reveal some hints on confidential data mining results;
(e) Data Alteration changes values randomly.
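Three of the techniques listed above (data swapping, data perturbation and inserting unknown values) can be illustrated with a minimal sketch on a toy table. This is not the project's implementation: the function names, the record layout and the noise scale are all hypothetical choices made for illustration only.

```python
import random

# All names below are hypothetical; this is an illustrative sketch only.

def swap_column(rows, column, rng):
    """Data swapping: shuffle the values of one column across the rows."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

def perturb_column(rows, column, scale, rng):
    """Data perturbation: add zero-mean Gaussian noise to a numeric column."""
    return [{**row, column: row[column] + rng.gauss(0, scale)} for row in rows]

def blank_values(rows, column, fraction, rng):
    """Inserting unknown values: replace a fraction of the entries with None."""
    k = round(len(rows) * fraction)
    chosen = set(rng.sample(range(len(rows)), k))
    return [
        {**row, column: None} if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
records = [{"age": 30 + i, "salary": 1000.0 * (i + 1)} for i in range(10)]

swapped = swap_column(records, "salary", rng)         # same values, new rows
noisy = perturb_column(records, "salary", 50.0, rng)  # values slightly changed
blanked = blank_values(records, "salary", 0.3, rng)   # 3 of 10 values unknown
```

Swapping keeps the column's overall distribution intact but breaks the link between a value and its row, perturbation keeps the link but blurs the value, and blanking avoids introducing wrong values altogether, which is the property item (b) asks for.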
The distributed nature of most of the data encountered in practice has motivated research on distributed data mining.
Therefore, considering a distributed data scenario, we propose methodologies for securing this data against data mining techniques. Data warehousing technology will also be investigated in this context. The sanitisation process should maximise the protection of sensitive information while keeping data quality as high as possible. The trade-off between data quality and degree of sanitisation will be investigated in this project. The application areas to be considered include the regular disclosure of data, secure outsourcing of sensitive data, secure data trade among companies, combined data analysis prior to company mergers, and secure disclosure of protein sequences and DNA data.
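The trade-off between protection and data quality can be made concrete with a small sketch of association rule hiding. It assumes a simple greedy strategy, which is not necessarily the one used in the project: one item is removed from supporting transactions until the support of a sensitive itemset falls below the mining threshold, and the number of modified transactions serves as a crude measure of lost data quality. The names `support`, `hide_itemset`, `victim` and `min_support` are hypothetical.

```python
# Hypothetical greedy sketch; not the project's actual hiding algorithm.

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def hide_itemset(transactions, sensitive, victim, min_support):
    """Remove `victim` from supporting transactions until the support of
    `sensitive` drops below `min_support`. Returns the sanitised copy and
    the number of modified transactions (a simple data-quality cost)."""
    sanitised = [set(t) for t in transactions]  # leave the input untouched
    sensitive = set(sensitive)
    changes = 0
    for t in sanitised:
        if support(sanitised, sensitive) < min_support:
            break  # itemset is no longer frequent, stop sanitising
        if sensitive <= t:
            t.discard(victim)
            changes += 1
    return sanitised, changes

db = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
sanitised, changes = hide_itemset(db, {"bread", "milk"}, "milk", min_support=0.4)
print(support(db, {"bread", "milk"}), support(sanitised, {"bread", "milk"}), changes)
```

In this toy database the sensitive itemset starts with support 3/5, and two transactions have to be modified before it drops below the 0.4 threshold. A lower threshold would force more modifications, which is the tension between degree of sanitisation and data quality described above.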
The work on securing data against intruders who attack the implicit sensitive information in the data has just started and has yet to cover the broad spectrum of data mining techniques. To make a publicly available system secure, we must ensure not only that private sensitive data have been trimmed out, but also that certain inference channels have been blocked. In other words, it is not only the data but also the knowledge hidden in the data that should be made secure.

Moreover, the need to make our system as open as possible, to the degree that data sensitivity is not jeopardised, calls for various techniques that account for the disclosure control of sensitive data. We have considered some aspects of data, such as dimensionality and distribution, as well as some data mining methods as a threat to data security. The plan set out in this project was to evaluate the initial work on data mining against data security and then to investigate possible techniques to ensure data confidentiality against a wide spectrum of disclosure methodologies and novel information types. We have reached a stage where a selected set of privacy-preserving data mining algorithms has been developed in the prototype system, but further resources are needed to exploit this new research area fully. We feel that continuing this project from an evaluation phase to a regular phase will provide us with the best possible resources to investigate this research area further and to place ourselves among the pioneers at an international level. In this way we hope to be in a position to contribute to the field to the fullest extent and to form a network of excellence in the field of privacy and security of data and information.

Coordinator

RESEARCH ACADEMIC COMPUTER TECHNOLOGY INSTITUTE

Address

61, Riga Feraiou Street
26221 Patras

Greece

Participants (2)

SABANCI UNIVERSITY

Turkey

UNIVERSITA DEGLI STUDI DI MILANO

Italy

Project information

Grant agreement ID: IST-2001-39151

Start date: 1 October 2002

End date: 30 September 2003

Funded under: FP5-IST

Overall budget: € 100 000

EU contribution: € 100 000

Coordinated by:

RESEARCH ACADEMIC COMPUTER TECHNOLOGY INSTITUTE

Greece