Formalizing Subjective Interestingness in Exploratory Data Mining

Final Report Summary - FORSIED (Formalizing Subjective Interestingness in Exploratory Data Mining)

During the FORSIED project, a novel, rigorous framework for designing algorithms that explore large amounts of potentially complex data was developed and successfully applied to a range of data science tasks.

More specifically, on a conceptual level, FORSIED has led to an understanding of how the interestingness of patterns found in data can be formalized in a subjective manner, accounting for the data analyst's prior knowledge and beliefs about the data. The usefulness of the resulting FORSIED framework was demonstrated and evaluated by applying it to develop novel methods for a range of challenging exploratory data science tasks. Examples include finding interesting association patterns in relational data (e.g. data stored in relational databases), identifying interesting patterns in networks (e.g. communities, regions with surprising densities, graph embeddings), and generating and visualizing insightful low-dimensional representations of high-dimensional data (using both linear and non-linear methods). Some of these problems could not practically be tackled previously, while for others the FORSIED approach led to superior performance.

The developed methods will find applications in data-rich research fields (e.g. social sciences, life sciences, journalism studies), governments (e.g. the job market, the justice system), and industry (e.g. human resources, e-commerce and recommender systems, business intelligence).

As an important side result, the FORSIED framework was also found to suggest new and robust ways to formalize (and thus to preserve or guarantee) privacy and fairness in data science.