Knowledge Discovery in Data as Collaboration of Human and Software Actors

Periodic Reporting for period 1 - KDD-CHASER (Knowledge Discovery in Data as Collaboration of Human and Software Actors)

Reporting period: 2018-02-01 to 2020-01-31

The KDD-CHASER project was launched in February 2018 to investigate the potential of a new concept where people who have collected data about themselves can analyse it in collaboration with experts. The data may be collected with, for example, a wearable fitness tracker; such devices are growing in popularity and provide the user with a wealth of information about their physical activity and sleep. In theory, the user could extract additional knowledge from the data using data analysis tools, but in practice, the average user does not have the required expertise. The central idea of KDD-CHASER was to study how such people could be brought together with people who do have the expertise and are willing to help.

Analysing personal data to discover useful knowledge about the individuals concerned is usually viewed as something done by corporations with access to large quantities of customer data, such as Google or Facebook. Thus, when exploitation of personal data is viewed from the perspective of the individual, it is typically seen as something that needs to be regulated in order to ensure that corporations do not abuse the power that comes with the possession of data about people. The perspective that this data could also be something that the people themselves exploit for their own personal benefit tends to get overlooked; the significance of the collaboration concept studied in KDD-CHASER is that it would enable individuals to achieve this by working together with other individuals instead of handing control of their data over to a company.

The overall objectives of KDD-CHASER were to build a model of the process of collaborative data analysis, to develop a software platform to support this type of collaboration, and to demonstrate the viability of the process model and the software platform by running a trial. The successful execution of the trial shows that the collaboration process is feasible and that the software platform can be used to support it in a real-world environment, although the usability, stability and performance of the software still require substantial improvement. Furthermore, the results of a survey conducted among the participants of the trial suggest that there is interest in this type of collaboration and that many people could gain useful information about themselves through collaborative analysis of their personal data.
Initially, an extensive literature review was carried out to catalogue and categorise existing systems and enabling technologies for collaborative data analysis. A tentative process model for collaborative analysis was also developed at this stage. Work then began on a domain ontology for collaborative data analysis. The ontology defines domain concepts and relationships among them and is expressed in a language that allows a computer program to do automated reasoning on these concepts and relationships. A particular problem that the ontology was designed to address was supporting the negotiation phase where the participants of the collaboration must come to an agreement on what is to be accomplished during the collaboration and under what terms. To this end, the ontology enables representation of data analysis tasks and privacy constraints and allows reasoning software to detect conflicts where an analysis task proposed by an expert requires some data that the data owner is unwilling to share.
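
The conflict detection described above can be illustrated with a minimal sketch. It does not reproduce the project's ontology or reasoning software: plain Python sets stand in for the ontology, and all class, function and data category names (AnalysisTask, PrivacyConstraints, find_conflicts, "gps_location" and so on) are hypothetical, chosen only for illustration. A task is treated as conflicting when it requires at least one data category that the data owner has declined to share.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class AnalysisTask:
    """A task proposed by the expert and the data categories it needs (hypothetical model)."""
    name: str
    required_data: FrozenSet[str]


@dataclass(frozen=True)
class PrivacyConstraints:
    """Data categories the data owner is unwilling to share (hypothetical model)."""
    withheld_data: FrozenSet[str]


def find_conflicts(
    tasks: Iterable[AnalysisTask], constraints: PrivacyConstraints
) -> Dict[str, List[str]]:
    """Map each conflicting task name to the withheld categories it requires."""
    conflicts = {}
    for task in tasks:
        # A conflict exists when required and withheld categories overlap.
        blocked = task.required_data & constraints.withheld_data
        if blocked:
            conflicts[task.name] = sorted(blocked)
    return conflicts


# Illustrative data only; not taken from the KDD-CHASER trial.
tasks = [
    AnalysisTask("sleep_duration_trend", frozenset({"sleep_stages"})),
    AnalysisTask("sleep_vs_location", frozenset({"sleep_stages", "gps_location"})),
]
constraints = PrivacyConstraints(frozenset({"gps_location", "heart_rate"}))

conflicts = find_conflicts(tasks, constraints)
print(conflicts)  # {'sleep_vs_location': ['gps_location']}
```

In a negotiation, a result like this would tell the participants which proposed tasks must be dropped or renegotiated before the collaboration can proceed. An ontology-based reasoner could additionally infer conflicts from relationships between concepts, which this flat set comparison does not attempt.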

After the first version of the ontology had been designed and implemented, development of a software platform for collaborative analysis of personal data was started. The platform was designed to support finding and inviting collaborators, creating and sharing datasets, visualising analysis results and communicating with collaborators via text-based chat. Internally, the platform was designed to use the ontology to represent and store all information about the users of the platform and their collaborations with one another. To test the software platform and the collaboration process in practice, a trial was carried out. Twelve volunteers were recruited, asked to use wearable devices to record their sleep data for a period of approximately two months and given instructions for using the platform to share their data with a researcher (playing the role of data analysis expert) and to view the analysis results. At the end of the trial, the volunteers were invited to complete a survey to provide feedback on the software platform and the collaboration process. The results of the trial indicate that the process and software are fundamentally viable but suffer from issues that need to be addressed by further research.

The results of the project were disseminated by publishing peer-reviewed papers in international scientific conferences. Further publications are being prepared and are expected to be submitted for review in the first half of 2020. The software platform did not reach a sufficiently high technology readiness level during the project to be exploitable, but following the successful proof of concept, an application for research funding has been made to continue the development of the ontology and the software.
KDD-CHASER introduces a novel concept where ordinary individuals who have collected data about their own lives can turn it into useful knowledge by collaborating with data analysis experts of their own choice. In this new area of research, the project has taken several significant steps toward the creation of a solution that would enable people to unlock the full potential of their personal data through online collaboration. The process model developed in the project identifies the collaborative tasks and activities that need to be supported and the key requirements for a system intended to support them. The domain ontology provides a formal model of an area at the intersection of related domains such as knowledge discovery and privacy, which is not adequately covered by existing ontologies. The software platform constitutes a proof-of-concept implementation of the identified requirements, using the domain ontology as a crucial component in a novel manner. The results of the trial provide valuable data to support further development of the process model, the ontology and the software platform.

The immediate impact of the results of KDD-CHASER is to lay the groundwork for more applied research in the area of collaborative analysis of personal data. The long-term impact of the work, assuming that the proof of concept is eventually transformed into a fully fledged product, is potentially highly significant. Empowering people to control their data and to refine it into knowledge would enable everyone to enhance their quality of life through the application of technologies that most people currently have limited access to, such as artificial intelligence. Given that activity and sleep data are highly relevant to health, better exploitation of such data for the benefit of the data owners would have a considerable positive social impact as well. Collaborative data analysis also has the potential to become a new area of profitable economic activity, with business opportunities for freelance data analysts and providers of collaboration platforms. Finally, collaboration with individual data owners may become a new way for researchers to obtain access to data, potentially boosting scientific research in any discipline where there is a use for personal data collected by the individuals themselves.
The main window of the collaboration platform, showing information about active collaborations.
The visualisation editor window of the platform, showing a line chart visualisation being created.