Final Report Summary - MODAS (Mob Data Sourcing)
A major contribution of the research is the development of a novel model for data-centric crowd sourcing, which we call crowd mining. To understand the importance of this new model, observe that a key challenge in crowd-based data management is that human knowledge forms an open world, and it is thus difficult to know what kind of information we should be looking for. Classic database research has addressed this problem with data mining techniques that identify interesting data patterns. These techniques, however, are not suitable for the crowd, mainly because of properties of human memory, such as the tendency to remember simple trends and summaries rather than exact details. Following these observations, MoDaS developed, for the first time, the foundations of crowd mining. We defined the formal settings for crowd mining; based on these, we designed a framework of generic components for choosing the best questions to ask the crowd and for mining significant patterns from the answers. We suggested generic implementations for these components and tested the resulting algorithms' performance on benchmarks that we designed for this purpose. Our algorithms consistently outperform alternative baseline algorithms. Encouraged by the success of this direction, we then explored a novel approach that broadens crowd data sourcing by enabling users to pose general questions, to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns.
Our approach is based on (1) a simple generic model that captures both ontological knowledge and the individual history or habits of crowd members, from which frequent patterns are mined; (2) a query language in which users can declaratively specify their information needs and the data patterns of interest; (3) efficient query evaluation algorithms, which enable mining semantically concise answers while minimizing the number of questions posed to the crowd; and (4) an implementation of these ideas that mines the crowd through an interactive user interface. Experimental results with both a real-life crowd and synthetic data demonstrate the feasibility and effectiveness of the approach.
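The interplay between choosing questions and mining significant patterns can be illustrated with a minimal sketch. It is not the MoDaS algorithm: the function names, the example patterns, the 0.4 threshold and the normal-approximation confidence interval are all assumptions made for illustration. Each crowd answer is taken to be one member's estimate of how often a pattern holds in their own habits; a pattern is declared significant once the interval around the mean answer lies above the threshold, and the next question goes to the undecided pattern with the fewest answers so far.

```python
from math import sqrt
from statistics import mean, stdev


def classify(answers, threshold, z=1.96):
    """Label one pattern from the crowd answers collected for it so far.

    Uses a normal-approximation confidence interval around the mean answer
    (an illustrative assumption, not the project's actual criterion).
    """
    if len(answers) < 2:
        return "undecided"  # too few answers to estimate a spread
    m = mean(answers)
    half_width = z * stdev(answers) / sqrt(len(answers))
    if m - half_width > threshold:
        return "significant"
    if m + half_width < threshold:
        return "insignificant"
    return "undecided"


def next_question(answers_by_pattern, threshold):
    """Pick the undecided pattern with the fewest answers, or None if all are decided."""
    undecided = [
        pattern
        for pattern, answers in answers_by_pattern.items()
        if classify(answers, threshold) == "undecided"
    ]
    if not undecided:
        return None
    return min(undecided, key=lambda pattern: len(answers_by_pattern[pattern]))


# Hypothetical answers: each number is one crowd member's self-reported
# frequency (0 to 1) with which the pattern holds in their own habits.
example_answers = {
    "headache -> ginger tea": [0.6, 0.7, 0.65, 0.7],
    "cold -> garlic": [0.1, 0.05, 0.1],
    "sore throat -> honey": [0.5],
}
example_threshold = 0.4

labels = {p: classify(a, example_threshold) for p, a in example_answers.items()}
to_ask = next_question(example_answers, example_threshold)
```

In this sketch the first pattern is already settled as significant and the second as insignificant, so no further questions are spent on them; the only remaining question concerns the third pattern, which has a single answer. A fuller treatment would also weigh the expected information gain of each question, which this simple heuristic ignores.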
We believe that the crowd mining framework developed in MoDaS is precisely the technological breakthrough needed to open the way to a new and otherwise unattainable universe of knowledge in a wide range of applications, from scientific to social and economic ones.