Jointly Mining User Generated Content Sources

Final Report Summary - JMUGCS (Jointly Mining User Generated Content Sources)

The advent of Web 2.0 empowered users to actively interact with the Web instead of passively consuming content. Today, Web users contribute content to discussion forums, microblogging sites, and review portals, and they organize themselves into online social networks where they form relationships, post their thoughts and activities, and interact with each other. Individuals can now have a “presence” on the Web that goes well beyond creating a home page and some documents. Web users generate knowledge, either explicitly by contributing content, or implicitly through their choices and actions online. This kind of data is a goldmine for scientific research with a wide range of practical applications, from marketing and recommendations to sociology and political science. For example, for the first time in history we are able to tap into the collective conscience of the planet’s population, and credibly answer the question “what do people think about X?”, where X can be a person, an object, an idea, or an event. We can also perform large-scale sociological studies to understand how users interact with and affect each other.

In this project, we jointly mine different types of user-generated data in order to enhance our understanding of the task at hand and improve the knowledge extraction process. To this end, we considered the following sources of information about online users: textual information contributed in the form of reviews, micro-reviews, tweets, and discussions; social network data in the form of friendship or following relationships between users; structured data in the form of attribute-value pairs for users or the items they interact with; and user behavior data in the form of numerical ratings and opinions. Using this data, we address the following general problems: summarization of reviews and micro-reviews; recommendation of content and links to users; understanding the evolution and nature of links in social networks; understanding the way opinions are shaped, expressed, and diffused online; and interpreting textual information using structured data.

The research fellow pursued research in all of these directions within the project. Here, we outline the most important contributions of the project.
• In the area of review summarization, the project introduced novel algorithms for selecting reviews and for summarizing micro-reviews by jointly mining reviews and micro-reviews. Synthesizing a review from review sentences that cover the content of the micro-reviews is an effective and practical method for generating a coherent summary of short and unstructured text. Furthermore, the project introduced the novel problem of multi-entity summarization, where the goal is to produce a summary of the reviews for a diverse set of entities. This has a practical application in the summarization of venues in a neighborhood, or of items from a specific category.
• In the area of recommendations, the project introduced the novel problem of package-to-group recommendation, where we want to recommend a set of items to a set of users. This problem raises new challenges, such as the issue of fairness. With multiple items in a package, we can aim for a package that contains at least one item that is among the top preferences of each user. We view such a package as fair, even if it sacrifices some of the average quality of the package. Finding a fair package is a computationally hard problem. The work in the project proposes efficient algorithms that take fairness into account.
• In the area of social network analysis, the project proposed an elegant methodology for characterizing ties in a social network, based on established theories from psychology (Strong Triadic Closure). Furthermore, it studied the effect of link additions in evolving networks, and proposed a new paradigm for link recommendations in which the utility of the network owner is taken into account. In the case examined in the project, the utility is the average shortest path length, which should be small to facilitate information exchange.
• In the area of opinion mining, the project studied the problem of diffusion maximization in evolving networks, and proposed new models and algorithms, since existing algorithms for static networks do not work well on dynamic ones. The project also proposed models for the emergence of polarization in review ratings, and introduced the novel problem of troll vulnerability prediction, where posts that face the risk of being trolled are identified in advance. The work in this area is of great practical importance. Given that online social media and networks are currently the main forums for conducting public dialogue and shaping public opinion, issues regarding polarization, misinformation, and disruptive online behavior will become of critical importance in the future.
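
To make the fairness notion in the package-to-group recommendation item above concrete, the following is a minimal Python sketch. It is not the project's algorithm: it swaps in a standard greedy coverage heuristic, and the function names (`top_k_items`, `greedy_fair_package`), the `ratings` layout, and the parameter `k` (how many of a user's best-rated items count as "top preferences") are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Ratings = Dict[str, Dict[str, float]]  # hypothetical layout: user -> item -> rating


def top_k_items(ratings: Ratings, k: int) -> Dict[str, Set[str]]:
    """Map each user to the set of their k highest-rated items."""
    return {
        user: set(sorted(prefs, key=lambda item: (-prefs[item], item))[:k])
        for user, prefs in ratings.items()
    }


def greedy_fair_package(
    ratings: Ratings, package_size: int, k: int
) -> Tuple[List[str], Set[str]]:
    """Greedily pick items so that as many users as possible find at least
    one of their top-k items in the package (a set-cover style heuristic,
    assumed here for illustration only).

    Returns the chosen package and the set of users left unsatisfied.
    """
    top = top_k_items(ratings, k)
    items = sorted({item for prefs in ratings.values() for item in prefs})
    package: List[str] = []
    unsatisfied: Set[str] = set(ratings)

    while len(package) < package_size and unsatisfied:
        candidates = [item for item in items if item not in package]
        if not candidates:
            break

        # Number of still-unsatisfied users who have this item in their top-k.
        def gain(item: str) -> int:
            return sum(1 for user in unsatisfied if item in top[user])

        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # no remaining item helps any unsatisfied user
        package.append(best)
        unsatisfied = {user for user in unsatisfied if best not in top[user]}

    return package, unsatisfied


if __name__ == "__main__":
    example = {
        "ann": {"a": 5, "b": 1, "c": 2},
        "bob": {"a": 1, "b": 5, "c": 2},
        "cat": {"a": 4, "b": 2, "c": 3},
    }
    print(greedy_fair_package(example, package_size=2, k=1))
```

In this sketch the package may end up smaller than `package_size` once every user is satisfied; any remaining slots could then be filled by another criterion, such as average rating, which is where the trade-off between fairness and average package quality would appear.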

The research project was designed to jumpstart the career of the principal researcher as a new faculty member, and to enable collaborations with other faculty members and the recruitment of students. By the end of the project, the Marie Curie fellow had been successfully integrated into the local academic community. He is currently a tenured Associate Professor with a research group consisting of several undergraduate and graduate students, and a network of collaborations with fellow faculty members in the host department and abroad.