Web Graph: Learning Models for Prediction and Evolution Monitoring

Final Report Summary - LEMPEM (Web graph: learning models for prediction and evolution monitoring)

1. Project scientific objectives

Web-based information collections (web content, social networks, etc.) have flourished over the last decades, with an ever-increasing rate of data production. Given the dynamism of such collections, we studied several aspects:
i. the ranking evolution of web pages (i.e. documents);
ii. the community structure of the World Wide Web and other dynamic graphs (e.g. social networks, citation networks);
iii. the issue of high dimensionality in large-scale data.
Accordingly, the main objectives of the project, as they evolved over its course, were:
i. models and methods for ranking prediction;
ii. methods and measures for evaluating collaboration / cohesion in large-scale and evolving graphs (e.g. social networks, citation networks);
iii. dimensionality reduction in large-scale data in the context of data mining methods.

2. Overview of the results

The work carried out in this project falls into the areas listed as objectives above; for each of them we summarise the main results and the publications that emerged from them.

Web page rank prediction

In this context, we advanced the framework for predicting the ranking position of a web page from its previous rankings. The approaches introduced in this period assume a set of successive past top-k rankings and learn predictors based on locally weighted polynomial regression and EM clustering. We also introduced a new similarity measure (R-Sim) for comparing top-k ranked lists. We performed extensive experiments on real-world, large-scale datasets, for both global and query-based top-k rankings. The predictions are highly accurate and robust across most experimental setups and similarity measures.
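To illustrate the idea of regression-based rank prediction, the sketch below fits an ordinary least-squares line to a page's past ranking positions and extrapolates one step ahead. This is a deliberately minimal stand-in: the project's actual predictors use locally weighted polynomial regression and EM clustering over sets of top-k rankings, of which a plain linear fit is the simplest member. The function name and interface are illustrative, not taken from the project's code.

```python
def predict_next_rank(past_ranks):
    """Predict a page's next ranking position from its past positions.

    Fits a least-squares straight line rank = slope * t + intercept to the
    observed trajectory and evaluates it one time step into the future.
    """
    n = len(past_ranks)
    xs = range(n)  # time steps 0 .. n-1
    x_mean = sum(xs) / n
    y_mean = sum(past_ranks) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, past_ranks))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_mean - slope * x_mean
    # Ranking positions are positive integers: round and clip at 1.
    return max(1, round(slope * n + intercept))
```

For example, a page observed at positions [10, 8, 6, 4] over four snapshots is steadily climbing; the linear fit extrapolates it to position 2 at the next snapshot. Locally weighted variants would instead give more influence to the most recent observations.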

Community cohesion / collaboration evaluation

Community sub-graphs are characterised by dense connections or interactions among their nodes, and their detection and evaluation is an important task in graph mining. We proposed a novel approach for evaluating collaboration / cohesion in communities and graphs, based on concepts and structures of graph degeneracy such as the k-core. We extended these algorithms to cover the case of directed graphs as well. We also evaluated communities based on the k-core concept, as a means of assessing their collaborative nature, a property not captured by single-node metrics or by the established community evaluation metrics. Since the k-core essentially measures the robustness of a community under degeneracy, we further extended it to weighted graphs, devising a novel concept of k-cores on weighted graphs. We applied the k-core approach to large real-world graphs such as DBLP and report interesting results.

Automated web advertising campaign creation and monitoring

This is an additional activity, developed towards the end of the project and related to our experience with text mining. Creating a competitive and cost-effective pay-per-click advertisement campaign through the web-search channel can be a daunting task that requires considerable expertise and effort, usually that of a specialist in the field of web advertising. Assisting or even automating the work of an advertising specialist has emerged as a requirement for companies and research institutes over the last few years, mainly because of the commercial value of this endeavour. We propose an architecture and methodology for the semi- and fully-automated creation and management of cost-efficient pay-per-click campaigns with a limited budget. The outcome of this work is a fully functional prototype that implements the proposed methodology and deals successfully with the problem. The system was experimentally evaluated on real-world AdWords campaigns and shows promising behaviour with regard to campaign success statistics.