
Web Graph: Learning Models for Prediction and Evolution Monitoring



Untangling the Web

With the explosion of Web-based information, there is an ever-increasing need for methods that help users find the information they want more quickly and accurately. An EU-funded initiative has resulted in innovative advances in data collection techniques for use in Web searches, social networking and data mining for automated Web marketing, among other applications.

Digital Economy

The ‘Web graph: learning models for prediction and evolution monitoring’ (Lempem) project was designed to develop better methods for predicting the ranking of a Web page, for evaluating connections and interactions among communities of information (social networking, citations, etc.) and for automating pay-per-click (PPC) Web advertising campaigns.

When carrying out a Web-based information search, one is presented with a huge list of matching pages ranked with the best match first. Of course, the best match according to the search engine may not always be the best match according to the user. Thus, better ways of predicting rankings are key to efficient use of the Web. The Lempem project developed a novel way of using past rankings to predict the ranking position of a Web page, based on a combination of polynomial regression, expectation-maximisation (EM) clustering and a new similarity measure developed for the project. The predictions proved accurate and robust when tested on large-scale, real-world datasets.

Looking at interconnections and cohesion among so-called community subgraphs is important to advancing the usefulness of social networking platforms and citation databases, among others. The Lempem project applied k-core algorithms to evaluate the collaborative nature of communities, an important concept not addressed by conventional community evaluation measures. The k-core algorithms yielded interesting results when applied to the DBLP (Digital Bibliography & Library Project) database.

Companies and research institutes spend considerable time and money every year on focused marketing and advertising campaigns based on Web preferences. The Lempem project succeeded in producing a prototype for the semi- and fully automated creation and management of PPC campaigns, providing a significant cost reduction and potentially increasing the competitiveness of small and medium-sized enterprises (SMEs) that lack the funding to recruit advertising specialists.

In summary, the Lempem project produced innovative advances related to the efficient use of Web-based information collections. Given the overwhelming amount of information and the seemingly innumerable connections among Web pages, the project has broad significance in making that information more useful to individual users and companies alike.
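
For readers unfamiliar with the k-core idea mentioned above, the following is a minimal Python sketch of the textbook "peeling" procedure. It is not the Lempem project's code, and the function names (`core_numbers`, `k_core`) and the tiny co-authorship graph are hypothetical, chosen only for illustration. The k-core of a graph is the largest subgraph in which every node has at least k neighbours inside that subgraph; a node's core number is the largest k for which it belongs to a k-core.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs.

    Textbook peeling: repeatedly remove a node of minimum remaining degree.
    This is an illustrative sketch, not the project's implementation.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neigh) for node, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Simple O(n^2) selection; bucket queues make this linear in practice.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # core numbers never decrease during peeling
        core[node] = k
        remaining.remove(node)
        for neigh in adj[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


def k_core(edges, k):
    """Return the set of nodes that belong to the k-core."""
    return {node for node, c in core_numbers(edges).items() if c >= k}


# Hypothetical co-authorship graph: three authors who all wrote together,
# plus one author ("dee") linked to the group through a single co-author.
edges = [("ann", "bob"), ("bob", "cem"), ("ann", "cem"), ("cem", "dee")]
cores = core_numbers(edges)
tight_group = k_core(edges, 2)
```

In this toy graph the three mutually connected authors form the 2-core, while "dee" has core number 1. In a citation or co-authorship setting, higher core numbers can be read as a rough indicator of more densely collaborating groups.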
