Periodic Reporting for period 1 - LBSKQ (Location Based Suggestion of Keyword Queries)
Reporting period: 2015-04-01 to 2017-03-31
The main objectives of the project are: (1) the design and development of effective models for location-based keyword query suggestion (LBKQS); (2) the evaluation of the LBKQS models using real data; and (3) the development of efficient and scalable LBKQS techniques based on the most effective models.
In particular, the Researcher designed and developed a query-document (QD) graph based model, in which a bipartite graph models the relevance of queries to documents based on past click-stream data. The weights of this graph were adjusted to take into account the locations mentioned in the documents and their proximity to the user. The graph is browsed in a random-walk-with-restart fashion, i.e. using personalized PageRank (PPR), and the keyword queries with the highest graph proximity to the user query are selected as suggestions. In addition, he designed and developed extensions of the query-flow graph (QFG) approach and the term-query-flow graph (TQFG) approach for query recommendation, so that they consider the locations that appear in the documents. A novel spatial proximity measure was defined between each query and the user location, based on the distance between the user location and the distribution of the locations of the URLs clicked by users who previously issued the query. Finally, a model was developed that associates the content of documents and queries with entities in large open knowledge bases (e.g. YAGO and Freebase). The objective was to provide effective recommendations for rare queries, which have no past history and are only weakly connected in the graph.
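The QD-graph model described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the exponential distance decay in `location_weight`, the restart probability of 0.15, the toy click log and all function names are assumptions chosen for illustration only.

```python
import math

def location_weight(clicks, dist_km, scale_km=10.0):
    # Assumed adjustment: click count damped by the distance (km)
    # between the user and the location mentioned in the document.
    return clicks * math.exp(-dist_km / scale_km)

def build_graph(click_log, doc_dist_km):
    """Build an undirected, weighted bipartite query-document graph.

    click_log: iterable of (query, document, click_count).
    doc_dist_km: distance of each document's location from the user.
    """
    adj = {}
    for query, doc, clicks in click_log:
        w = location_weight(clicks, doc_dist_km[doc])
        q, d = ("q", query), ("d", doc)
        for a, b in ((q, d), (d, q)):
            nbrs = adj.setdefault(a, {})
            nbrs[b] = nbrs.get(b, 0.0) + w
    return adj

def personalized_pagerank(adj, seed, restart=0.15, iters=100):
    """Random walk with restart from `seed`, computed by power iteration."""
    scores = {n: 0.0 for n in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        nxt[seed] += restart  # the walker jumps back to the seed
        for node, mass in scores.items():
            total = sum(adj[node].values())
            if total == 0.0:
                nxt[seed] += (1.0 - restart) * mass  # dead end: return to seed
                continue
            for nbr, w in adj[node].items():
                nxt[nbr] += (1.0 - restart) * mass * w / total
        scores = nxt
    return scores

def suggest(adj, user_query, k=2):
    """Return the k queries with the highest PPR score w.r.t. user_query."""
    scores = personalized_pagerank(adj, ("q", user_query))
    ranked = sorted(
        ((s, n[1]) for n, s in scores.items()
         if n[0] == "q" and n[1] != user_query),
        reverse=True,
    )
    return [name for _, name in ranked[:k]]

# Toy data (hypothetical): docA mentions a place near the user, docB a distant one.
CLICK_LOG = [
    ("pizza", "docA", 5), ("pizza delivery", "docA", 4),
    ("pizza", "docB", 5), ("pizza recipe", "docB", 4),
]
DOC_DIST_KM = {"docA": 1.0, "docB": 50.0}

if __name__ == "__main__":
    graph = build_graph(CLICK_LOG, DOC_DIST_KM)
    print(suggest(graph, "pizza"))
```

In this toy setting both candidate queries have identical click counts, so the ranking is decided only by the distance-adjusted edge weights: the query that shares the nearby document with "pizza" is suggested first.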
The Researcher obtained real data for testing purposes by processing a search engine query log that was made public in 2006, and by constructing a second dataset that links the content of tweets to locations.
The Researcher developed approaches for improving the efficiency of location-based keyword query recommendation in two directions. Firstly, we proposed an approach that divides the nodes of the graphs to be browsed by Personalized PageRank (PPR) into groups. Using this partitioning, we apply an approximate version of PPR-based similarity search to find the queries to be suggested. Our empirical evaluation shows that this algorithm outperforms the basic PPR approach by up to an order of magnitude. In the second direction, we developed distributed and parallel techniques for speeding up recommendation approaches and graph traversal. In particular, we developed a distributed non-negative matrix factorization (NMF) technique, which is based on matrix sketching. The original matrix data are distributed to different machines, which are responsible for different sets of rows and columns. The machines then independently generate the same random matrices, which are multiplied with the original matrix data and with the iteratively generated factor matrices, in a process that converges to the final result. In addition, we used PowerWalk, a state-of-the-art distributed framework for efficient PPR computation, which strikes a balance between offline indexing and online querying.
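One ingredient of the sketching-based NMF described above, machines independently generating the same random matrices, can be illustrated with a small Python example. This is a hedged sketch of that single step only, not of the NMF iterations or of the project's code: the Gaussian sketch, the shared integer seed, the row-wise partitioning and all names are assumptions. Each hypothetical machine regenerates the same sketch matrix S from the shared seed and multiplies only the columns of S that match its own rows of M; summing the partial results gives S·M without any machine seeing the full matrix.

```python
import random

def sketch_matrix(seed, d, m):
    """Gaussian sketch S of shape (d x m), fully determined by `seed`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / d ** 0.5 for _ in range(m)]
            for _ in range(d)]

def partial_sketch(seed, d, m, row_ids, rows):
    """Work done by one machine holding `rows` (with global indices `row_ids`).

    The machine rebuilds S locally from the shared seed, so S never has
    to be sent over the network.
    """
    S = sketch_matrix(seed, d, m)
    n = len(rows[0])
    out = [[0.0] * n for _ in range(d)]
    for r, row in zip(row_ids, rows):
        for i in range(d):
            s = S[i][r]
            for j in range(n):
                out[i][j] += s * row[j]
    return out

def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Toy non-negative matrix (hypothetical), split row-wise over two machines.
M = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0],
     [4.0, 1.0, 0.0],
     [2.0, 2.0, 2.0]]
SEED, D = 42, 2

if __name__ == "__main__":
    part0 = partial_sketch(SEED, D, len(M), [0, 1], M[:2])
    part1 = partial_sketch(SEED, D, len(M), [2, 3], M[2:])
    print(add(part0, part1))  # equals S @ M up to rounding
```

Only the small d x n partial sketches need to be combined, which is the point of using a shared seed instead of communicating the random matrix.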
The results of the project were mainly disseminated by the publication of scientific papers (17 in total). In all of them, the funding source was properly acknowledged. In addition, the Researcher participated in a number of outreach activities. In particular, he gave 5 talks at national and international conferences, 4 invited talks at universities abroad, 2 talks in colloquiums organized by his Department, and 2 talks to high school students.
The dissemination activities helped in advertising the project's value and potential to third-party researchers. It is too early to account for the socio-economic impact and wider societal implications. However, we expect that its results will have high impact, once they have been adopted and tested by the community. The feedback that we have received from researchers and the public during the outreach activities is very encouraging.
The project helped the Researcher to establish himself as a faculty member of his current institution. By the end of the project, he had been successfully integrated into the local academic community. He is currently collaborating with several local colleagues and graduate students, and he continues to work with researchers abroad.