
Location Based Suggestion of Keyword Queries

Periodic Reporting for period 1 - LBSKQ (Location Based Suggestion of Keyword Queries)

Reporting period: 2015-04-01 to 2017-03-31

Query suggestion is a recent and important add-on feature of Web search engines (e.g. Google), which helps users to express their information needs precisely. Specifically, since the user may not be able to find appropriate keywords for her search, the system recommends to her a small set of keyword queries that are likely to match her original intention. We argue that the queries suggested to a user should not only be semantically relevant to her original query, but should also give results near the user's location, especially when these queries are issued by a mobile user; otherwise, the suggested queries might not be of interest to the user. Therefore, the focus of this project is to design and evaluate effective location-based keyword query suggestion (LBKQS) models. The results of this project will benefit society in several ways. Firstly, the end-user of a mobile search engine will enjoy a better search experience because the tool will consider her location in combination with her search intent to improve the query suggestion results. Secondly, small companies, which are usually outranked by big ones in search engine results (and hence overshadowed by them), will have improved visibility because their locations will be equally considered by our query suggestion models. Thirdly, the project will advance the state of the art in query recommendation and search in big graphs.

The main objectives of the project are: (1) the design and development of effective LBKQS models; (2) the evaluation of the LBKQS models using real data; and (3) the development of efficient and scalable LBKQS techniques based on the most effective models.
The Researcher thoroughly studied the state-of-the-art in keyword query recommendation and extended several location-agnostic models to consider the location of the query issuer and the locations of the data associated with the query results.

In particular, he designed and developed a query-document (QD) graph based model, where a bipartite graph is used to model the relevance of queries to documents based on past click-stream data. The weights of this graph were adjusted to take into consideration the locations mentioned in the documents and their proximity to the user. The graph is browsed in a random-walk-with-restart fashion, i.e. using personalized PageRank (PPR), to select the keyword queries with the highest graph proximity to the user query as suggestions. In addition, he designed and developed extensions of the query-flow graph (QFG) approach and the term-query-flow graph (TQFG) approach for query recommendation to consider the locations that appear in the documents. A novel spatial proximity measure was defined between each query and the user location, based on the distance between the user location and the distribution of locations of the URLs clicked by users who previously issued the query. Finally, a model which associates the content of the documents and queries with entities in large open knowledge bases (e.g. YAGO and Freebase) was developed. The objective was to provide effective recommendations for rare queries, which have no past history and only weak connectivity in the graph.
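The QD-graph idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the edge-weighting formula in location_weight, the parameters beta and alpha, the function names and the toy click data are all assumptions made for illustration. The sketch builds a bipartite query-document graph from click counts, boosts edges to documents near the user, and ranks candidate queries by personalized PageRank computed with power iteration.

```python
import math
from collections import defaultdict


def location_weight(click_count, doc_xy, user_xy, beta=0.5):
    """Blend click evidence with spatial proximity.

    Assumed formula for illustration only; the report does not give one.
    """
    proximity = 1.0 / (1.0 + math.dist(doc_xy, user_xy))
    return click_count * (beta + (1.0 - beta) * proximity)


def suggest(clicks, doc_locations, user_query, user_xy, alpha=0.15, iters=50, k=2):
    """Rank other queries by random walk with restart from user_query."""
    # Undirected bipartite graph: ("q", query) <-> ("d", document).
    adj = defaultdict(dict)
    for query, doc, count in clicks:
        w = location_weight(count, doc_locations[doc], user_xy)
        adj[("q", query)][("d", doc)] = w
        adj[("d", doc)][("q", query)] = w

    start = ("q", user_query)
    score = {node: 0.0 for node in adj}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        nxt[start] = alpha  # restart mass returns to the user's query
        for node, mass in score.items():
            total = sum(adj[node].values())
            for nbr, w in adj[node].items():
                nxt[nbr] += (1.0 - alpha) * mass * w / total
        score = nxt

    ranked = sorted(
        ((s, node[1]) for node, s in score.items()
         if node[0] == "q" and node != start),
        reverse=True,
    )
    return [query for _, query in ranked[:k]]


# Toy click log (hypothetical): (query, document, number of clicks).
CLICKS = [
    ("pizza", "d1", 5), ("italian restaurant", "d1", 4),
    ("pizza", "d2", 5), ("pizza delivery", "d2", 4),
]
# Hypothetical document locations in an arbitrary planar coordinate system.
DOC_LOCATIONS = {"d1": (0.0, 0.0), "d2": (10.0, 10.0)}

if __name__ == "__main__":
    print(suggest(CLICKS, DOC_LOCATIONS, "pizza", user_xy=(0.0, 0.0)))
    print(suggest(CLICKS, DOC_LOCATIONS, "pizza", user_xy=(10.0, 10.0)))
```

In this toy setting the same input query leads to a different top suggestion depending on where the user is, because the walk favours edges to nearby documents.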

The Researcher obtained real data for testing purposes by processing a search engine query log that was made public in 2006, and by constructing a second dataset that links the content of tweets to locations.

The Researcher developed approaches for improving the efficiency of location-based keyword query recommendation in two directions. Firstly, we proposed an approach that divides the nodes of the graphs to be browsed by Personalized PageRank (PPR) into groups. Using this partitioning, we apply an approximate version of similarity search based on PPR to find the queries to be suggested. Our empirical evaluation shows that this algorithm outperforms the basic PPR approach by up to an order of magnitude. In the second direction, we developed distributed and parallel techniques for speeding up recommendation approaches and graph traversal. In particular, we developed a distributed non-negative matrix factorization (NMF) technique, which is based on matrix sketching. The original matrix data are distributed to different machines, which are responsible for different sets of rows and columns. The machines then independently generate the same random matrices, which are multiplied with the matrix and factor matrix data that are iteratively generated in a process that converges to the final result. In addition, we used PowerWalk, a state-of-the-art distributed framework for efficient PPR computation, which strikes a balance between offline indexing and online querying.
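The sketching step of the distributed NMF technique can be illustrated as follows. This is a minimal single-process sketch covering only the shared random matrix idea, not the NMF iteration itself; the Machine class, the Gaussian sketch, the seed-per-iteration convention and all function names are assumptions. Each simulated machine holds a block of rows and regenerates the same random matrix from a shared seed, so the sketch matrix never has to be communicated.

```python
import random


def shared_sketch(n_cols, d, seed):
    """Gaussian sketch matrix (n_cols x d); identical for every caller using the same seed."""
    rng = random.Random(seed)
    scale = d ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(n_cols)]


def matmul(a, b):
    """Plain dense matrix product on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class Machine:
    """Hypothetical worker that owns a block of rows of the data matrix."""

    def __init__(self, row_block):
        self.row_block = row_block

    def sketched_rows(self, d, seed):
        # The sketch is regenerated locally from the shared seed,
        # so only the seed (e.g. the iteration number) must be agreed upon.
        sketch = shared_sketch(len(self.row_block[0]), d, seed)
        return matmul(self.row_block, sketch)


if __name__ == "__main__":
    # Toy 4 x 6 matrix split row-wise across two simulated machines.
    matrix = [[float(i + j) for j in range(6)] for i in range(4)]
    machines = [Machine(matrix[:2]), Machine(matrix[2:])]
    local = [row for m in machines for row in m.sketched_rows(d=3, seed=7)]
    print(local == matmul(matrix, shared_sketch(6, 3, 7)))
```

Because every machine derives the same sketch, stacking the locally sketched blocks gives the same reduced matrix as sketching the full matrix centrally, while each machine only touches its own rows.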

The results of the project were mainly disseminated through the publication of scientific papers (17 in total). In all of them, the funding source was properly acknowledged. In addition, the Researcher participated in a number of outreach activities. In particular, he gave 5 talks at national and international conferences, 4 invited talks at universities abroad, 2 talks in colloquiums organized by his Department, and 2 talks to high school students.
We have developed a successful adaptation of existing graph-based query suggestion models that performs location-aware query suggestion, as described in the project proposal. The main results so far are very promising and we are confident that they can be extended to a full-scale location-based recommendation system that can be deployed and used by the public. Our results extend the state of the art in query suggestion and received positive comments from the anonymous reviewers of our papers. Some excerpts are "The experiments are extensive. I like the experiment setup. Since there are no available testing datasets in the academic community, the authors made great efforts to simulate a testing environment. The measurement of effectiveness can be further improved by doing a user study to show the keyword suggestion with spatial context is better than suggestion without considering documents nearby, as claimed in the example of the introduction section..." and "The presentation of this paper is fine and easy to follow. The research problem is also important. The experimental datasets are impressive...". We have not only studied the effectiveness of LBKQS models, but also put effort into making our solutions scalable. Hence, we expect that they will be applied and used in practice. The papers produced by this project have attracted 53 citations so far (as of August 2018), 16 of which were to our seminal IEEE TKDE 2015 paper.

The dissemination activities helped in advertising the project's value and potential to third-party researchers. It is too early to account for the socio-economic impact and wider societal implications. However, we expect that the project's results will have high impact once they have been adopted and tested by the community. The feedback that we have received from researchers and the public during the outreach activities is very encouraging.

The project helped the Researcher to establish himself as a faculty member of his current institution. By the end of the project, he had successfully integrated into the local academic community. He is currently collaborating with several local colleagues and graduate students, and he has continued working with researchers abroad.
Demonstration of current results