Resource Bounded Graph Query Answering

Periodic Reporting for period 3 - GRACE (Resource Bounded Graph Query Answering)

Reporting period: 2018-11-01 to 2020-04-30

Graphs are a ubiquitous model to represent objects and their relations, such as social networks, transportation networks, telecommunication networks, the World Wide Web, biological networks, transportation systems, epidemic networks, chemical networks, knowledge bases, and hidden terrorist networks. The need for querying big graphs is evident in social media marketing, knowledge discovery, route planning, mobile network analysis, computer vision, the study of adolescent drug use, and intelligence analysis for identifying terrorist organizations, among other things.

Querying big graphs has introduced a number of challenges, from fundamental problems to practical techniques. It demands a departure from the traditional query evaluation paradigm. Vital to any system for querying big graphs are a number of technical questions. What graph patterns should we support to query big graphs? How can we identify associations of entities in real-life graphs? What queries are BD-tractable, i.e. “tractable” on big graphs? What queries are parallel scalable, i.e. guaranteed to reduce running time when more resources (processors) are used? How can we make queries BD-tractable? What parallel model should we use to query big graphs? When exact query answers are beyond reach in big graphs under constrained resources, can we compute approximate answers with accuracy guarantees? Is it possible to support all these techniques in a single query system? What emerging applications can the system help with? Can we extend the techniques to big relations, beyond graph queries?

In response to the challenges, this project aims to extend the conventional query paradigm, establish methodological and algorithmic foundations, and provide effective resource-constrained techniques for efficiently querying big real-life graphs and relations.

The ERC project has been fruitful in the past 30 months. We have (1) made encouraging progress
on all 6 of its work packages (WPs); (2) published 26 papers in international database conferences and
journals, all refereed, including 7 invited papers; (3) received three awards: (a) the Best Paper Award
for SIGMOD 2017, the premier international database systems conference, (b) the Best Demo Award
for VLDB 2017, the leading international all-round database conference, and (c) the 2017 ACM SIGMOD
Research Highlight Award; and (4) developed two functional prototype systems as proof of concept, both
of which have been evaluated in industry and have proven effective.

We have made progress on each of the six work packages (WPs), summarized as follows
(please see the detailed report).

1. WP1: (1) a language for graph pattern queries that supports first-order logic and counting quantifiers [21];
(2) an extension of association rules from relations to graphs [20]; (3) dependency languages for specifying
the semantics of graph-structured data, extending classical functional dependencies and keys from relations
to graphs [11, 17, 22]; and (4) an extension of graph functional dependencies [22] that supports built-in
comparison predicates and linear arithmetic expressions [16], to catch numeric inconsistencies in graph-structured data.

2. WP2: (1) a theory of bounded evaluation for achieving BD-tractability [1, 3, 4]; (2) two characterizations
of the effectiveness of incremental graph computations [14], to make big graphs small; (3) several
parallel scalable algorithms for big graph analytics [11, 16, 20, 21, 22]; (4) axiom systems and complexity
bounds for analyzing graph dependencies [16, 17]; and (5) parallel algorithms for discovering and
reasoning about graph functional dependencies [13, 15], for optimizing graph queries, among others.

3. WP3: (1) an effective syntax of relational algebra as a principled approach toward bounded evaluability [1];
(2) techniques for bounded query evaluation using views, for both big relations [3, 4] and
graphs [19]; (3) a new programming model for parallel graph computations [10, 23, 25]; and (4) a
new parallel model for querying big graphs [18], which subsumes the state-of-the-art synchronous (e.g.
BSP) and asynchronous parallel models (e.g. ASP) as special cases.

4. WP4: a new data-driven approximation scheme for querying big relations with bounded resources, the
first one that guarantees a deterministic accuracy bound for unpredictable SQL queries [2].

5. WP5: (1) GRAPE, a functional prototype system for querying graphs [24] based on the programming
model and parallel model of WP3; and applications of querying big graphs in (2) social media marketing [20, 21],
(3) knowledge base enrichment [11], (4) inconsistency and spam detection [16, 22], and
(5) virtual network mapping [5].

6. WP6: (1) BEAS, a functional prototype system for querying big relations [6, 7] based on the bounded
evaluation theory of WP2, and (2) techniques for improving the quality of big relations [8, 26].

The project has generated 26 publications so far, among which 20 are in top-ranked journals or major
international database conferences, 7 are invited papers, and 3 received awards, including the Best Paper
Award for SIGMOD 2017, the premier international database systems conference.

We summarize the publications below, all refereed:

• Thirteen journal publications [2, 4, 5, 7, 8, 9, 10, 11, 12, 19, 20, 23, 26], including 2 in TODS [4, 8], 2 in
TKDE [19, 26], and 3 in PVLDB [2, 11, 20]. Among these, 7 are invited papers [4, 5, 7, 9, 10, 12, 19].

• Thirteen papers in the proceedings of major international database theory and system conferences,
including 9 papers in SIGMOD [1, 6, 13, 14, 16, 18, 21, 22, 25] (8 research papers and 1 demo), 2 in
PODS [3, 17], 1 in ICDE [15], and 1 VLDB demo [24].

• In particular, the output from the project has received three awards: the Best Paper Award for
SIGMOD 2017, the premier international database systems conference; the Best Demo Award for VLDB
2017, the leading international all-round database conference; and the 2017 ACM SIGMOD Research
Highlight Award.

The publications are listed as follows.

[1] Y. Cao and W. Fan. An effective syntax for bounded relational queries. In SIGMOD, pages 599–614, 2016.

[2] Y. Cao and W. Fan. Data driven approximation with bounded resources. PVLDB, 10(9):973–984, 2017.

[3] Y. Cao, W. Fan, F. Geerts, and P. Lu. Bounded query rewriting using views. In PODS, pages 107–119, 2016.

[4] Y. Cao, W. Fan, F. Geerts, and P. Lu. Bounded query rewriting using views. ACM Trans. Database Syst., 43(1):6:1–6:46, 2018.

[5] Y. Cao, W. Fan, and S. Ma. Virtual network mapping in cloud computing: A graph pattern matching approach. Comput. J., 60(3):287–307, 2017.

[6] Y. Cao, W. Fan, Y. Wang, T. Yuan, Y. Li, and L. Y. Chen. BEAS: Bounded evaluation of SQL queries. In SIGMOD (demo), pages 1667–1670, 2017.

[7] Y. Cao, W. Fan, and T. Yuan. Is big data analytics beyond the reach of small companies? Analysis and Knowledge Discovery, 1(9):1–7, 2017.

[8] T. Deng, W. Fan, and F. Geerts. Capturing missing tuples and missing values. ACM Trans. Database Syst., 41(2):10:1–10:47, 2016.

[9] W. Fan. Data quality: From theory to practice. SIGMOD Record, 44(3):7–18, 2015.

[10] W. Fan, Y. Cao, J. Xu, W. Yu, Y. Wu, C. Tian, J. Jiang, and B. Zhang. From think parallel to think sequential. SIGMOD Record, 2018.

[11] W. Fan, Z. Fan, C. Tian, and X. L. Dong. Keys for graphs. PVLDB, 8(12):1590–1601, 2015.

[12] W. Fan and C. Hu. Big graph analyses: From queries to dependencies and association rules. Data Science and Engineering, 2(1):36–55, 2017.

[13] W. Fan, C. Hu, X. Liu, and P. Lu. Discovering graph functional dependencies. In SIGMOD, pages 427–439, 2018.

[14] W. Fan, C. Hu, and C. Tian. Incremental graph computations: Doable and undoable. In SIGMOD, pages 155–169, 2017.

[15] W. Fan, X. Liu, and Y. Cao. Parallel reasoning of graph functional dependencies. In ICDE, 2018.

[16] W. Fan, X. Liu, P. Lu, and C. Tian. Catching numeric inconsistencies in graphs. In SIGMOD, pages 381–393, 2018.

[17] W. Fan and P. Lu. Dependencies for graphs. In PODS, pages 403–416, 2017.

[18] W. Fan, P. Lu, X. Luo, J. Xu, Q. Yin, W. Yu, and R. Xu. Adaptive asynchronous parallelization of graph algorithms. In SIGMOD, pages 1141–1156, 2018.

[19] W. Fan, X. Wang, and Y. Wu. Answering pattern queries using views. IEEE Trans. Knowl. Data Eng., 28(2):326–341, 2016.

[20] W. Fan, X. Wang, Y. Wu, and J. Xu. Association rules with graph patterns. PVLDB, 8(12):1502–1513, 2015.

[21] W. Fan, Y. Wu, and J. Xu. Adding counting quantifiers to graph patterns. In SIGMOD, pages 1215–1230, 2016.

[22] W. Fan, Y. Wu, and J. Xu. Functional dependencies for graphs. In SIGMOD, pages 1843–1857, 2016.

[23] W. Fan, J. Xu, X. Luo, Y. Wu, W. Yu, and R. Xu. GRAPE: Conducting parallel graph computations
without developing parallel algorithms. IEEE Data Eng. Bull., 40(3):30–41, 2017.

[24] W. Fan, J. Xu, Y. Wu, W. Yu, and J. Jiang. GRAPE: Parallelizing sequential graph computations
(demo). PVLDB, 10(12):1889–1892, 2017.

[25] W. Fan, J. Xu, Y. Wu, W. Yu, J. Jiang, B. Zhang, Z. Zheng, Y. Cao, and C. Tian. Parallelizing
sequential graph computations. In SIGMOD, pages 495–510, 2017.

[26] S. Ma, L. Duan, W. Fan, C. Hu, and W. Chen. Extending conditional dependencies with built-in
predicates. IEEE Trans. Knowl. Data Eng., 27(12):3274–3288, 2015.

We have proposed a new parallel model for graph computations [25]. It allows one to plug in existing sequential graph algorithms and automatically parallelizes the computation, without recasting the algorithms into a new model. This makes parallel graph computations accessible to a large group of users. Moreover, it guarantees convergence to correct answers as long as the sequential algorithms plugged in are correct. The work received the Best Paper Award for SIGMOD 2017, the premier international database systems conference.
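
The idea of plugging a sequential algorithm into a partitioned computation can be illustrated with a toy, single-process simulation. The sketch below is not GRAPE's API: the function names `sequential_sssp` and `run_partitioned`, the edge-partitioned fragment layout, and the round-based exchange are all assumptions made for illustration. It runs an ordinary sequential Dijkstra on each fragment, shares improved distances between rounds, and stops at a fixpoint; with non-negative edge weights the shared distances only decrease, so the loop terminates with the same answer as a single sequential run on the whole graph.

```python
import heapq

INF = float("inf")


def sequential_sssp(edges, dist):
    """Ordinary sequential Dijkstra on one fragment, seeded with known distances.

    edges: dict mapping node -> list of (neighbor, weight) stored in this fragment
    dist:  dict mapping node -> best distance known so far (possibly partial)
    """
    best = dict(dist)
    heap = [(d, v) for v, d in best.items()]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > best.get(v, INF):
            continue  # stale heap entry, a shorter distance was already found
        for w, weight in edges.get(v, []):
            nd = d + weight
            if nd < best.get(w, INF):
                best[w] = nd
                heapq.heappush(heap, (nd, w))
    return best


def run_partitioned(fragments, source):
    """Simulate workers (hypothetical driver, one fragment per worker).

    Each round, every fragment runs the unchanged sequential algorithm on its
    own edges; improved distances are then merged and shared. The loop stops
    when a round produces no improvement.
    """
    shared = {source: 0}
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        results = [sequential_sssp(frag, shared) for frag in fragments]
        for local in results:
            for v, d in local.items():
                if d < shared.get(v, INF):
                    shared[v] = d
                    changed = True
    return shared, rounds


# Illustrative data: a small directed graph split into two edge fragments.
fragments = [
    {"a": [("b", 1)], "b": [("c", 2)]},
    {"a": [("e", 10)], "c": [("d", 1)], "d": [("e", 3)]},
]
distances, rounds = run_partitioned(fragments, "a")
print(distances, rounds)
```

In this simulation the sequential routine is reused without modification; only the driver knows about fragments. A real system would run fragments on separate workers and exchange only the values that changed, which this sketch does not attempt to model.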

As a proof of concept, we have developed GRAPE, a parallel GRAPh Engine for graph computations, based on the programming and parallel models developed in [25]. A preliminary implementation of GRAPE was demonstrated at VLDB 2017 [24], a leading all-round international database conference, and received the Best Demo Award, given by a committee consisting mostly of people from industry.

The work was selected to receive the 2017 ACM SIGMOD Research Highlight Award.

In addition, BEAS [6], another prototype system we have developed for querying big relations, has been deployed and evaluated at Huawei Technologies, the largest telecommunications equipment and services provider in the world. It has been verified that our bounded evaluation techniques improve the performance of Huawei's query engines by orders of magnitude. As a consequence, Huawei has invested in a joint research lab at Edinburgh, at a level exceeding 1M euro per annum. This is the first research lab funded by Huawei that is dedicated to open research. See