Question-based Analysis of Geographic Information with Semantic Queries

Project information

QuAnGIS

Grant agreement ID: 803498

DOI

10.3030/803498

Project closed

EC signature date: 19 November 2018

Start date: 1 January 2019

End date: 31 August 2024

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 499 412,00

EU contribution

€ 1 499 412,00

Coordinated by

UNIVERSITEIT UTRECHT
Netherlands

Periodic Reporting for period 4 - QuAnGIS (Question-based Analysis of Geographic Information with Semantic Queries)

Reporting period: 2023-07-01 to 2024-08-31

Geographic Information Systems (GIS) play a crucial role in analyzing spatial data to answer questions central to geography, geosciences, and interdisciplinary applications such as health sciences and environmental modeling. For example, assessing the impact of urban green spaces on public health requires spatial analysis to model environmental contexts. However, despite its significance, GIS faces accessibility challenges in non-geographic disciplines due to its technical complexity. Additionally, the proliferation of big data exacerbates analysts' difficulty in navigating the diverse datasets and tools available on the Web, limiting the potential of spatial analysis across domains.

This project addressed these issues by advancing Geographic Question Answering (GeoQA), an approach in Artificial Intelligence (AI) that handles spatial information through natural language questions. While current GeoQA methods excel at retrieving facts (e.g. “Which municipalities neighbor Amsterdam?”), they fall short on geo-analytical questions whose answers are not explicitly available, such as “What is the average park density accessible for pedestrians in Amsterdam?” Such analytical questions require indirect answers derived from complex workflows involving data transformation, spatial reasoning, and tool integration, all of which are processes central to GIS.

The project's primary objective was to develop methods that enable the automatic retrieval and synthesis of tools, workflows, and data sources for answering geo-analytical questions. To achieve this, we:

- Developed a theory of interrogative spatial concepts to formalize geo-analytical questions.
- Created computational models that make such questions machine-interpretable.
- Designed semantic frameworks to describe geospatial tools and data in terms of the questions they answer.

By pursuing these goals, we provided analysts with tools to formulate questions conceptually, enabling the automated discovery of relevant data and workflows on the Web. This shift from tool-centered to question-centered analysis improves the accessibility of GIS for non-experts, fosters interdisciplinary applications, and enhances reproducibility and efficiency in spatial analysis. The project’s outcomes have significant implications for society. They lower the barriers to using spatial data in fields like public health, urban planning, and disaster management, empowering diverse stakeholders to address critical societal challenges. Furthermore, this research advances Geographic Information Science (GIScience) through concept and theory development and through the integration of AI and GIS, laying the foundation for new applications in spatial reasoning and AI-based knowledge generation. Ultimately, our work represents a milestone in democratizing GIS capabilities, understanding geographic information, and expanding the reach of spatial analysis across domains.

In the first six months, we conducted preparatory actions and reviewed the literature on GeoQA, AI, linguistics, and GIScience. In Year 1, we performed a systematic survey of geo-analytical questions answerable with GIS workflows (WP1). This produced the GeoAnQu corpus of 429 example questions from academic sources, which forms the empirical foundation and gold standard for evaluation (WP5.1).

In Year 2, we developed a geo-analytical question grammar (WP2) that captures question patterns, formalizes them in terms of interrogative concepts, and enables concept transformations. The grammar was tested on the corpus and on online sources, and forms the basis for a query interface.
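As a rough illustration of what capturing a question pattern can involve, the sketch below matches a single hypothetical pattern with a regular expression and returns its parts. The pattern, the function name `parse_question`, and the slot names `aggregate`, `measure`, and `extent` are assumptions made for this example; they do not reproduce the project's grammar.

```python
import re
from typing import Optional

# Hypothetical pattern for one family of questions:
# "What is the <aggregate> <measure> in <Extent>?"
# The slot names are invented for this sketch.
PATTERN = re.compile(
    r"^What is the (?P<aggregate>average|total|maximum|minimum) "
    r"(?P<measure>.+?) in (?P<extent>[A-Z][\w\s-]*)\?$"
)


def parse_question(question: str) -> Optional[dict]:
    """Return the slots of a matching question, or None if it does not match."""
    match = PATTERN.match(question.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_question(
        "What is the average park density accessible for pedestrians in Amsterdam?"
    ))
    # A fact-retrieval question does not fit this analytical pattern.
    print(parse_question("Which municipalities neighbor Amsterdam?"))
```

A single regular expression covers only one phrasing; a grammar in the sense described above would need many patterns and a mapping from the extracted slots to interrogative concepts.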

We also developed a conceptual model of information components for geo-analytical QA (WP3/5). The Core Concept Data Types (CCD) ontology enables automatic workflow composition from tools and geodata, and was successfully tested in GIS workflow studies. Annotations based on the ontology support our data and tool repository (WP4).
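The idea of composing workflows from semantically typed tools can be sketched as a search over type signatures. The example below is a minimal sketch under strong assumptions: each tool has exactly one input and one output type, and the tool names and type labels are invented for illustration rather than taken from the CCD ontology.

```python
from collections import deque
from typing import NamedTuple, Optional


class Tool(NamedTuple):
    name: str
    input_type: str
    output_type: str


# Hypothetical tools and type labels, invented for this sketch.
TOOLS = [
    Tool("kernel_density", "PointObjects", "DensityField"),
    Tool("buffer", "PointObjects", "RegionObjects"),
    Tool("zonal_mean", "DensityField", "RegionAmounts"),
]


def compose(tools: list, source: str, goal: str) -> Optional[list]:
    """Breadth-first search for the shortest tool chain from source to goal type."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for tool in tools:
            if tool.input_type == current and tool.output_type not in seen:
                seen.add(tool.output_type)
                queue.append((tool.output_type, path + [tool.name]))
    return None  # no chain of tools connects the two types


if __name__ == "__main__":
    print(compose(TOOLS, "PointObjects", "RegionAmounts"))
```

Real GIS tools take several inputs and parameters, so an actual composition approach has to handle richer signatures than this single-input chain.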

The Core Concept Transformation (CCT) algebra was created to interpret geo-analytical questions as concept transformations for querying workflows, and we explored formal theories of amounts in geography. CCT is being implemented as a Python library.

In Year 2, User Study 1 (WP1.3) evaluated GIS workflow design with 40 participants. Despite delays due to the pandemic, this study provided valuable insights for automation and informed the gold standard.

In Year 3, User Study 2 (WP4.2) assessed the usability and interpretability of the GeoQA grammar and Blockly interface, leading to refinements based on participant feedback. User Study 3 (WP4.3) tested human analysts’ ability to recognize core concepts in geographic tasks, validating the CCD ontology and guiding its improvement.

In Year 4, User Study 4 (WP5.2) was intended to evaluate the full technology stack, but due to incomplete prototype development, we instead conducted a cognitive map interpretation survey, essential for the project’s progress.

A retrieval and query study (Steenbergen et al. 2023) also evaluated the technology stack, testing the ability of the GeoQA grammar, CCD ontology, and CCT algebra to retrieve workflows and data based on geo-analytical questions. Results confirmed the system’s effectiveness in automating workflow composition and data retrieval.
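To give a feel for question-based retrieval, the sketch below annotates each workflow with the ordered concepts it produces and returns those workflows in which the concepts required by a query appear in the same order. This is a deliberately simplified stand-in, not the matching procedure of the cited study; the workflow names, concept labels, and the functions `matches` and `retrieve` are assumptions for this example.

```python
# Hypothetical workflow annotations: the ordered concepts each workflow produces.
WORKFLOWS = {
    "park_accessibility": ["Object", "Network", "Amount"],
    "noise_exposure": ["Field", "Object", "Amount"],
    "population_density": ["Object", "Amount", "Field"],
}


def matches(query: list, produced: list) -> bool:
    """True if the query concepts occur in `produced` in the same order."""
    remaining = iter(produced)
    # `concept in remaining` consumes the iterator up to the first hit,
    # so later concepts must be found after earlier ones.
    return all(concept in remaining for concept in query)


def retrieve(query: list, workflows: dict = WORKFLOWS) -> list:
    """Return the names of all workflows matching the query, sorted by name."""
    return sorted(name for name, produced in workflows.items()
                  if matches(query, produced))


if __name__ == "__main__":
    print(retrieve(["Field", "Amount"]))
    print(retrieve(["Object", "Amount"]))
```

Ordered matching distinguishes a workflow that derives an amount from a field from one that derives a field from an amount, which a plain keyword match on the same labels would not.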

The detailed outcomes of these studies validate the GeoQA framework’s feasibility and practical utility in advancing geographic question-answering and spatial analytics.

The project focused on foundational work: collecting empirical data (e.g. the GeoAnQu corpus, online question sources, and User Studies 1-4) and developing key conceptual models. Together, these establish a new subfield within geographic question answering (GeoQA), termed geo-analytical QA (Scheider et al. 2020). Significant progress beyond the state of the art has been achieved:

- GeoAnQu Corpus: A dataset of 429 geo-analytical questions from academic sources, providing a gold standard for evaluation and empirical insights into such questions.

- Integration of Core Concept Model in GeoQA: The core concept model (Kuhn 2012) has been applied in GeoQA for semantic descriptions of geodata, automated GIS workflow composition, and grammar-based question interpretation.

- Core Concept Data Types (CCD) Ontology and Geographic Quantities: The CCD ontology (Scheider et al. 2020) describes geodata and tools, automating workflow composition and linking geo-analytical questions to executable workflows. Retrieval studies validated its effectiveness, and formal theories of geographic amounts (Top 2022) were developed to extend it.

- Geo-Analytical Grammar and CCT Algebra: A grammar was developed (Xu et al. 2023) to translate geo-analytical questions into concept transformations via the Core Concept Transformation (CCT) algebra (Steenbergen et al. 2023), laying the groundwork for automated question answering with workflows.

- QuAnGIS Prototype: A prototype integrates these scientific developments into a GeoQA system, allowing users to ask geo-analytical questions via a Blockly interface to retrieve GIS workflow suggestions.

These achievements provide a solid foundation for advancing geo-analytical QA and automating knowledge generation in Geographic Information Science (GIScience).
