Periodic Reporting for period 2 - SLIPO (Scalable Linking and Integration of Big POI data)
Reporting period: 2018-07-01 to 2019-12-31
The value and impact of POIs are reflected in the complex, expensive and labor-intensive effort required for their production and maintenance, which inherently involves stakeholders and users throughout their value chain. Their initial production involves field-work, constant monitoring of their evolution and accuracy, integration of user-feedback mechanisms for reporting errors, quality assurance of new data, and roll-out across a plethora of services and products. In the POI market, the competitive advantages of data providers are clear and measurable: the greater the size, timeliness, richness, and accuracy of data, the better. The value chain of POI data has rapidly changed, with new data sources of even greater volume and heterogeneity introducing opportunities for growth but also complexity, which intensifies the challenges for the quality-assured integration, enrichment, and sharing of POI data.
POI data are by nature semantically diverse and spatiotemporally evolving, representing different entities and associations depending on their geographical, temporal, and thematic context. Due to their use in various domains and contexts, POI data are typically found in diverse, heterogeneous sources, from which bits and pieces of information need to be combined and assembled to increase value. However, this is hindered by the lack of common identifiers and data sharing formats. Even the means by which we typically identify and share POIs are inherently ambiguous. As a result, the integration of POI data remains labor-intensive and scalable only for domain-specific or small-scale efforts, leading to loss of information and thus lost value.
SLIPO’s objective is to deliver the missing technologies for addressing the data integration challenges of POI data in terms of coverage, timeliness, accuracy, and richness. In SLIPO, we argue that Linked Data technologies can address the limitations, gaps and challenges of the current landscape in integrating, enriching, and sharing POI data. Our goal is to transfer the research output generated by our work in the GeoKnow project to the specific challenge of POI data, introducing validated and cost-effective innovations across their value chain.
Already validated in a pre-commercial setting, SLIPO delivers integrated POI assets with quality comparable to that of manual data integration. SLIPO allows users to securely manage and store their geospatial data assets, graphically design complex data integration workflows, fully automate data integration, track the provenance of their POI assets, implement strict QA policies, and export their data integration results to third-party systems and products. Further, SLIPO provides a series of integrated analytics that extract added value from POI data assets to feed decision making. Finally, the entire SLIPO platform, including its data assets, workflows, and analytics, is available through Python-based Jupyter notebooks, further supporting industrial data scientists and allowing the direct exploitation of SLIPO in existing business workflows. SLIPO's main features are:
• World-scale POI data integration over heterogeneous geospatial data assets
• Fully automated as well as expert-driven definition of data integration workflows
• Secure management and provision of POI assets, integration workflows, and users
• Integrated QA services, curation, and provenance tracking
• Scalable out-of-the-box value-added analytics for POI data assets
• Support for integration with Python-based Jupyter notebooks
SLIPO's output is relevant to all economic sectors where POI data are applied. SLIPO's customer benefits are:
• Increase value, richness, quality and timeliness of your POI data assets
• Achieve integration results practically identical to those of expert-driven manual integration
• Integration at a fraction of the effort and cost
• Leverage proprietary and open/public geospatial data assets
• Expand products, services and workflows across EU and the world
• Cloud-based, low-cost, and pay-as-you-go pricing models
The main technical results achieved during the project are:
• TripleGeo was extended to support practically all industrial geospatial data formats and standards, gained support for user-defined and custom mappings, hierarchical classification schemes, and increased its performance by orders of magnitude.
• LIMES increased its scalability and effectiveness for POI data by optimizing its spatial interlinking approaches, introducing new hybrid similarity functions and configurable weighting, as well as class-expression-specific specifications for tuning proximity functions on POIs.
• FAGI was enhanced with several new fusion operators and strategies for spatial and thematic properties, metrics to assess metadata similarity and quality, and performance improvements.
• DEER has been extended with POI-specific enrichment functions, pro-active enrichment strategies, and enhancements in the execution of complex non-linear enrichment pipelines.
• SANSA has been improved with core functionalities for input data support, querying and inferencing, rule mining, and clustering.
• LOCI, a new framework for large-scale geospatial analytics over POI data, has been developed, tested, and integrated as a value-added service.
• The SLIPO Workbench, a cloud-based application enabling the ad hoc integration of Big POI data assets, has been delivered, extensively tested and validated in a real-world setting.
• The SLIPO system is in production operation and commercially applied by the project partners and industrial stakeholders.
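To make the idea of a hybrid similarity function with configurable weighting (mentioned for LIMES in the list above) more concrete, the following is a minimal Python sketch that combines a name similarity with a spatial proximity score to link POIs lacking common identifiers. It is an illustration only and does not use or reproduce the LIMES API: the function names, the dictionary layout of a POI (`id`, `name`, `lat`, `lon`), the default weights, the 200 m distance cutoff, and the 0.8 acceptance threshold are all assumptions chosen for the example.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """String similarity in [0, 1] on case-folded, trimmed names."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def spatial_similarity(poi_a, poi_b, max_dist_m=200.0):
    """1.0 at zero distance, decreasing linearly to 0.0 at max_dist_m and beyond."""
    d = haversine_m(poi_a["lat"], poi_a["lon"], poi_b["lat"], poi_b["lon"])
    return max(0.0, 1.0 - d / max_dist_m)


def hybrid_similarity(poi_a, poi_b, w_name=0.6, w_space=0.4, max_dist_m=200.0):
    """Weighted mean of name and spatial similarity (weights are hypothetical)."""
    total = w_name + w_space
    score = (w_name * name_similarity(poi_a["name"], poi_b["name"])
             + w_space * spatial_similarity(poi_a, poi_b, max_dist_m))
    return score / total


def link(source, target, threshold=0.8):
    """Return (source_id, target_id, score) for all pairs scoring >= threshold.

    Naive all-pairs comparison; a scalable system would first restrict
    candidates with a spatial index or blocking step.
    """
    links = []
    for a in source:
        for b in target:
            score = hybrid_similarity(a, b)
            if score >= threshold:
                links.append((a["id"], b["id"], score))
    return links


if __name__ == "__main__":
    provider_a = [{"id": "a1", "name": "Cafe Central", "lat": 37.9838, "lon": 23.7275}]
    provider_b = [
        {"id": "b1", "name": "Cafe Central", "lat": 37.9839, "lon": 23.7276},
        {"id": "b2", "name": "City Pharmacy", "lat": 37.9900, "lon": 23.7400},
    ]
    for src, tgt, score in link(provider_a, provider_b):
        print(src, tgt, round(score, 3))
```

Raising `w_space` relative to `w_name` favors co-located POIs whose names differ across providers, while raising `w_name` tolerates imprecise coordinates. In practice such weights and thresholds would be tuned per POI category, which is the kind of configurability the LIMES item refers to.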