Periodic Reporting for period 2 - SPRINT (Semantics for PerfoRmant and scalable INteroperability of multimodal Transport – SPRINT)
Reporting period: 2020-01-01 to 2021-02-28
The SPRINT project tackled the problem of creating and maintaining the Interoperability Framework (IF) as a performant and scalable distributed system, defining its architecture and supporting tools based on the requirements provided by the other IP4 projects.
The last year of the project was dedicated to refining the SPRINT IF, consolidating the implementation and improving its performance and scalability in the following areas:
• easing how different actors can agree on common conceptual models expressed as machine-readable ontologies by letting them work together in a consistent way (ontology engineering process);
• easing the process of making an actor’s services interoperable with existing services offered by other actors, by helping to find similarities between different data models and by streamlining the conversion mechanisms (data mapping and conversion execution);
• easing the process by which an actor advertises data and services according to a commonly agreed (or regulated) governance process, in such a way that they can be discovered and used unambiguously (asset management).
The architecture defined the IF as a set of interconnected tools and services, all of them built on Semantic Web technologies:
• the Converter architecture, which has been designed around the concept of a “conversion pipeline”, as commonly used in ETL systems. For flexibility, different strategies can be used both for converting data into RDF and for extracting structured data from RDF.
• the Asset management solution, which features state-of-the-art governance management and can enforce complex processes orchestrating human interactions and services. By integrating both a BPMN solution and a continuous integration and deployment (CI/CD) pipeline, our lifecycle processes support a wide array of automation possibilities (e.g. generating documentation for ontologies, or converting datasets into different formats or specifications by exploiting converters). We also studied the possibility of using this component as a “configuration server” for Converters, which were able to dynamically obtain from the Asset Manager the mappings and other resources required to perform message conversions between different standards or specifications.
• the Distributed SPARQL Endpoint, which provides a uniform querying interface for distributed data by leveraging data virtualisation and query rewriting.
• the Collaborative ontology engineering tool, which enables the use of shared code repositories as the basis for collaborative editing of ontologies. To ease the understanding of complex models, the tool re-uses state-of-the-art automatic ontology representation and documentation tools to provide editors with up-to-date diagrams and human-readable ontology descriptions.
• the Mapping suggestion tool, whose aim is to shorten the time required to create mappings between data models and ontologies. It does so by applying machine learning techniques to suggest possible mappings while letting users create complex ones, and it can output the resulting mappings as Java annotations as defined by the SPRINT project.
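The “conversion pipeline” idea behind the Converter can be illustrated with a minimal sketch: a source record is first lifted into RDF-like triples and then lowered into a target structure. This is not the SPRINT implementation. Plain Python tuples stand in for RDF, and the vocabulary prefix, field names and function names (lift_stop, lower_stop, make_pipeline) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (subject, predicate, object): a plain-tuple stand-in for an RDF triple.
Triple = Tuple[str, str, str]
Record = Dict[str, str]

# Hypothetical vocabulary prefix; not taken from the SPRINT ontologies.
EX = "http://example.org/transport#"


def lift_stop(record: Record) -> List[Triple]:
    """Lifting step: turn a source-format record into RDF-like triples."""
    subject = EX + "stop/" + record["stop_id"]
    return [
        (subject, EX + "name", record["stop_name"]),
        (subject, EX + "latitude", record["lat"]),
        (subject, EX + "longitude", record["lon"]),
    ]


def lower_stop(triples: List[Triple]) -> Record:
    """Lowering step: extract a target-format record from the triples."""
    target_fields = {
        EX + "name": "Name",
        EX + "latitude": "Latitude",
        EX + "longitude": "Longitude",
    }
    out: Record = {}
    for subject, predicate, obj in triples:
        # The identifier is the last path segment of the subject IRI.
        out["Id"] = subject.rsplit("/", 1)[-1]
        if predicate in target_fields:
            out[target_fields[predicate]] = obj
    return out


def make_pipeline(
    lift: Callable[[Record], List[Triple]],
    lower: Callable[[List[Triple]], Record],
) -> Callable[[Record], Record]:
    """Compose a lifting and a lowering strategy into one conversion."""
    def convert(record: Record) -> Record:
        return lower(lift(record))
    return convert


convert_stop = make_pipeline(lift_stop, lower_stop)
result = convert_stop(
    {"stop_id": "S1", "stop_name": "Central", "lat": "45.46", "lon": "9.19"}
)
print(result)
```

Because the two steps are passed in as functions, a different lifting or lowering strategy can be swapped in without touching the rest of the pipeline, which is the kind of flexibility the Converter item describes.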
The development process went through two iterations of design, implementation and testing. Testing covered both the functional requirements implied by a set of real-world scenarios, defined in the project in accordance with the Connective project, and the performance and scalability requirements.
In the second year of the project, SPRINT focused on improving its results in the following areas:
• Improved effectiveness in the management of the main ontological assets of the IF, by means of appropriate collaborative ontology engineering methods and appropriate new tooling.
• Improved effectiveness in the development of ontologies, using non-ontological resources as a starting point to avoid starting from scratch (i.e. SPRINT will develop mechanisms to match and merge the resulting ontologies with the S2R reference ontology).
• Improved automation of semantic service integration, by reducing the required manual effort, particularly in the “annotation” process whereby the semantics specified in the ontology are associated with heterogeneous (syntactic) data representations of common domain entities and properties.
• Added ability to perform efficient distributed semantic query processing and semantic “linking” of data distributed across the World Wide Web, thus reducing the need for data and service replication.
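To give a feel for how the manual effort in the “annotation” process listed above could be reduced, the sketch below proposes candidate links between source field names and ontology terms. It deliberately swaps in plain string similarity (Python's standard difflib module) for the machine learning techniques used by SPRINT, so it only illustrates the suggestion step. The field names, ontology terms and the 0.6 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalise(name: str) -> str:
    """Remove separators and case so that stop_name and stopName compare equal."""
    return name.replace("_", "").replace("-", "").lower()


def suggest_mappings(
    source_fields: List[str],
    ontology_terms: List[str],
    threshold: float = 0.6,  # assumed cut-off, chosen only for this example
) -> Dict[str, Tuple[str, float]]:
    """Suggest, for each source field, the most similar ontology term."""
    suggestions: Dict[str, Tuple[str, float]] = {}
    for field in source_fields:
        scored = [
            (SequenceMatcher(None, normalise(field), normalise(term)).ratio(), term)
            for term in ontology_terms
        ]
        best_score, best_term = max(scored)
        # Fields without a sufficiently similar term are left for manual mapping.
        if best_score >= threshold:
            suggestions[field] = (best_term, round(best_score, 2))
    return suggestions


# Hypothetical source fields and ontology terms.
suggestions = suggest_mappings(
    ["stop_name", "arrival_time", "xyz"],
    ["stopName", "arrivalTime", "platformCode"],
)
print(suggestions)
```

A field with no close match (here "xyz") receives no suggestion, so the user still decides which links to accept and creates the remaining ones by hand.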
Moreover, the SPRINT project explored the possibilities offered by National Access Points (NAPs) for multimodal transportation compliant with Commission Delegated Regulation (EU) 2017/1926. The concept of a national repository of transport-related information clearly overlaps with the IP4 Shift2Rail ecosystem. Both systems must provide a catalogue of datasets and services, define metadata standards to ease information discovery, and allow such information to be published according to a well-defined set of rules. Since those requirements are the same as those of the IF Semantic Assets Manager provided by SPRINT, the project successfully extended the Asset Manager to act as a “NAP companion”. The companion can obtain and merge metadata coming from several NAPs and can automatically convert uploaded datasets, thus paving the way to automatic compliance with the data formats and standards imposed by the regulations.
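The metadata-merging role of such a “NAP companion” can be sketched as follows. This is an illustrative assumption, not the Asset Manager's actual logic: records from several catalogues are grouped by a shared dataset identifier, and the first non-empty value seen for each field is kept. The field names (identifier, title, format) and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def merge_catalogues(
    catalogues: Iterable[List[Record]],
    key: str = "identifier",  # assumed name of the shared dataset identifier
) -> Dict[str, Record]:
    """Merge metadata records from several catalogues, keyed by dataset identifier."""
    merged: Dict[str, Record] = {}
    for catalogue in catalogues:
        for record in catalogue:
            entry = merged.setdefault(record[key], {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if value and not entry.get(field):
                    entry[field] = value
    return merged


# Hypothetical sample records from two NAP catalogues.
nap_a = [{"identifier": "ds-1", "title": "Bus timetables", "format": ""}]
nap_b = [
    {"identifier": "ds-1", "title": "", "format": "NeTEx"},
    {"identifier": "ds-2", "title": "Rail stations", "format": "GTFS"},
]

merged = merge_catalogues([nap_a, nap_b])
print(merged["ds-1"])
```

Here the record for ds-1 ends up with the title from the first catalogue and the format from the second. A real deployment would also need a policy for conflicting non-empty values, which this sketch sidesteps by keeping the first one.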