Periodic Reporting for period 2 - ALIGNED (Aligned, Quality-centric Software and Data Engineering)
Reporting period: 2016-08-01 to 2018-01-31
In our approach, the categorisation and description of data is treated as a primary activity from which software tools can be automatically generated, provided that quality controls are maintained. Linked Data is used as a unifying technical foundation which allows not only the domain data to be described, but also the process of tool use and tool integration. This enables continuous improvement and ongoing customisation of the software and the data model in close tandem.
Recent years have seen an explosion of interest in the statistical analysis of large datasets across a wide variety of application domains. Yet the tools for applying such analysis remain basic, expensive and difficult to use. The ALIGNED project includes four use cases covering diverse domains. Seshat involves teams of social scientists building datasets which describe historical societies, while DBpedia, the hub of the web of data, is building a high-quality general-purpose encyclopaedia of structured data. The other two use cases, Wolters Kluwer’s Jurion legal information system and the Semantic Web Company’s PoolParty product, cover commercial settings: a software user and a software developer, respectively.
The concrete objective of ALIGNED is to demonstrate that the tools and methods we develop can be integrated into the workflows of all four of these use cases in such a way that they produce measurable improvements in productivity, quality and agility. Achieving this across our diverse use cases will demonstrate the general utility and significance of our innovations.
ALIGNED has extended Oxford’s Model Catalogue and Semantic Booster tools to allow them to integrate seamlessly with Linked Data technologies. Leipzig’s RDFUnit has been extended to provide a new suite of quality control measures and has been integrated with the popular JUnit testing framework. The Semantic Web Company have extended their PoolParty platform with a new consistency module and developed a new unified governance tool to integrate administration across their software and data teams. Trinity College Dublin have developed several new services for their Dacura platform, including the Dacura Quality Service, which offers real-time validation of complex data changes, and have extended their SUMMR tool to support integration with RDFUnit. All of these are available as open source software through our website.
ALIGNED has created a suite of 11 new ontologies, describing domains ranging from temporal uncertainty to enterprise software development, and contributed to 4 other ontologies as part of standardisation efforts. We have also published two large datasets. All of these results have been made available under open source licences. In several cases our results have been incorporated into evolving W3C standards: in particular, the Data Quality Vocabulary (DQV) and the Shapes Constraint Language (SHACL) incorporate our contributions in several places.
Our work has been scientifically validated through 18 distinct peer-reviewed publications, 10 of which were in respected journals or high-impact conferences. We expect this to improve further: our work on the Seshat use case has produced results which are being prepared for submission to both Science and Nature.
Considerable effort has also been expended on providing the infrastructure to support both internal and external communication and collaboration. In addition to organising 6 well-attended project meetings, we have established a website, a secure storage area, a wiki-based collaboration platform, mailing lists, online meeting software and dissemination channels on a variety of social media platforms (Facebook, Twitter, YouTube, SlideShare and Flickr). We have kept a steady stream of content flowing through these channels: 16 videos have already been published.
Most significantly, our results have been deployed in the production systems of all four of our use cases, including both commercial scenarios. Wolters Kluwer’s production Jurion system has been using ALIGNED technology (RDFUnit) since 2015, and the consistency validation system developed within ALIGNED is now a feature of PoolParty. In our other use cases, not only has our technology already been deployed in their production systems, but we have also collected evidence which demonstrates significant improvements to their data quality as a direct result of our innovations.
To ensure that our results have impact after the project ends, our major focus continues to be validating our tools and methods by demonstrating concrete improvements in productivity, agility and quality in our use cases. Our tools deployed within Wolters Kluwer’s Jurion platform can spread from there to other WKD products, or be adopted by other publishing companies, if there is strong evidence for their efficacy. By making all of our tools available as open source software, we have ensured that this technology transfer can be as smooth as possible.
However, although our focus is on practical results, dissemination remains necessary. We have focused heavily on organising and presenting at industry- and community-focused workshops. We participate actively in the EC Cluster on Software Engineering for Services and Applications as well as in broader initiatives like the Big Data Value Association. We have organised a workshop at the influential SPLASH conference and are the main organisers of the SEMANTICS conferences.