CORDIS - EU research results

dEUdil: Building on open data as a new business model in the business information industry

Periodic Reporting for period 2 - dEUdil (dEUdil: Building on open data as a new business model in the business information industry)

Reporting period: 2017-08-01 to 2018-07-31

"DueDil's vision is to be the fuel of a more informed and connected economy. We aim to achieve this by becoming the world's largest source of private company information. There has been much commentary in recent years about the perceived benefits of ""Big Data"", however, though we recognise the value of access to new types of data, the sheer volume of it is making data commoditised. There is no value in the data itself; it is the context that drives the insight and the value. Therefore, we are aiming to create the global Business Information Graph (BIG) in order to help businesses find opportunities and mitigate risks with the richest information on private companies. Our mission aligns closely with the vision of the European Common Market. While a unified currency, the dissolution of trade barriers and harmonised regulatory regimes all go a long way towards reducing friction in both intra and inter-State commerce, it is the information asymmetry or, at times, a total lack of adequate information on private companies that continues to hamper these efforts. This is the problem that we are looking to solve in Europe and globally in order help foster greater, sustainable economic growth."
In the first year of our project we have successfully expanded and optimised our infrastructure to handle a 10x increase in processing and compute requirements triggered by our expansion into 28 additional jurisdictions (17 of which are covered by this grant). We have also expanded our web-platform product to cover the additional jurisdictions via a version of our Advanced Search company segmentation tool as well as the company and director profiles. Our customers are now able to search for companies and view information about them and their directors in 30 jurisdictions, including 25 in Europe. We have also extended our Group Graph feature to all the companies on our platform, so that users can easily examine complex corporate group structures all the way to the Global Ultimate Parent, regardless of whether the parent or subsidiary companies are located in a jurisdiction that we explicitly cover at the present time. We have also created company-URL and keyword data sets and are working on collecting and creating other data to enrich these profiles in our drive to create the global Business Information Graph.
"I. Progress beyond state-of-the-art

We feel that two of our technological advancements in the first year of the project place us beyond the state of the art: the ability to serve large network graphs efficiently, and location search.

A) Constructing and serving large network graphs via a graphical web user interface ("Bacon" micro-service)

In working to expand the group graph to the additional countries, it quickly became apparent that storing it as we traditionally did would be extremely inefficient. Our old version of the graph was stored in a way that forced us to parse and rebuild it as a tree, or other relevant data structure, before any operations, such as traversals, could be performed on it. This is acceptable for small graphs, but not at the scale that we have reached. We now have graphs up to twenty times bigger than our previous biggest graph, containing hundreds of thousands of nodes instead of hundreds. The solution was to model and store the data in as normalised a form as possible, preserving the representation of links and nodes so that a section of any graph can be retrieved without needing to parse a massive connected graph. This has the additional benefit that operations are faster and less data has to be moved around.
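As an illustration of the normalised storage idea, the following is a minimal sketch and not the project's actual schema. It assumes one table of companies (nodes) and one table of parent-to-subsidiary links (edges); the table names, column names and sample companies are all hypothetical. The point it shows is that one section of a graph can be fetched with an indexed lookup, without loading or parsing the rest.

```python
import sqlite3


def build_store(nodes, edges):
    """Create a hypothetical normalised store: one row per node, one per link."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE ownership (parent_id TEXT, child_id TEXT, "
        "PRIMARY KEY (parent_id, child_id))"
    )
    # Index the reverse direction too, so upward lookups stay cheap.
    conn.execute("CREATE INDEX idx_child ON ownership (child_id)")
    conn.executemany("INSERT INTO company VALUES (?, ?)", nodes)
    conn.executemany("INSERT INTO ownership VALUES (?, ?)", edges)
    return conn


def direct_subsidiaries(conn, company_id):
    """Fetch only the rows for one node's children."""
    rows = conn.execute(
        "SELECT c.id, c.name FROM ownership o "
        "JOIN company c ON c.id = o.child_id "
        "WHERE o.parent_id = ? ORDER BY c.id",
        (company_id,),
    )
    return rows.fetchall()


# Illustrative data only.
conn = build_store(
    nodes=[("A", "Alpha Holdings"), ("B", "Beta Ltd"),
           ("C", "Gamma GmbH"), ("D", "Delta SARL")],
    edges=[("A", "B"), ("A", "C"), ("C", "D")],
)
print(direct_subsidiaries(conn, "A"))  # [('B', 'Beta Ltd'), ('C', 'Gamma GmbH')]
```

Because each link is its own row, the query touches only the rows for the requested company, however large the surrounding graph is.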

Furthermore, it became apparent that serving the graph as we traditionally did would not be feasible. The previous version was always a view from the UK & Ireland and thus "exploded" after being infused with views from the additional countries: the connections between nodes became much denser. Serving such dense information is resource intensive in several ways: compute power, network and memory. To reduce the density of the data served and increase the speed of serving it, while also minimising resource usage, we needed a layer that can quickly and efficiently retrieve part or all of a given company graph. We implemented such a layer, and it allows us to query solely the parts of interest for a given company graph, such as "Path to ultimate parents" or "2 levels deep of subsidiaries". This layer is fast, generic and composable: various types of custom extraction patterns can be added to it easily.
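The idea of composable extraction patterns can be sketched as follows. This is a hedged illustration under stated assumptions, not the "Bacon" service's real interface: the in-memory link maps, the function names and the `combine` helper are all hypothetical. Each pattern takes a starting company and returns the set of edges it extracts, so patterns can be combined by taking the union of their results.

```python
from collections import deque

# Hypothetical in-memory link maps; a real service would query a store.
PARENTS = {"B": ["A"], "C": ["A"], "D": ["C"], "E": ["D"]}   # child -> parents
CHILDREN = {"A": ["B", "C"], "C": ["D"], "D": ["E"]}         # parent -> children


def path_to_ultimate_parents(start):
    """Collect every edge on the way up from `start` to parentless companies."""
    edges, seen, queue = set(), {start}, deque([start])
    while queue:
        node = queue.popleft()
        for parent in PARENTS.get(node, []):
            edges.add((parent, node))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return edges


def subsidiaries(depth):
    """Build a pattern collecting edges down to `depth` levels below the start."""
    def pattern(start):
        edges, seen, frontier = set(), {start}, [start]
        for _ in range(depth):
            next_frontier = []
            for node in frontier:
                for child in CHILDREN.get(node, []):
                    edges.add((node, child))
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
        return edges
    return pattern


def combine(*patterns):
    """Compose patterns by taking the union of the edges each one extracts."""
    def pattern(start):
        result = set()
        for p in patterns:
            result |= p(start)
        return result
    return pattern


view = combine(path_to_ultimate_parents, subsidiaries(2))
print(sorted(view("C")))  # [('A', 'C'), ('C', 'D'), ('D', 'E')]
```

Note that company B, a sibling of C, is not part of the result: only the requested slices of the graph are extracted.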

The visual graph displayed on the website also grew much larger and thus needed a new user interface in order to remain comprehensible. Due to the data structure changes and the increased density of the graphs, we had to develop a new paradigm for loading the graph. The old approach received the whole graph and displayed it as is. The new approach initially shows the most important part of the graph ("Path to ultimate parents and 1 level deep of subsidiaries") while incrementally, and intelligently, loading the other parts of the graph in the background, so that when a user expands a node the experience is seamless and there is no network lag.
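The loading paradigm described above might look roughly like the sketch below. It is an assumption-laden illustration, not the website's actual code: the `IncrementalLoader` class, the `fetch_children` callback and the sample data are hypothetical, and the background step is run synchronously here so that the behaviour is easy to follow. The sketch also prefetches every visible node in a fixed order, whereas a real implementation would presumably prioritise.

```python
class IncrementalLoader:
    """Show a small initial view, then warm a cache so expansions are instant."""

    def __init__(self, fetch_children):
        self.fetch_children = fetch_children  # hypothetical backend call
        self.cache = {}
        self.visible = set()

    def _children(self, node):
        # Hit the backend only the first time a node's children are needed.
        if node not in self.cache:
            self.cache[node] = self.fetch_children(node)
        return self.cache[node]

    def load_initial(self, company, parent_path):
        """Show the path to the ultimate parents plus one level of subsidiaries."""
        self.visible = set(parent_path) | {company} | set(self._children(company))
        return self.visible

    def prefetch(self):
        """Warm the cache for visible nodes (a background task in a real UI)."""
        for node in sorted(self.visible):
            self._children(node)

    def expand(self, node):
        """Reveal a node's children, served from the cache when prefetched."""
        children = self._children(node)
        self.visible |= set(children)
        return children


# Illustrative data and a fake backend that records each request.
CHILD_MAP = {"A": ["B", "C"], "C": ["D", "E"], "D": ["F"]}
calls = []


def fetch_children(node):
    calls.append(node)
    return CHILD_MAP.get(node, [])


loader = IncrementalLoader(fetch_children)
loader.load_initial("C", parent_path=["A"])
loader.prefetch()
calls_before = len(calls)
expanded = loader.expand("D")
print(expanded, len(calls) == calls_before)  # ['F'] True
```

The final line shows that expanding node D after prefetching triggers no further backend request, which is the effect the text describes as having no network lag.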

B) Using geo-location as a search/segmentation facet

The location search feature will enable users to search seamlessly for companies across multiple countries and jurisdictions. This is challenging to do for a single country and even more so for multiple jurisdictions. Different levels of granularity in both address data and the geographical classification of administrative areas make it hard to present a unified geographic framework for search. To overcome this barrier, we first cleaned up and standardised addresses for the various jurisdictions so that they could be geo-coded reliably. We then worked with OpenStreetMap data to map those geo-coded locations to searchable administrative regions that have reasonable granularity across the globe.
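One common way to map a geo-coded point to an administrative region is a point-in-polygon test against region boundaries. The sketch below uses the standard ray-casting method; it is offered as an illustration of the general technique, not as the method the project used. The region names and rectangular outlines are invented, and real boundaries would be far more detailed.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count the polygon edges crossed by a ray heading east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical, heavily simplified region outlines as (lon, lat) vertices.
REGIONS = {
    "Region North": [(0.0, 5.0), (10.0, 5.0), (10.0, 10.0), (0.0, 10.0)],
    "Region South": [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)],
}


def region_for(lon, lat):
    """Return the first region containing the point, or None if none does."""
    for name, polygon in REGIONS.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


print(region_for(3.0, 7.5))   # Region North
print(region_for(3.0, 2.0))   # Region South
print(region_for(20.0, 2.0))  # None
```

Once every company address resolves to a region name in this way, the region can serve as a search facet that is uniform across jurisdictions.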

II. Expected results by the end of the project

By the end of our two-year project we expect to have a product that covers the majority of relevant European markets and that both public and private sector customers will be able to use for lead generation, due diligence, verification and risk analysis. We aim to continue surfacing additional data in order to enrich the corporate and director profiles and expand our global Business Information Graph. We firmly believe that our product will contribute greatly to removing the information barriers that currently limit the financing of SMEs, and will facilitate European as well as global trade and commerce.