Periodic Reporting for period 2 - dEUdil (dEUdil: Building on open data as a new business model in the business information industry)
Reporting period: 2017-08-01 to 2018-07-31
We feel that two of our technological advancements in the first year of the project place us beyond the state of the art: the ability to serve large network graphs efficiently, and location search.
A) Constructing and serving large network graphs via a graphical web user interface ("Bacon" micro-service)
In working to expand the group graph to the additional countries, it quickly became apparent that storing it the way we traditionally did would be extremely inefficient. Our old version of the graph was stored in a way that forced us to parse it and rebuild it as a tree, or another relevant data structure, before any operations, such as traversals, could be performed on it. This is acceptable for small graphs, but not at the scale that we have now reached. We now have graphs up to twenty times bigger than our previous biggest graph, containing hundreds of thousands of nodes instead of hundreds. The solution was to model and store the data in a form that is already as normalised as possible, preserving the representation of links and nodes such that a section of any graph can be retrieved without needing to parse a massive connected graph. This has the additional benefit that operations are faster and less data has to be moved around.
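To illustrate the idea of normalised storage, the following is a minimal sketch and not the actual "Bacon" implementation. It assumes a relational store (here an in-memory SQLite database) with one table for companies and one for ownership links; the table names, column names and the helper functions `build_store` and `direct_subsidiaries` are hypothetical. The point it shows is that one section of a graph can be read with an indexed lookup, without first parsing the whole connected graph.

```python
import sqlite3


def build_store(nodes, links):
    """Store companies and ownership links as two normalised tables.

    `nodes` is a list of (id, name) pairs, `links` a list of
    (parent_id, child_id) pairs. All names here are illustrative assumptions.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE company (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE link (parent_id TEXT, child_id TEXT)")
    # Indexes let us fetch the neighbours of one node without a full scan.
    db.execute("CREATE INDEX link_parent ON link (parent_id)")
    db.execute("CREATE INDEX link_child ON link (child_id)")
    db.executemany("INSERT INTO company VALUES (?, ?)", nodes)
    db.executemany("INSERT INTO link VALUES (?, ?)", links)
    return db


def direct_subsidiaries(db, company_id):
    """Return (id, name) for each direct subsidiary of one company."""
    rows = db.execute(
        "SELECT c.id, c.name FROM link l "
        "JOIN company c ON c.id = l.child_id "
        "WHERE l.parent_id = ? ORDER BY c.id",
        (company_id,),
    )
    return rows.fetchall()


db = build_store(
    [("A", "Alpha"), ("B", "Beta"), ("C", "Gamma")],
    [("A", "B"), ("A", "C")],
)
subsidiaries_of_a = direct_subsidiaries(db, "A")
```

Only the rows linked to the requested company are read, which is the property the normalised model is meant to provide.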
Furthermore, it also became apparent that serving it like we traditionally did would not be feasible. The previous version was always a view from the UK & Ireland and thus ‘exploded’ after being infused with views from the additional countries – this means the connections between nodes got denser. Serving such dense information is resource intensive in multiple ways: compute power, network and memory. To alleviate the density of data served as well as increase the speed of serving this data, while also minimising resource usage, there needed to be a layer that can quickly and efficiently retrieve part or all of a given company graph. We implemented such a layer and it allowed us to query solely the parts of interest for a given company graph, such as “Path to ultimate parents” or “2 levels deep of subsidiaries”. This layer is fast, generic and composable – various types of custom extraction patterns can be added to it easily.
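A layer of composable extraction patterns could look roughly like the sketch below. This is an illustrative assumption about the design, not the project's code: the `CompanyGraph` class, the pattern functions and `combine` are hypothetical names, and the graph is held in plain in-memory dictionaries for brevity.

```python
from collections import deque


class CompanyGraph:
    """Adjacency maps built from (parent, child) ownership links."""

    def __init__(self, links):
        self.parents = {}
        self.children = {}
        for parent, child in links:
            self.children.setdefault(parent, set()).add(child)
            self.parents.setdefault(child, set()).add(parent)


def _walk(adjacency, start, max_depth=None):
    """Breadth-first walk that optionally stops after max_depth levels."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:  # also guards against ownership cycles
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def path_to_ultimate_parents(graph, company):
    """All companies on the way up to the ultimate parent(s)."""
    return _walk(graph.parents, company)


def subsidiaries(depth):
    """Build a pattern returning subsidiaries up to `depth` levels down."""
    def pattern(graph, company):
        return _walk(graph.children, company, max_depth=depth)
    return pattern


def combine(*patterns):
    """Compose several patterns into one by taking the union of results."""
    def pattern(graph, company):
        result = set()
        for p in patterns:
            result |= p(graph, company)
        return result
    return pattern


graph = CompanyGraph(
    [("U", "P"), ("P", "C"), ("C", "S1"), ("S1", "S2"), ("S2", "S3")]
)
view = combine(path_to_ultimate_parents, subsidiaries(2))(graph, "C")
```

Because every pattern has the same signature (graph and company in, set of nodes out), a new pattern can be added without touching the existing ones.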
The visual graph displayed on the website also became considerably bigger, and thus the user interface needed a new way of displaying it in order for it to remain comprehensible. Due to the data structure changes and the increased density of the graphs, we had to develop a new paradigm for loading the graph. Unlike the old approach, which received the whole graph and displayed it as is, the new one initially shows the most important part of the graph ("Path to ultimate parents and 1 level deep of subsidiaries") while incrementally, and intelligently, loading the other parts of the graph in the background, so that when a user expands a node the experience is seamless and there is no network lag.
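The incremental loading idea can be sketched as a small cache with prefetching. This is a simplified illustration under stated assumptions: `IncrementalGraphLoader` and the `fetch_children` callback are hypothetical, the real interface runs in a browser, and the prefetch here runs synchronously where a real client would do it in the background.

```python
class IncrementalGraphLoader:
    """Caches node children and prefetches one level ahead of the user."""

    def __init__(self, fetch_children):
        # fetch_children(node) stands in for a (hypothetical) backend request.
        self.fetch_children = fetch_children
        self.cache = {}
        self.fetch_count = 0

    def _load(self, node):
        """Fetch the children of a node once, then serve them from cache."""
        if node not in self.cache:
            self.cache[node] = list(self.fetch_children(node))
            self.fetch_count += 1
        return self.cache[node]

    def prefetch(self, nodes):
        """Warm the cache for nodes the user is likely to expand next."""
        for node in nodes:
            self._load(node)

    def expand(self, node):
        """Return the children to display and prefetch the next level."""
        children = self._load(node)
        self.prefetch(children)
        return children


tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
loader = IncrementalGraphLoader(lambda node: tree.get(node, []))
first_view = loader.expand("A")
```

After the first expansion, the children of "B" and "C" are already cached, so expanding either of them does not have to wait for a request for that node.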
B) Using geo-location as a search/segmentation facet
The location search feature will enable users to search seamlessly for companies across multiple countries and jurisdictions. This is challenging to do for a single country and even more so for multiple jurisdictions. Different levels of granularity in both address data and the geographical classification of administrative areas make it hard to present a unified geographic framework for search. To overcome this barrier, we did a lot of work cleaning up and standardising addresses for various jurisdictions so that we could derive suitable geo-coding for them. We then worked with OpenStreetMap data to map those geo-coded locations to searchable administrative regions that have reasonable granularity across the globe.
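One common way to assign a geo-coded point to an administrative region is a point-in-polygon test. The sketch below uses the standard ray-casting technique on simple polygons; it is an assumption about how such a mapping could be done, not a description of our pipeline. The region names and coordinates are made up, and real administrative boundaries would need multipolygon and hole handling that is omitted here.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside a simple polygon?

    `polygon` is a list of (lon, lat) vertices. A horizontal ray is cast
    from the point; an odd number of edge crossings means "inside".
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_region(lon, lat, regions):
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical square regions, for illustration only.
regions = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
region = find_region(0.5, 0.5, regions)
```

A linear scan over all regions is enough for a sketch; at larger scale a spatial index would normally be used to narrow down candidate regions first.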
II. Expected results by the end of the project
By the end of our two-year project we expect to have a product that covers the majority of relevant European markets and that both public and private sector customers will be able to use for lead generation, due diligence, verification and risk analysis. We aim to continue our efforts to surface additional data in order to enrich the corporate and director profiles and expand our global Business Information Graph. We firmly believe that our product will contribute greatly to removing existing information barriers to greater financing of SMEs, and will facilitate European as well as global trade and commerce.