CORDIS - EU research results

ExtraLytics: Big Data Analytics for Real Estate

Periodic Reporting for period 1 - ExtraLytics (ExtraLytics: Big Data Analytics for Real Estate)

Reporting period: 2014-11-01 to 2016-04-30

For decades, we were promised that the web would become humanity's greatest database, and that all of its data would be available as XML, RDF, or Web 2.0 APIs. Yet what remains is a vast ocean of dark data, hidden below the surface in the silos of the deep web. To unlock the potential insights in this accumulated information, the hidden data (estimated to be 80% of all Web data) has to be painstakingly extracted and refined before it becomes amenable to further analysis.
Businesses and data scientists in many verticals have recognized the tremendous value of such insights, but also the punitive cost of collecting them. Data scientists spend around 80% of their time on data collection and preparation, and they really hate doing so.
In the ExtraLytics PoC, a team of five researchers from Oxford has brought technology to the market that allows every business, every policy maker, and every scientist to afford to include relevant web data in their decision models, ultimately leading to better, more fact-driven decisions. The technology has been created over the past 6 years, supported primarily by European Union funding (through the DIADEM ERC Advanced Investigator grant), but also by major tech companies and the British EPSRC.
With this technology, data extraction from thousands of web sites at high accuracy has finally become feasible. ExtraLytics demonstrates conclusively that the missing ingredient for such extraction was indeed a small, possibly rather noisy bit of domain knowledge.
The key findings of the ExtraLytics PoC are that:
* Verticals such as places, shopping, flights, images, and videos play an ever-growing role in search engine results and data markets, but have mostly been limited to manually curated databases (flights, shopping, places) or media-specific ones (images, video).
* In contrast to general web search, the key to effective extraction at the scale of entire verticals is knowledge about objects in the domain of the vertical and their appearance on web sites.
* With ExtraLytics it is possible to automatically induce highly accurate wrappers for entire verticals, spanning thousands of websites.

These insights have been translated into a platform that helps you build new kinds of applications. Do you have an idea for how to better match people and jobs, or how to answer "what's the best Italian restaurant here" based on a user's past experiences, but don't know where to get the necessary data on current job offers, restaurant menus, or other product offers? That's where ExtraLytics comes in: we can quickly provide a highly structured database of all the offers or goods you are interested in, whether they come from a few websites or hundreds of them. You can use this database to build better search, better recommenders, or better analytics.

This doesn't just help you build your application faster; it also makes possible applications that were previously out of reach even for the internet giants:
* Answering product search queries outside the US, where getting all the shops to upload their data requires a huge effort.
* Answering queries that link data from different domains: a place that's playing the latest Batman movie and where I can get a good burger afterwards.
* Answering queries that link dynamic and static data: find me a cheap hotel in a low crime area.
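To make the last kind of query concrete, here is a minimal sketch of how "a cheap hotel in a low crime area" could be answered once both data sets exist in structured form. It is only an illustration under stated assumptions, not the ExtraLytics implementation: the `HotelOffer` record, the `cheap_hotels_in_safe_areas` function, the district names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HotelOffer:
    """Dynamic data: a current offer, as it might be extracted from a hotel site."""
    name: str
    district: str
    price_per_night: float


def cheap_hotels_in_safe_areas(offers, crime_rate_by_district, max_price, max_crime_rate):
    """Join dynamic offers with static per-district crime rates.

    Offers in districts with no known crime rate are excluded rather than
    assumed to be safe. Results are sorted by price, cheapest first.
    """
    matches = [
        offer
        for offer in offers
        if offer.price_per_night <= max_price
        and crime_rate_by_district.get(offer.district, float("inf")) <= max_crime_rate
    ]
    return sorted(matches, key=lambda offer: offer.price_per_night)


# Hypothetical sample data, invented purely for illustration.
offers = [
    HotelOffer("Hotel A", "North", 80.0),
    HotelOffer("Hotel B", "South", 60.0),
    HotelOffer("Hotel C", "North", 150.0),
    HotelOffer("Hotel D", "East", 70.0),
]
# Static data: incidents per 1,000 residents (made-up numbers; "East" is unknown).
crime_rate_by_district = {"North": 12.0, "South": 45.0}

results = cheap_hotels_in_safe_areas(
    offers, crime_rate_by_district, max_price=100.0, max_crime_rate=20.0
)
print([offer.name for offer in results])
```

The join itself is trivial; the hard part, and the part the platform addresses, is obtaining the up-to-date offer records in the first place.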

The outcomes of this PoC have been spun out by Oxford University into Wrapidity Ltd. This startup will continue the work on bringing the technology to market and is currently in talks with potential investors.