CORDIS - EU research results

Big Data PPP: cross-sectorial and cross-lingual data integration and experimentation


Proposals should cover one of the following bullets:

  1. Data integration activities will address data challenges in cross-domain setups, where similar contributions of data assets will be required by groups of EU industries that are arranged along data value chains (i.e. such that the value extracted by a company in a given industrial sector is greatly increased by the availability and reuse of data produced by other companies in different industrial sectors). The actions will cover the range from informal collaboration to formal specification of standards and will include (but not be limited to) the operation of shared systems of entity identifiers (so that data about the same entity could be easily assembled from different sources), the definition of agreed data models (so that two companies carrying out the same basic activity would produce data organised in the same way, to the benefit of developers of data analytics tools), support for multilingual data management, data brokerage schemes and the definition of agreed processes to ensure data quality and the protection of commercial confidentiality and personal data. The actions are encouraged to make use of existing data infrastructures and platforms.
  2. Data experimentation incubators should address big data experimentation in a cross-sectorial, cross-lingual and/or cross-border setup. This setup should include access to data in different domains and languages, appropriate computational infrastructure, and open software tools. The incubator should make these available to the experimenters, who are expected to be mainly SMEs, web entrepreneurs and start-ups. Experimentation is to be conducted on horizontal/vertical contributed data pools provided by the incubator. At least half of the experiments should address challenges of industrial importance jointly defined by the data providers, where quantitative performance targets are defined beforehand and results measured against them. Effective cross-sector and cross-border exchange and re-use of data are key elements in the experiment ecosystem supported by the incubators. Therefore, the incubators are expected to address the technical, linguistic, legal, organisational, and IPR issues, and provide a supported environment for running the experiments. To remain flexible on which experiments are carried out and to allow for a fast turn-over of data experimentation activities, the action may involve financial support to third parties, in line with the conditions set out in part K of the General Annexes. The proposal will define the selection process of the experimenters running the data activities for which financial support will be granted (typically in the order of EUR 50 000 – 100 000 per party). At least 70% of the EU funding shall be allocated to this financial support. In line with Article 23 (7) of the Rules for Participation, the amounts referred to in Article 137 of the Financial Regulation may be exceeded; if this is the case, proposals should explain why this is necessary to achieve the objectives of the action. Experiments are expected to run for a maximum of six months, while the incubator should run for a minimum of three years.
     The proposals are expected to explain how the incubator would become self-sustaining by the end of the funded duration of the action. It is also recommended to use established networks reaching out to SMEs, such as the Enterprise Europe Network and the NCP network, for call publications and awareness raising towards SMEs.
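
As an illustration of the shared entity identifiers mentioned under the data integration activities, the following minimal Python sketch shows how data about the same entity could be assembled from different sources once the sources agree on an identifier. The record layout, the field names and the `entity_id` key are hypothetical and are not prescribed by this topic.

```python
from collections import defaultdict

# Hypothetical records from two companies in different sectors.
# "entity_id" is an assumed shared identifier; all field names are illustrative.
logistics_records = [
    {"entity_id": "EU-ORG-001", "deliveries_per_week": 12},
    {"entity_id": "EU-ORG-002", "deliveries_per_week": 5},
]
retail_records = [
    {"entity_id": "EU-ORG-001", "weekly_sales_eur": 48000},
    {"entity_id": "EU-ORG-003", "weekly_sales_eur": 9500},
]


def assemble_by_entity(*sources):
    """Merge records from several sources into one view per shared identifier."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["entity_id"]
            # Copy every field except the identifier itself into the merged view.
            merged[key].update(
                {name: value for name, value in record.items() if name != "entity_id"}
            )
    return dict(merged)


combined = assemble_by_entity(logistics_records, retail_records)
```

In this sketch a later source overwrites an earlier one when both use the same field name, which is one reason an agreed data model matters alongside the shared identifiers.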

The Commission considers that proposals requesting a contribution from the EU of between EUR 1 and 3 million (for the data integration activities under item 1) or about EUR 7 million (for the incubators under item 2) would allow this area to be addressed appropriately. Nonetheless, this does not preclude submission and selection of proposals requesting other amounts.
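
To give a sense of scale, the figures for the incubators can be combined in a short calculation. The Python sketch below assumes an EU contribution of exactly EUR 7 million, exactly 70% of it passed on to third parties, and grants at the two ends of the typical EUR 50 000 – 100 000 range. These are illustrative assumptions only: the text sets 70% as a minimum, and both the contribution and the grant sizes may differ.

```python
def experiment_range(eu_contribution_eur, third_party_share_percent=70,
                     min_grant_eur=50_000, max_grant_eur=100_000):
    """Return the third-party pool and the fewest/most experiments it could fund.

    All defaults are illustrative readings of the figures in the topic text.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    pool = eu_contribution_eur * third_party_share_percent // 100
    fewest = pool // max_grant_eur  # every grant at the top of the range
    most = pool // min_grant_eur    # every grant at the bottom of the range
    return pool, fewest, most


pool, fewest, most = experiment_range(7_000_000)
print(pool, fewest, most)  # 4900000 49 98
```

Under these assumptions, one incubator would fund between 49 and 98 experiments over its lifetime.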

Europe lacks a systematic transfer of knowledge and technology across different sectors and there is an underdeveloped data sharing and linking culture. Traditionally, data has been collected and used for a certain purpose within sectorial "silos", while using data across sectors for offering new services opens new opportunities for solving business and societal challenges. The lack of agreed standards and formats, and the low rates of publishing data assets in machine discoverable formats further hold back data integration. The fact that textual data appears in many languages creates an additional challenge for sharing and linking such data. Finally, there is a lack in Europe of secure environments where researchers and SMEs can test innovative services and product ideas based on open data and business data.

The challenge is to break these barriers and to foster exchange, linking and re-use, as well as to integrate data assets from multiple sectors and across languages and formats. A more specific challenge is to create a stimulating, encouraging and safe environment for experiments where not only data assets but also knowledge and technologies can be shared.

a. Data integration activities

  • Data integration activities will simplify data analytics carried out over datasets independently produced by different companies and shorten time to market for new products and services;
  • Substantial increase in the number and size of data sets processed and integrated by the data integration activities;
  • Substantial increase in the number of competitive services provided for integrating data across sectors;
  • A 20% increase (by 2020) in the revenue that European data companies generate by selling integrated data and data integration services.

b. Data experimentation incubators

  • At least 100 SMEs and web entrepreneurs, including start-ups, participate in data experimentation incubators;
  • 30% annual increase in the number of Big Data Value use cases supported by the data experimentation incubators;
  • Substantial increase in the total amount of data made available in the data experimentation incubators including closed data;
  • Emergence of innovative incubator concepts and business models that allow the incubator to continue operations past the end of the funded duration.