Research infrastructure for research and innovation policy studies

Final Report Summary - RISIS (Research infrastructure for research and innovation policy studies)

Executive Summary:
The objective of RISIS is to develop a distributed infrastructure serving researchers in the field of science and innovation studies.
The rationale is both policy-based and research-based. On the policy side, for at least 20 years policymakers have been asking questions that existing indicators and their underlying databases cannot answer (Godin explains this well in his work on the limits of the production function and the input-output approach). On the research side, one interpretation lies in a deepened understanding of knowledge production and innovation processes and of the circulation of knowledge, and in a growing distance from the assumption that the country is the relevant unit of analysis.
The changing environment and the rise of the internet as a new communication infrastructure have opened further possibilities for developing databases and analytical tools that improve our understanding, and for developing indicators that match the theoretical state of the art. More specifically, new indicators can keep the identity of actors and their fine-grained location (in metropolitan areas), which addresses two of the strongest limitations of the statistics-based approach to indicators of science, technology and innovation. Lepori, Barré et al. (2008) proposed calling these new indicators ‘positioning indicators’.

This has generated a wealth of developments and experimental databases, quite a few of them supported by the European Commission, but mostly as one-off events. A first objective is thus to stabilise and maintain these datasets over time and make them available to all interested researchers in Europe and abroad. We also treat the duration of the project as a ‘testing’ period that will reveal the lasting interest in these datasets, help us determine which of them should be kept, and give us time to think about the conditions for their long-term maintenance and access.

However, the available data only partially cover the lasting theoretical and policy problems. We have identified several key issues that require specific efforts and have proposed the development of new datasets on them. This is the second objective.

Many problems lie at the interface of existing datasets and thus depend on the ability to interface them and to generate a problem-based integration. This is the third objective, which requires attention to technical and substantive harmonisation.

However many datasets the project currently holds (14), they will only partially address the needs of researchers for robust ‘positioning’ data. Our fourth objective is to develop platforms that help researchers build their own datasets from a variety of sources, such as open data available on the web, administrative data, and specific project-oriented data collections, and that provide them with instruments to organise and analyse these often large textual corpuses.

Taken together, these four objectives require that we develop an overall architecture for the infrastructure so that researchers can access, build, integrate and process data remotely. This is a fifth objective that, though not explicitly included in the project, has become central to the dynamics of the infrastructure.
Project Context and Objectives:
RISIS is thus a 4.5-year project that gathers 15 European groups working on these issues, with a fairly simple agenda:
a) A first period was dedicated to preparing the opening of existing datasets (9) and platforms (2). This has been done: all existing datasets and one platform were opened between months 18 and 24.
b) Beyond initial harmonisation between datasets, two very important ‘harmonisation’ activities aimed to ensure problem-based integration. They dealt with the organisational and geographical dimensions.
c) Thematic research work to deepen existing datasets and the existing CORTEXT platform for semantic analysis, and to develop 4 new key datasets and the SMS platform dedicated to accessing and enriching web-based datasets.
d) Interaction with the community and with stakeholders, with a view to considering the conditions for the infrastructure's lasting existence.
e) A fifth objective was progressively added through amendments to deal with what is now labelled ‘virtual transnational access’. The objective was to design the principles and test key aspects of a computer architecture that generalises ‘distant access’ for researchers while keeping the conditionality linked to ‘peer-reviewed project based access’ (what we call conditional access).

The project has a classical structure: a Governing Board made up of one representative per member for all strategic decisions, and a ‘Facility Coordination Board’ of 4 persons that manages the daily life of the project and is supported by a project management team. It is supported by a six-person Project Review Board. It has developed internal procedures (so-called activity sheets) to define and monitor individual activities (within and between work packages) and an extranet to ensure full transparency of all our activities for members and for our funders. A central feature of its life is the RISIS annual week (held each year in the last week of January: Rome in 2015, Amsterdam in 2016, Vienna in 2017, Paris in 2018), which enables deep and transversal exchanges between all participating researchers.

RISIS incorporates 15 datasets, 9 existing datasets and 6 new ones, focusing on 6 major topics of interest in our field: (i) the dynamics of public sector research (Universities & PRO), (ii) the careers of PhD holders, (iii) innovation in firms, (iv) European integration, (v) new emerging technologies, and (vi) research and innovation policies (see box below highlighting time of opening for European researchers).

RISIS also incorporates two interconnected software platforms with complementary aims: SMS (under development, first opened on site only in October 2017), for developing new datasets focusing on linked open data, and CorText, a digital platform dedicated to the enrichment, treatment and visualisation of large textual corpuses. CorText lets researchers process their corpuses online; it has been available in beta from the start, while the full RISIS version has been online since October 2015.

Firm innovation capacity :
* Corporate invention Board – IFRIS/UPEM – dataset on inventive activities of large multinational firms (Opened July 2015)
* VICO – POLIMI – Dataset on survival and dynamics of young high tech firms (updated version opened March 2016)
* CHEETAH – POLIMI/SPRU/UPEM – new dataset on innovation in fast growing midsized firms (first opening in 2017)
* FIRMREG – a register of the European firms included in the firm datasets

Scientific and Technological capabilities :
* Leiden publication database – CWTS – dataset on publications (opened since the beginning)
* Nano S&T – IFRIS/UPEM – dataset on the dynamics (thematic, institutional & geographical) of publications and patents in nano-science and technology (Opened July 2015)
* IFRIS Patstat – IFRIS/UPEM – dataset on patents (indirectly opened since 2015 through CIB).

European integration :
* JOREP – CNR – dataset on trans-border joint programmes between funding bodies/agencies in Europe (opened December 2015)
* EUPRO – AIT – longitudinal dataset on collaborative projects and partnerships promoted by European Union research programmes (Opened September 2015, periodically updated)

Universities and Research Organisations :
* RISIS ETER (USI and Joanneum), incorporating into ETER data on university outputs and European projects, and reincorporating longitudinal data from previous projects (such as EUMIDA).
* RISIS OrgReg (USI and Joanneum), incorporating the ETER dataset on European higher education institutions (opened July 2015) and Public Research Organisations (new dataset, opened for testing December 2016)

Research careers :
* Profile – DZHW – panel based dataset on careers of doctoral candidates in Germany (opened October 2015)
* MORE – NIFU – European survey of mobility of researchers (opened March 2016)
* PhD Careers – IFQ & CSIC – new semantic wiki enabling researchers to access information about multiple national and local PhD career datasets organised around the newly developed “research career conceptual framework”

Policy learning :
* SIPER – Science and innovation policy evaluations repository – University of Manchester (new dataset, Opened November 2016)
Project Results:
A first central result lies in the visits made. We have generated over 110 visits (out of nearly 150 projects submitted) for the datasets and the SMS platform (not yet online at the end of the project). The CORTEXT platform for semantic analysis and visualisation has exceeded its objectives by a factor of over 50, with around 200 projects per month (while the initial objective was 30 projects per year!). An interesting marker of the position of RISIS in the field lies in the number of presentations made at the central conference for the field: 34 out of 136 at the 2017 Paris STI conference.
We derived two major lessons from this use. (a) Firm datasets are in huge demand, well beyond the borders of our field. A good example is VICO, which is unique worldwide, as such a dataset on ‘privately owned’ firms cannot be built in the US. (b) There is an ongoing debate about the usefulness of providing access to online datasets: the demand for the Leiden publication dataset, for EUPRO on European projects and, to a lesser extent, for IFRIS Patstat shows how important the enrichments made (geolocation, harmonisation of actors) are for relevant studies to be conducted. This confirms our approach.
The opening of new datasets and services provides novel opportunities for researchers. Here we wish to highlight 4 major achievements:
- The CHEETAH database is the first one ever dealing with fast growing mid-sized firms, which are critical for employment in Europe. Opened in 2017, it provides a unique view of the geographical and sectoral distribution of these firms in Europe, and particularly in new member states.
- The ORGREG register enriches ETER on universities and complements it on public research organisations and university hospitals (a difficult issue for all analysts of public research). It provides a unique tool to follow dynamics and strategies at actor level. Structural transformation, in particular through mergers, is an important element of the 2000s and can be monitored through the register.
- The SIPER repository of policy evaluations makes it possible for the first time to capitalise on knowledge about instruments and policy mixes. It is a freely accessible service for both researchers and policy analysts; it has been open since the end of 2016 and is periodically enriched.
- The new CORTEXT geocoding and geoclustering service enables the geocoding of large corpuses and thus the development of studies at different geographical scales, in particular at the level of metropolitan areas, a major issue for agglomeration dynamics in knowledge production.
One of our central objectives is to drive our community to progressively shift the balance between qualitative and quantitative studies, also using the latter for exploratory purposes. This requires both awareness raising (which we have pursued through specific activities at the STI conferences over the last 4 years) and substantial training activities. We consider the training delivered a major achievement: 23 courses, with over 300 researchers trained from 21 countries and 4 international organisations, coming (beyond RISIS members) from 100 different organisations.
The final major achievement looks to the future. To last in the present world of open science and the internet, RISIS as an infrastructure needs to organise generalised distant access. This is not an issue for our services (in particular the CORTEXT platform, the SIPER repository or the ORGReg register). But, because it mobilises privately owned datasets (such as the WoS, Patstat or ORBIS), RISIS has developed a ‘hybrid model’, whereby researchers can access most of our data for publishable research only. This requires that we organise distant access (i.e. ‘virtual transnational access’) based upon peer-reviewed selected projects (i.e. conditional on the selection of projects and organised so that researchers access only the data needed for their projects). This requires a computer infrastructure quite different from most existing e-infrastructures. We have thus designed such an architecture and tested its critical components (in particular the conditional access based on OAuth protocols). We are thus in a position, should the project be selected as an advanced community, to move fast to ‘generalised distant access’.
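The principle of conditional access described above can be illustrated with a minimal sketch. It is not the RISIS implementation: the `Project` and `AccessServer` classes, the `dataset:<name>` scope strings and the in-memory token store are all hypothetical, chosen only to show the general idea of an OAuth-style token whose scopes are limited to the datasets granted to a peer-reviewed project.

```python
from dataclasses import dataclass
import secrets
import time


@dataclass(frozen=True)
class Project:
    """A peer-reviewed access request (hypothetical model)."""
    project_id: str
    researcher: str
    approved: bool
    datasets: frozenset  # dataset names granted by the review


class AccessServer:
    """Toy in-memory stand-in for an OAuth-style authorisation server."""

    def __init__(self):
        self._tokens = {}  # token -> (set of scopes, expiry timestamp)

    def issue_token(self, project, ttl_seconds=3600):
        # Conditional access: no approved project, no token.
        if not project.approved:
            raise PermissionError(f"project {project.project_id} is not approved")
        token = secrets.token_urlsafe(16)
        # One scope per granted dataset, nothing more.
        scopes = {f"dataset:{name}" for name in project.datasets}
        self._tokens[token] = (scopes, time.time() + ttl_seconds)
        return token

    def can_read(self, token, dataset):
        # Unknown or expired tokens are refused, as are datasets outside the scope.
        entry = self._tokens.get(token)
        if entry is None:
            return False
        scopes, expiry = entry
        return time.time() < expiry and f"dataset:{dataset}" in scopes


server = AccessServer()
project = Project("P-001", "alice", True, frozenset({"VICO", "EUPRO"}))
token = server.issue_token(project)
print(server.can_read(token, "VICO"))   # dataset granted to the project
print(server.can_read(token, "JOREP"))  # dataset outside the project's scope
```

In this sketch the token is the only thing a data service needs to check, which is what allows access to stay remote while remaining tied to the selected project.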
Potential Impact:
Our core stakeholders are policy analysts (in government and other public structures) and policymakers. We face a dual difficulty in organising the dissemination and exploitation of our results. First, the role of RISIS is to provide facilities with which analysts produce indicators and results by mobilising the data provided. RISIS is thus not a direct producer of indicators or of results supporting evidence-based policies. Second, the way results are taken up in policy is not linear. Discussion of this topic started long before the RISIS project (in particular in the earlier PRIME network of excellence). It highlighted that an interesting result is not taken up in policymaking simply because it exists. We identified at that time a ‘percolation model’, whereby ideas and principles have to circulate and penetrate policy circles before targeted results can be incorporated in policy. In view of these two difficulties we have developed the following approach and activities.
First, we have distinguished between policy analysts and policymakers. Policy analysts often participate in the conferences of the field (and there are also specialist policy-oriented conferences, such as the periodic Vienna evaluation conferences co-organised by members of the RISIS consortium). Our efforts towards the research community therefore also raise policy analysts' awareness of the datasets and tools available (a good example is the links developed with the JRC around SIPER). We also decided to open our training courses widely to all policy analysts, with clear success (they make up over 10% of total participation).
Second, we decided to engage with international organisations, and in particular OECD, where we were invited a number of times to present RISIS to Government representatives, and to discuss the possible uses of semantic treatments in policy analysis (Paris 2018). We even co-organised a specific session on the latter topic at the large ESOF conference in Toulouse (July 2018).
Our third choice has been to build demonstrators that show the policy interest of our datasets. We have developed two ways to circulate them. The first lies in the organisation of workshops targeted at very specific audiences and/or focused on well-identified, policy-defined problems. We held such a workshop for services of the European Commission in October 2017 and plan another one, focused on three aspects of interest for Horizon Europe, before the end of 2018.
The other way to diffuse our demonstrators is to summarise their results in policy briefs. We have produced a first round of 9 policy briefs, and more are in preparation for release after the end of the project.
