Data without Boundaries
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS
Rue Michel Ange 3
€ 1 413 624
Franck Charron (Mr.)
€ 161 849
UNIVERSITAT ROVIRA I VIRGILI
€ 107 090
UNIVERZA V LJUBLJANI
€ 38 391
GESIS-LEIBNIZ-INSTITUT FUR SOZIALWISSENSCHAFTEN EV
€ 589 303
NORSK SAMFUNNSVITENSKAPELIG DATATJENESTE AS
€ 419 075
€ 196 502
UNIVERSITATEA DIN BUCURESTI
€ 101 865
INSTITUT FUER ARBEITSMARKT- UND BERUFSFORSCHUNG (IAB) DER BUNDESAGENTUR FUER ARBEIT
€ 511 548
UNIVERSITAT POLITECNICA DE CATALUNYA
€ 88 782
CENTRAAL BUREAU VOOR DE STATISTIEK
€ 292 869
Statisticni Urad Republike Slovenije
€ 86 422
ETHNIKO KENTRO KOINONIKON EREVNON
€ 43 641
CENTRO DE INVESTIGACIONES SOCIOLOGICAS
€ 36 788
€ 341 702
METADATA TECHNOLOGY LTD
€ 225 930
FONDATION SUISSE POUR LA RECHERCHE EN SCIENCES SOCIALES
€ 141 240
UNIVERSITY OF ESSEX
€ 329 164
KONINKLIJKE NEDERLANDSE AKADEMIE VAN WETENSCHAPPEN - KNAW
€ 121 654
UNIVERSIDAD DE LA LAGUNA
€ 62 690
AGENCIA ESTATAL CONSEJO SUPERIOR DE INVESTIGACIONES CIENTIFICAS
€ 86 752
UNIVERSITY OF SOUTHAMPTON
€ 3 892,02
INSTITUT NATIONAL DE LA STATISTIQUE ET DES ETUDES ECONOMIQUES
€ 63 643,40
UK Office for National Statistics
€ 408 393
CENTRO DE ESTUDIOS DEMOGRAFICOS
€ 199 980
€ 72 310
CENTRUL NATIONAL DE PREGATIRE IN STATISTICA
€ 27 820
GROUPE DES ECOLES NATIONALES D ECONOMIE ET STATISTIQUE
€ 231 734,60
THE UNIVERSITY OF MANCHESTER
€ 88 362,98
Grant agreement ID: 262608
1 May 2011
30 April 2015
€ 8 686 425,56
€ 6 493 017
CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE CNRS
Enhanced transnational access to confidential data on EU citizens, households and businesses
Final Report Summary - DWB (Data without Boundaries)
Data collected by National Statistical Institutes (NSIs) and other government bodies are crucial resources for society and research. These Official Statistics (OS) include information on individuals, households and firms (the so-called ‘microdata’), enabling researchers to investigate the specificities of communities and their reactions, e.g. to economic and policy changes. While an increasing number of countries have set conditions allowing easier access to larger parts of these microdata, transnational access is still uneven, notably for the highly detailed (so-called ‘confidential’) microdata increasingly required for cutting-edge research. Approaches to statistical disclosure control have evolved across Europe, notably by applying higher levels of anonymisation, which can reduce the usefulness of the data for research. Countries are therefore adopting new technologies that allow researchers to process such data securely, though with differences in legal frameworks, standards, procedures and equipment. As a result of this diversity, the scope of transnational access to official microdata varies across Europe. Positions diverge at every step, from researchers’ accreditation to technical modalities. Information about data is still fragmented or unavailable. While transnational access remains a challenge, combining data from several countries raises even more difficulties. Keeping these sensitive data secure while facilitating cross-border access for legitimate research purposes is a key priority.
DwB gathered 28 partners with the aim of lowering the barriers to accessing OS data for research purposes, notably for transnational research, in this fragmented European landscape. Over its 4 years, the project issued guidelines, recommendations, roadmaps, concepts, working tools and prototypes that feed into a vision of the infrastructure needed for that purpose. This crucially requires maintaining the momentum of the close collaboration initiated within DwB between the European Statistical System (ESS) led by Eurostat, CESSDA and the research community. That collaboration was built up through a series of activities and events designed to promote greater harmonisation of rules and practices by building trust and fostering dialogue.
DwB has demonstrated the potential of the existing legal frameworks for transnational access to confidential microdata, and has set up a visualisation tool that makes plain how differently countries interpret similar frameworks. The project proposed the concept of a “circle of trust”, based on equivalences in the principles and procedures that ensure security and on agreements between partners regarding responsibilities, as the path towards transnational access within these legal frameworks; parallel discussions took place within institutional circles such as the OECD. Adoption in Research Data Centres (RDCs) of an ISO27001-compliant security standard, for which DwB has provided guidelines, will facilitate agreements and the further harmonisation of accreditation, for which DwB has designed a roadmap. Together these provide the global framework necessary for the development of a European Remote Access Network (EuRAN), which would serve as a backbone for European research by allowing researchers to work together with confidential data sources located in different RDCs across Europe.
DwB has not only built a conceptual framework and discussed the possible components of a EuRAN that would match researchers' needs while fitting the security constraints of the Research Data Centres (RDCs) and OS producers. It has also shown that such a network is technically, organisationally and legally achievable. A first step toward establishing the required infrastructure was the real implementation of bilateral transnational remote access from the UKDA secure centre to the German IAB confidential microdata, based on an RDC-in-RDC approach, with a Memorandum of Understanding establishing the partners’ responsibilities that could serve as a template for others. A second step was the implementation of highly secure IT connections with 3 nodes in RDCs located in France and Germany and a central node, as a Proof-of-Concept for a real network that goes further than parallel bilateral connections. The lessons learnt from the TNA WPs, which supported researchers' transnational access to 7 RDCs in 4 countries, are also regarded as major contributions in that respect, from the viewpoints of both the users and the RDCs involved. The SDC software packages, the developments regarding linkage issues for confidential microdata, and the Guidelines for Output Checking also serve as a basis for developing and harmonising methods and sharing best practices, thus facilitating transnational access to confidential microdata while also easing the production of Scientific Use Files (de facto anonymised microdata), Public Use Files and Campus Files.
The proposed European Service Centre for Official Statistics (ESCOS), envisioned as a dedicated global service for researchers in the field of OS, ideally as a CESSDA sub-unit, should provide the front office for such a EuRAN. It should be an easy single point of access for official microdata, both national and European, providing basic information on what is available and where, information and support on accreditation and access procedures, and user-friendly metadata translated into English. It should also capture and pool the routines and data discoveries generated by researchers for future use, while offering an umbrella for maintaining and developing cooperation between the communities. That cooperation would rest on specific activities such as staff visits, training and conferences like those successfully organised during the project: the Users’ conference bringing together researchers and producers, the Regional workshops aimed at developing cooperation between archives and OS producers, and the European Data Access Forum for discussing best practices and new challenges.
The DwB prototype for a CESSDA Resource Discovery Portal (RDP) is a first step toward creating an enhanced CESSDA portal for resource discovery from diverse sources, including OS, which should avoid information silos. It is based on an implementation model - mapping and taking into account the recent developments in current standards. It is complemented with an analysis of the end-users’ requirements and reports highlighting areas, concerns and recommendations for overcoming future metadata challenges.
DwB has built two tools that must be maintained and developed: CIMES (Centralising and Integrating Metadata from European Statistics) for national official microdata and MISSY (Metadata Information System) for the European (Eurostat) microdata. It has also initiated cooperation between the different stakeholders, notably by helping to establish relations between archives and NSIs in several countries. It is hoped that some of the DwB results, particularly the basis for a future ESCOS, will be taken up in CESSDA's initial workplan. Cooperation with CESSDA, as a future ERIC, has also now been brought to discussion in the ESS by Eurostat. Yet specific developments, notably the construction of a European Remote Access Network in connection with CESSDA and other infrastructures such as those in the health sector that deal with very sensitive data, will require the direct and continuous involvement of the RDCs, many of them located in the NSIs, and substantial financial investments that are beyond the scope and budget of CESSDA's workplan. Further investment will therefore be crucial.
Project Context and Objectives:
The Data without Boundaries (DwB) project, funded under the European Union 7th Framework Programme (FP7), ran for 4 years, from May 2011 until the end of April 2015. Its primary aim was to overcome the main obstacles that slow down researchers' access to official microdata, which are too often underused, in particular access to highly detailed (so-called confidential) microdata beyond each country's national borders.
A FOCUS ON TRANSNATIONAL ACCESS TO CONFIDENTIAL MICRODATA
Without disregarding the anonymized microdata that may remain difficult to access in certain cases, this project focused on highly detailed individual data that are likely to allow indirect identification of individuals or that include "sensitive data" as defined by legislation on privacy or business confidentiality. In addition to the microdata collected through general population censuses and through the surveys produced by the national statistical institutes (which now partly feed into the European surveys coordinated by Eurostat), the data at stake include official registers and several administrative databases produced directly by the National Statistical Institutes, by statistical departments within ministries, or by government agencies of varying status. This area covers data on job-seekers and retirement pensions, as well as fiscal, banking, financial, health-related and administrative data, all of which are increasingly required for research and also raise linkage issues.
Only part of these data is accessible in the form of tabulations or of anonymized microdata (Public Use Files and Scientific Use Files), which are frequently not detailed enough to fit researchers' needs; moreover, they are scattered across various sites, with metadata often not translated into English. When the DwB project started, access to more highly detailed versions of these datasets, increasingly needed for research purposes, and to datasets much harder to anonymize, such as administrative databases, was gradually being facilitated in Europe through the implementation of secure systems, on-site or remote. While progress at national level was significant, though partial, transnational access to such highly detailed microdata from national and European public statistical sources remained highly problematic, even as the number of research projects demanding such data increased, in particular those dedicated to cross-national comparison of the impact of public policies.
A CONTEXT DOMINATED BY OPEN DATA POLICIES AND PRIVACY PROTECTION ISSUES
The DwB project was primarily dedicated to promoting this wealth of official statistical datasets, from both national and European sources; in particular, it sought to foster secure access, at both national and transnational level within the European Research Area (ERA), to the highly detailed, so-called “confidential” data. Two conflicting dimensions framed the context: first, the open data movement, which advocates wide access to data viewed as essential to all citizens and economic partners in the knowledge society; second, the legitimate concerns of citizens regarding the protection of personal information, which call for reinforced legal arrangements framing such protection.
SHORT AND LONG TERM PROJECT OBJECTIVES
The DwB goals involved short-term as well as long-term components. A first line of action dealt with concrete short-term issues, in order to make it easier for researchers to access data across borders within the current context; a second one was to help remove the significant barriers of various kinds that bar access to data at both national and European level. This latter part involved discussions and negotiations between partner communities and, in the long run, the future implementation of infrastructures which the project was meant to design.
BASED ON STRONG COOPERATION BETWEEN THE EUROPEAN STATISTICAL SYSTEM, THE ARCHIVES AND THE RESEARCH COMMUNITIES
Whether we consider short-term or longer-term involvement, one of the characteristics of this project, which is also its strength, was to rely upon a network of various communities:
a) data producers, including national statistical institutes (NSIs) as major stakeholders and statistical departments within Ministries, coordinated by Eurostat within the European Statistical System (ESS) framework, the European Central Bank (ECB) and national central banks, and finally the different organizations that host administrative data;
b) data archives dedicated to research purposes, partly organized under the umbrella of CESSDA (the European network created in the 1970s as an informal association and, when the project was launched, in the process of becoming a European Research Infrastructure within the ESFRI scheme);
c) the research community including the researchers who are the final users and institutions (universities, ministries of research or research councils, dedicated funding agencies);
d) lastly, as important stakeholders, the authorities in charge of data protection within the current legal frameworks.
From the very beginning, this cooperation between various partners was explicitly meant to play a substantial role in the DwB project. The project relied upon a threefold partnership between data archives, mostly CESSDA Members (CNRS-RQ, UTA-FSD, UL-ADP, GESIS, NSD, UGOT-SND, RODA, EKKE, CIS, FORS, UEssex-UKDA, DANS, CED), national statistical institutes (CBS, SORS, Destatis, INSEE, ONS, SCB, CNPS-INS) and national statistical authorities (IAB, and GENES-CASD) as well as universities or specialized institutions involved in Statistical Disclosure Control (SDC) research or in metadata (URV, UPC, MT Ltd., ULL, CSIC, SOTON, UoMan). The involvement of researchers as users could only be set up on a one by one basis: the project provided financial support for research projects, selected through calls for proposals, demanding highly disaggregated data, hosted by project partners in different countries and accessible via secure systems. This constitutes a first circle of cooperation.
A second circle of cooperation was at the core of the project: the objective was to involve, through a program of events and activities, the whole communities with a similar focus: all national statistical institutes and Eurostat (which from the very beginning supported the project), all data archives and CESSDA as such, and as widely as possible, the academic community; the central idea was to broaden the dialogue, sharing knowledge of respective needs and requirements and to prepare the basis for a longer term and stable cooperation between the emerging infrastructure of Data archives affiliated with CESSDA and the European Statistical System coordinated by Eurostat. Therefore, in the European Commission terms, designed as a project of coordination of existing infrastructures, it could also be labelled as "Integrating Activities for Starting communities" with the ambition of eventually becoming "Integrating Activities for Advanced Communities ".
The ambition to build up such a network and enhance cooperation was not starting from scratch. It was based on three observations:
a) At national level, in the most recent years, significant progress could be observed in a number of countries: anonymized official microdata were increasingly made available to researchers, and secure systems were created, allowing access to highly disaggregated data likely to allow indirect identification. All this revealed a growing awareness of researchers’ needs and some changes in the behaviour of OS producers, which made it reasonable to expect that barriers to transnational access would be removed.
b) in some countries, institutions in charge of official statistics and data archives had already set up highly efficient cooperation schemes to enhance the use of official statistics for research purposes; they would share in one way or another, workloads and costs induced by making data available to researchers, a type of cooperation which had also enhanced dialogue between data producers and users on data quality. These examples, however, were restricted to a few countries, considering the small number of Data Archives members of CESSDA having undertaken some type of partnership with their own national official statistics institutions when the DwB project was launched.
Regarding transnational access or at European level with Eurostat and the European Central Bank, no formal partnership was set up at European level.
This so far limited cooperation resulted in a highly fragmented landscape; in order to simply identify location and access procedures to official statistical sources, available in Europe, researchers were bound to perform a treasure hunt of uncertain outcome.
c) Concerning access to highly detailed microdata from official statistical sources, significant barriers existed that could only be removed through close cooperation with data producers; disseminating data to researchers may not be their main mission, but they nevertheless play an essential part in the production and protection of highly sensitive individual data.
THREE COMPLEMENTARY BUILDING BLOCKS SERVING ONE ULTIMATE GOAL
Against this background, DwB was set up to ultimately link the capacity of the research community with the resources of the official microdata in Europe while enhancing researchers’ cross-border access to both anonymised and confidential microdata, supporting equal & easy access to official microdata, and contributing to consolidate the European data infrastructure needed for cutting-edge research and policies evaluations. Thus, DwB seeks to promote greater harmonisation of rules and practices, by building trust and fostering dialogue between the European Statistical System (ESS) led by Eurostat, CESSDA, the Research Community and other key stakeholders. European Data Access Forums, Regional workshops, Users’ Conferences gather these communities to discuss best practices & common standards for accreditation, security arrangements, anonymisation processes, metadata standards.
Three major areas of action were therefore designed for the project, constituting as many "building blocks" framing the different Work Packages (WP), each with both short-term goals for improvement and services and longer-term plans:
a) firstly the "front office" deals with issues related to tracking existing facilities, datasets, and access procedures in this highly fragmented field of official statistics within the ERA, and to the production of more relevant and user-friendly metadata for researchers, under harmonised documentation standards;
b) the core, data access stricto sensu, involving a number of issues: the legal framework constraints and their interpretation, which raises significant obstacles regarding transnational access to individual databases; an extreme heterogeneity of accreditation procedures; the feasibility of a European remote access network based on Research Data Centres (RDCs) allowing secure transnational access to such data; finally, the development of methods, guidelines and software in the field of Statistical Disclosure Control (SDC), in order firstly to ease the production of anonymized data under the "Public Use Files" format and secondly to harmonize the “checking out” procedures in the Research Data Centres (RDCs) that would possibly join a European network;
c) in parallel, the work required to engage and involve all stakeholders from the different communities in a long-term cooperation both at national and at European level. This cooperation will be essential to help foster research practices based on large individual official statistical databases all over the European space.
As explained above, the activities of the DwB project have been structured under three complementary building blocks, each handling a specific facet of the overarching goal of the project:
1) A "Front Office" handling issues relating to OS data discovery to avoid information silos and facilitate microdata usage
2) An "Access Block" striving to immediately support transnational access to confidential microdata and to set the core recommendations and proposals for shaping the future of transnational access to official microdata
3) A constant effort to build mutual trust, common understanding and knowledge transfer between the stakeholders communities to ultimately reach joint agreements
The main DwB results will therefore be presented following the same logical architecture.
I - FRONT OFFICE AND DISCOVERY ISSUES: IMMEDIATE TOOLS AND SERVICES, RECOMMENDATIONS AND LONG-TERM PROPOSALS
A) INTRODUCTORY ASSESSMENT: - Facing a Fragmented and Moving Landscape for Resource Discovery
At the time the DwB project was designed, the situation could be characterized as highly fragmented: information on available official microdata, access points and procedures was lacking, data documentation was mostly incomplete or inadequate to researchers’ needs, and standards differed.
Concerning data production and dissemination, the European level interacts with national levels that are coordinated and centralized to varying degrees, and in some cases with regional and federal levels. The situation is further complicated for the user by the existence of different files for the same dataset, depending on their degree of anonymization: some are available online, others may be accessed solely through secure points or “enclaves”, and differences between the variables available in each file may not be clearly documented.
On top of the internal silos created by official statistics, both national and European, come those created by the weak links between the official statistics institutions and the data archives, which since the 1970s have collected surveys from academic sources and disseminated them among researchers. In only a few countries had the data archives started to collaborate with official statistics institutions. The difficulty of grasping each country’s specific organization, combined with the language barrier resulting from the absence of English translations, makes it hard for non-resident researchers to identify available OS data.
Providing a precise picture of the field's organisation and boundaries was important for designing relevant solutions. A far more complex picture than initially envisaged emerged from the work undertaken by WP3 to give an overview of accreditation procedures; by WP5 to list and select the data to be documented in CIMES, the metadata base DwB built to provide a first general overview of European official microdata; by WP7, 8 and 12 dealing with metadata; and by WP6 to prepare an overview of the OS landscape as an introduction to the DwB training courses on European official statistics. Paradoxically, there is no single definition of “official statistics”, since boundaries and organization vary from one country to another, resulting in a wide and complex landscape. The number of information points concerning individual official data varies across countries, depending on how centralized each national statistical system is and on whether there is any coordination between the NSI and a data archive, where one exists. To one degree or another, all European member states experience this problem. The researcher therefore has to investigate what official data are available not only on the NSI website but also, in some cases (typically France and the Netherlands), on the website of the statistical department of each ministry. In a federal country such as Germany, information is not always standardized at the same level. In some cases (Spain, United Kingdom), regional autonomy produces varied arrangements among official statistics organisations, with some overlap that users may find difficult to figure out. In most cases, banking, financial, fiscal and administrative data, as well as medico-administrative data, are not centrally managed by the NSI.
Moreover, as evidenced by the Luxembourg Income Study (LIS), which gathers surveys on family budgets and wealth, similar surveys may come from official statistics in some countries while being produced by a university or a private body in others.
Data archives, where they existed, did not offer a solution for researchers looking for official microdata within the ERA. At the beginning of the DwB project, few data archives had set up any type of cooperation with official statistics organizations, concerning either metadata provision or dataset dissemination. Where some type of cooperation had started, it did not cover the whole field. Despite appearances, the situation was no more unified at the European level. On the producers' side, Eurostat is not the only operator: the European Central Bank, some Directorates-General (DGs) of the European Commission and EUROFOUND also commission official surveys. CESSDA, totally dependent upon its data archive members, reflected their situation, and most official statistics data were missing from its catalogue.
Two major conclusions were salient:
1) Access to the wide perimeter of administrative data was a core issue to be taken into account. Thanks to the development of secure access, demand for such data is increasing in the research community. In countries that rely mainly on registers for OS, this type of data has been in use for a number of years. It is now increasingly used in the other European countries, though with more fragmented operators of varying status, less coordination, less documentation, and the recent development of data linkages, which in turn may create new issues for metadata (see on this point WP8). Within this field, fiscal, banking, financial, administrative and health-related data, generally produced by specific operators, are increasingly important for researchers. Though these administrative data are by nature difficult to harmonise at European level, the academic community’s demand for them is increasing, driven by the demand for public policy evaluation and related cross-national multidisciplinary projects.
2) The increasing use of highly-detailed/confidential data involves two issues: a) the need to produce a documentation adequate to such detailed level and b) the confusion that may result from the simultaneous existence of several different files corresponding to different aggregation levels for a single survey (Scientific Use Files, Public Use Files, Secure Use Files); these are frequently stored in different locations, without any specific information concerning the links and differences between files.
It is quite clear that the 4-year duration of the DwB project was too short to solve all these problems. The scope of official statistics is extensive, and there is a large community of data producers and providers whose involvement must be secured in order to remove certain obstacles. In addition, throughout the project, in a constantly changing context, ongoing developments had to be taken into account. Let us list five of them.
The first remarkable trend was the implementation, in many countries, of access portals to Official data, in the context of “open data” policies that are gaining importance over the period, at both national and European level. This movement in favour of open data, targeted at a wider audience than the research community stricto sensu, namely various economic actors, impacted primarily official statistics bodies, compelled to make visible a larger share of their data in a more formal manner: in many countries, this resulted in the creation of portals with the objective of creating a single national point of access to official data. However, it is mostly aggregated data and in some limited cases, highly anonymized microdata, fit for communication to the general public, that can be found for the moment on these sites, which remain so far uncoordinated at European level, since Eurostat is solely in charge of the coordination of European surveys.
A second major trend pertains to the dialogue concerning documentation standards, a topic of major interest for users and data archives. One of the significant aims defined when the DwB project was designed, at the end of 2009, was to foster dialogue between DDI, the data documentation standard that emerged in the early 2000s among data archive experts to document data from academic sources in a manner adequate to researchers’ expectations, and SDMX, the standard dominant among NSIs, which is mostly designed for the exchange and harmonization of macroeconomic data produced by public sources. When the DwB programme was launched in 2011, a dialogue had already started between the DDI Alliance and official statistics; it then intensified, boosted by the United Nations’ concern to modernize official data production and to enforce a better integrated data documentation and dissemination policy throughout the production process. Discussions about the Generic Statistical Information Model (GSIM) and the Generic Statistical Business Process Model (GSBPM) continued throughout the project within the UNECE framework. The DDI / SDMX dialogue suggested a possible linkage between the two standards, one focused on microdata and the other on macrodata, and thus supported the growing interest of national statistical institutes in DDI for microdata, while GSBPM and GSIM pave the way for the longer-term implementation of a more integrated and systematic data documentation process. For DwB, the development of this dialogue and the discussion on modernizing the official data production system was obviously a positive factor; it also required some refocusing, since DwB was not the initiator of the process but only one of the dialogue partners, in charge, among other things, of expressing researchers’ needs. Though they provide a promising framework for the future, these developments must not be overestimated, at least in the short term.
For large organizations such as National Statistical Institutes and other operators in this domain, modernizing official statistics production is a lengthy process. Discussions initiated within WP12 with a few NSIs concerning a prototype search engine that would search official statistics websites showed that official statistics operators are far from being able to produce adequate metadata that could be directly harvested by the CESSDA portal.
Over these four years, a number of developments also occurred in metadata standards, with the continuous evolution of the DDI standard and the RDF standard, and advances in software, including Colectica.
On the Data Archive scene, CESSDA that was launched 30 years ago on an informal basis, as a European network of national data archives, is in the process of becoming a European Research Infrastructure, as part of the ESFRI scheme, and thus involving the relevant Ministries of the European members states (13 at this date). At the time of the DwB project closure, CESSDA had become incorporated as CESSDA-AS under a Norwegian legal status, to be soon (currently scheduled for 2016) changed to a European one as an ERIC (European Research Infrastructure Consortium) since the ERIC status has been adapted by the European Parliament to allow countries associated with the European Union (as this is the case for Norway) to become the host country for a European infrastructure.
This process has been of importance for DwB, since discussions, focused on cooperation between CESSDA and the ESS (Eurostat and the NSIs), aimed at building a formal agreement which required CESSDA to become an ERIC.
B) RESOURCE DISCOVERY & FRONT OFFICE: DwB Main Results & Outputs
In this context, DwB has produced tools and services that provide several immediate although partial improvements, likely to be further developed, as well as recommendations concerning standards and guidelines that may be re-appropriated by the various operators in order to facilitate data supply. Finally, DwB has designed a draft for a permanent organization that will service the researchers for transnational access to official microdata based on a long-term cooperation between the different communities, as initiated under the DwB’s umbrella.
It is mainly WPs 5, 7, 8 and 12 and their deliverables that have contributed to the results, in some cases incorporating contributions from other WPs. These can be gathered under 5 main headlines.
FRONT OFFICE RESULT 1 - Immediate Progress towards a Single Point of Information: CIMES
A first key objective was to immediately help researchers find information about the existing official microdata in the ERA. Currently, conducting comparative research requires an in-depth search through the websites of each national statistical agency to discover which data are available while understanding how each producer documents its own data. While most providers of official microdata host some information on data and access conditions online, detailed information is usually available only in the native language. For researchers, acquiring information on available data and accreditation procedures for countries other than their country of residence is even more complex and sometimes simply impossible. This is particularly true for confidential microdata, as demonstrated by the difficulties the WP9 & 10 applicants faced in finding accurate and precise information about data available for their research projects. Making these data easier to find, more comprehensible and more usable requires two improvements to the descriptive metadata. Firstly, a centralized resource needs to be made available; secondly, data documentation needs to be structured, standardized and presented in English.
To this end, WP5 has built the CIMES database (Centralising and Integrating Metadata from European Statistics) describing available official microdata in Europe. CIMES is a web-based application developed by CNRS and GENES (primarily funded by institutional resources) that allows multiple users to produce metadata simultaneously and to store it in a relational database. With DwB partners, it has gathered the pieces of information regarding official microdata scattered throughout Europe, either in NSIs or data archives, and has stored them following the standard of the Data Documentation Initiative (DDI) to ensure that the produced metadata are standardized and can thus also be reused by others (specifically the DwB WP12 portal). Within CIMES, data were documented in English using a hierarchically structured metadata schema composed of three levels. A “series” is a set of studies and represents a longitudinal and repeated cross-sectional data collection process (e.g. German Microcensus). It also illustrates the continuous data collection process carried out by the NSIs, where each wave of collection is a study. The “study” level then describes an individual instance of this study program, usually a year in which a study was carried out (e.g. German Microcensus 2007). The “dataset” level then describes the different versions of this study issued by the data producer. At this stage, the variable level is not documented, as the first purpose was to rapidly provide researchers with a comprehensive overview of the official data available for research purposes in Europe, the procedures and conditions for requesting access to each type of version (Public Use Files, Scientific Use Files and Secure Use Files), and the links to the providers where researchers can find the complete documentation. It is a first step towards building a single point of access to metadata in Europe.
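The three-level hierarchy described above (series, study, dataset) can be pictured with a small data model. The following Python sketch is purely illustrative: the class names, field names and the example dataset titles and URL are assumptions made for this example and do not reproduce the actual CIMES database schema or its DDI serialisation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Dataset:
    """One version of a study as issued by the data producer (hypothetical fields)."""
    title: str
    access_type: str  # e.g. "Public Use File", "Scientific Use File", "Secure Use File"
    provider_url: str = ""


@dataclass
class Study:
    """One instance of a series, usually one collection year."""
    title: str
    year: int
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Series:
    """A repeated data collection, e.g. a yearly survey run by an NSI."""
    title: str
    country: str
    studies: List[Study] = field(default_factory=list)

    def datasets_by_access(self, access_type: str) -> List[Tuple[str, str]]:
        """Return (study title, dataset title) pairs for one access type."""
        return [
            (study.title, dataset.title)
            for study in self.studies
            for dataset in study.datasets
            if dataset.access_type == access_type
        ]


# Example values are invented for illustration only.
microcensus = Series(
    title="German Microcensus",
    country="DE",
    studies=[
        Study(
            title="German Microcensus 2007",
            year=2007,
            datasets=[
                Dataset("Microcensus 2007 (SUF)", "Scientific Use File",
                        "https://example.org/provider"),
                Dataset("Microcensus 2007 (secure)", "Secure Use File"),
            ],
        )
    ],
)

suf = microcensus.datasets_by_access("Scientific Use File")
```

Keeping the access type at the dataset level mirrors the text: access conditions differ per version of a study, while coverage and intent are described once at the series level.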
The CIMES system has been made available for public use as of March 2015 and can be accessed at http://cimes.casd.eu. Furthermore, the metadata it contains were successfully harvested by the WP12 portal. At the time of the DwB project closure, the database covered data from 31 countries in Europe with 248 series, 1570 studies and 1821 datasets from NSIs as well as from other official sources (such as statistical departments in ministries), since in a number of countries many key official data sources are not produced by NSIs. This is a significant achievement considering the available resources; however, it is far from covering the overall large perimeter of official microdata across Europe. Some selection criteria were adopted in order to identify the most relevant studies: relevance to social sciences, cross-national comparability, and broad topical coverage.
While the CESSDA process to define how DwB outputs may be maintained and further developed will take time and will depend on the priorities of CESSDA’s work plan, after completion of the DwB task CNRS-RQ is continuing the work (other DwB partners will be invited to join on a volunteer basis) to refine the tool, adding functionalities to make CIMES more user-friendly when going through the list of countries and datasets, and to include more data and countries in the base, in particular for those countries where documentation is missing or inadequate. An additional field was also added for the European integrated microdata, with a first step that only lists the Eurostat microdata and provides links to another tool called MISSY (Microdata Information System), which covers documentation of a set of European integrated microdata down to the variable level (see below). In a second step, other integrated microdata such as those provided by Eurofound, the DG ECFIN Business and consumer trend surveys and the Eurobarometers may be referenced as well. CIMES also now integrates at the country level the fact sheets WP3 has produced for each country regarding the national legal framework and accreditation procedures for accessing NSIs’ microdata.
FRONT OFFICE RESULT 2 - Facilitated Usage of Integrated European Official Microdata
The second major product resulting in immediate progress for the researchers results from developing structured metadata and user-friendly routines for integrated European official microdata. This included two different data sources: census microdata which were incorporated into the IECM system and microdata from Eurostat which were entered into the MISSY system. While the objective of CIMES was to provide a broad database, here the aim was to provide in-depth information for a far smaller amount of studies. This data documentation should likewise aid researchers in data exploration and allow them to learn about the topical, geographical and temporal coverage of a data source in order to gauge whether it is a useful basis for a specific research project. It should also provide a tool for data analysis by providing detailed information on the data collection process and sampling procedures and most importantly detailed metadata on the variable level.
*More Censuses data available and harmonized in the IECM database*
The IECM project is currently disseminating for the research community census microdata from 19 European countries. There are over 90 million person records in the current dataset (around 30 million households) corresponding to 55 different census samples, conducted from the 1960 to the 2010 census rounds (from France 1962 to Ireland 2011). In June 2015, IECM started disseminating 115 million records from 58 census microdata samples. With DwB, CED (IPUMS partner for the European censuses) has worked to upgrade and expand its services on census microdata. 14 samples were integrated into the IECM database during the DwB project and 4 more samples were disseminated in June 2015. Data releases take place once a year in June to maximize efficiency over continuous releases throughout the year since every release requires a complete update of the entire database. Before dissemination, each sample went through a process of data formatting, harmonization and data documentation. In particular, this harmonization work is extremely important for later comparative analysis. Over the duration of the DwB project, around 1,500 original variables were integrated into 200 harmonized variables. The IECM website provides extensive documentation on all of these variables, including documentation of universe, codes and comparability descriptions, to name but a few. The IECM can be accessed at http://www.iecm-project.org.
*Eurostat microdata documented under MISSY (Microdata Information System)*
While Eurostat microdata are all available at Eurostat, researchers frequently complained about the complex and incomplete documentation and the formats of the data. A key development in this respect has been the update to the MISSY Editor at GESIS for documenting the European microdata from Eurostat. The MISSY editor was developed by GESIS as part of an independent project. While not technically a part of the requirements of the work package, GESIS invested considerable amounts of resources to adapt the MISSY system in order to allow the documentation of European microdata within the MISSY system by DwB partners. CNRS had coordinated a data access request to Eurostat so that all partners would have access to the microdata. The contract required in-depth discussions with Eurostat and the respective NSIs providing data for the integrated European microdata since access was needed not directly for research purposes but for methodological work on documentation. The metadata imported from the system files was then complemented manually by information drawn from documentation provided by Eurostat and the respective NSIs.
The metadata schema used for the documentation of Eurostat microdata uses the same general structure as that used for CIMES but is more detailed and slightly more complex. It is also structured hierarchically with a series, study and dataset level. The series level describes a data collection that usually spans time and multiple countries (e.g. EU-SILC). This level includes a number of general metadata items describing the coverage and intent of the series. The study level then goes on to describe a specific instance of a series (e.g. EU-SILC 2009). The metadata scheme used for CIMES was expanded to include a subsection on country-specific information that details specifics of data collection and sampling in each country. The dataset level is then linked to the study level and describes different instances of a study; in the case of the EU-SILC, for example, this would include the cross-sectional and the longitudinal dataset.
The MISSY system went online in January 2015 and can be fully accessed by the public at http://www.gesis.org/missy/eu/missy-home and currently includes the following integrated European microdata series: European Labour Force Survey, European Statistics on Income and Living Conditions, Structure of Earnings Survey, Community Innovation Statistics and Adult Education Survey (while the DoW only stated that EU-SILC and EU-LFS metadata should be prepared).
*Routines to facilitate data preparation and analysis*
Besides acquiring the respective microdata and documentation required for producing such a comprehensive database, another important step was to produce routines for use with common statistical packages (i.e. SPSS, SAS, Stata and R) that would aid Eurostat data users. Two different types of services were developed here: routines that assist users in data preparation, labelled “setup files”, and routines that operationalize common social scientific concepts, scales or indicators or that assist researchers in restructuring data files, labelled “microdata tools”. The latter tools will often involve more elaborate documentation or, in some cases, take the form of a technical report.
The highest priority was to produce setup files for all Eurostat microdata included in MISSY, as they provide an invaluable service for researchers: these datasets are quite complex and require a large initial time investment prior to data analysis. A brief description of how Eurostat data are currently distributed is necessary to illustrate the benefits of such a service. Microdata for these datasets are distributed by Eurostat in .csv format. These files consist of a number of records separated by lines. Each record consists of fields or variables. Once the data file is opened, researchers need to consult the codebook in order to understand the meaning of variables and their labels. The codebook is provided in PDF format and it is left to the researcher to create routines which will label variables and values. The setup files produced as part of WP5 tasks handle this for researchers, thus saving hours of effort, which can instead be spent on actual data analysis, and increasing the comparability of research outputs produced with these data. Additionally, missing values and labels are harmonized, thus easing comparability over time and between countries. In order to automate the process of generating these setup files for a wide range of statistical packages, all of which use different code, RODA developed a tool for the R programming language which can generate setup files from DDI-based metadata. This package was uploaded to the CRAN repository and is thus freely available to the research community.
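To make the idea of a setup file concrete, the sketch below generates Stata labelling commands from a small metadata dictionary. It is a minimal illustration in Python and not the R tool developed by RODA: the function name `stata_setup`, the structure of the `codebook` dictionary and the variable names in it are assumptions made for this example, and a real setup file would be driven by DDI metadata rather than a hand-written dictionary.

```python
def stata_setup(variables):
    """Build Stata commands that label variables and their coded values.

    `variables` maps a variable name to a dict with a "label" string and,
    optionally, a "values" dict mapping numeric codes to value labels.
    This input structure is hypothetical, chosen for the example.
    """
    lines = []
    for name, meta in variables.items():
        # Attach a descriptive label to the variable itself.
        lines.append(f'label variable {name} "{meta["label"]}"')
        values = meta.get("values")
        if values:
            # Define a value-label set, then attach it to the variable.
            pairs = " ".join(
                f'{code} "{text}"' for code, text in sorted(values.items())
            )
            lines.append(f"label define {name}_lbl {pairs}")
            lines.append(f"label values {name} {name}_lbl")
    return "\n".join(lines)


# Invented example metadata; real codebooks are far larger.
codebook = {
    "sex": {"label": "Sex of respondent", "values": {1: "Male", 2: "Female"}},
    "age": {"label": "Age in years"},
}

setup_text = stata_setup(codebook)
print(setup_text)
```

The same dictionary could feed generators for SPSS, SAS or R syntax, which is the reason for keeping the metadata separate from any one package's command language.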
Additionally a number of tools and routines were developed to assist researchers working with Eurostat microdata. This includes for example a report on Income harmonization in the EU-SILC, a tool to calculate innovation concepts for the CIS or a tool which will calculate poverty thresholds for the longitudinal version of the EU-SILC. The setups and microdata tools provided within this task complement the metadata generated to make official statistics microdata far more accessible to researchers. Together these services become a valuable tool to explore the contents of the data. This guarantees that users obtain a better sense of the contents and quality of data and have an easier time in doing their actual data analysis.
FRONT OFFICE RESULT 3 - General Recommendations for Metadata Standards and Software
While immediate services and tools were produced and made available for the researchers, DwB also worked on a middle-term and long term perspective with general recommendations for metadata standards and software that would meet most needs and facilitate the whole process to produce improved metadata also facing challenges emerging with new data. A whole WP was dedicated to this work with an initial central purpose to create a ‘shared workbench’ that enlarges cooperation between National Statistics Institutions (NSIs) and Data Archives, in line with the current developing international discussions; it contributes to define a set of standards meeting most needs. The three objectives of this work package are to (1) create an interaction between data archives and NSIs relating their use of metadata standards, (2) create an interaction with standards groups for administrative and preservation metadata, and (3) to identify similar cross-disciplinary standards activities and collaboration with these as appropriate.
With this objective, WP7 produced a state of the art in metadata usage, based on a broad overview of practices in the archives and the NSIs, complemented by a monographic presentation of national situations based on a short questionnaire addressed to CESSDA’s data archives and consisting of 3 sets of questions concerning their cooperation with their respective NSI, their use of DDI and their use of controlled vocabularies. The report presenting the current state of the art in metadata usage in NSIs and data archives introduces basic concepts and important actors in this area of work, and provides an overview of the metadata standards and frameworks, in particular SDMX and DDI, that are in use in these communities. It describes efforts of the metadata community to enhance interoperable usage of both leading standards at NSIs and DAs. Frameworks and standards for statistical modernisation (e.g. GSIM), as well as overlaps and gaps between DDI and SDMX, were reviewed and discussed. The final chapters highlight the importance of controlled vocabularies and coding schemes in metadata harmonization and the relevance of frameworks and reference models as modelling tools. The current use of particular standards, classifications and coding schemes describes the state of the art. Based on the discussion of current needs in these fields, the concluding remarks summarise major findings on future-proof standards that can support documentation, dissemination and search for official microdata for use in research. They underline the fact that SDMX and DDI are complementary rather than competing standards and highlight the importance of GSIM and GSBPM for future relations between NSIs and DAs. The overall report shows the rapid changes in the landscape since the DoW of the DwB project was written. The SDMX/DDI Dialogue did not exist then and has developed rapidly.
Therefore DwB’s objective was no longer to deliver a platform for continuous exchanges as described in the DoW, but to actively contribute to these discussions, identifying and creating the best opportunities for networking and collaboration (notably by having a representative as a member of the SDMX/DDI Dialogue Task Team 3: Access to Microdata) while closely following the progress in this fast-moving field and discussing future directions of metadata collaboration between NSIs and archives. With this perspective, D7.2 and D7.3 (submitted as an integrated deliverable to improve scientific clarity) (1) discuss DDI, SDMX and conceptual models (GSBPM, GSIM) according to different purposes and introduce basic criteria to assess and select metadata standards; (2) describe Big Data, administrative data, versioning and the metadata situation in data processing as areas not sufficiently covered by present standards; (3) investigate future developments regarding DDI, SDMX, Linked Data and RDF, and focus on trends in applying controlled vocabularies and coding schemes; (4) highlight the need for best practices in key areas utilizing metadata, controlled vocabularies and classifications, which is demonstrated with a best-practice use case; (5) discuss standards with future relevance from a temporal perspective, to advance the current situation towards future improvements on topics ranging from Big Data issues to the improved and standardized use of metadata and vocabularies. D7.4 describes specific issues involved in software development for widely used metadata standards.
The content comprises (1) an introduction to the problem of the metadata gap occurring in software-driven processing of data, where metadata are usually not automatically recorded or updated through the data lifecycle; (2) a discussion of this issue in relation to the upcoming challenges of new data sources and concerns about metadata management in new big data resources; (3) a discussion of initiatives that aim to provide technical solutions to bridge the metadata gap with complete and accessible audit trails, and in particular the introduction of a ‘datum’ object; (4) conclusions on the technical feasibility of data production and storage with fully integrated and fine-grained metadata solutions surrounding them. In fact GSBPM, GSIM, DDI, and CSPA are the four cornerstones of technical metadata standardisation that address the problems of the metadata gap. Finally, D7.6 discusses (1) the extension of existing social science metadata, options for the interdisciplinary use of research results and metadata, and standards practices in related disciplines and (2) the role of contextual metadata in scientific research and regulative standards for linking data and publications. Accordingly, the first main section of the report (1) presents new data sources linking survey data with administrative data, health data or spatial data, the related research interests and the data infrastructures that provide access to these resources; (2) identifies practices in utilizing metadata and classifications for these data sources; (3) discusses needs arising from the investigated data sources. The second main section concerns (1) investigations of the ambiguous term “contextual metadata”; (2) initiatives, practices and regulations linking data and publications; (3) discussion and conclusions considering upcoming needs for contextual metadata and advanced forms of integrated data/metadata publications.
FRONT OFFICE RESULT 4 – Users’ Stories and Making OS Metadata More Discoverable on CESSDA Portal
While CESSDA was to be created as a European infrastructure for data for the social sciences, one important problem was the current under-representation of OS data in the CESSDA portal, where only 3 out of 15 current providers systematically disseminate data collected by the national statistical authority. Not only is information on what is available highly fragmented, but silos are also created between OS data and other types of data. Initially, the objective was tied exclusively to official microdata, with the idea that they would remain mainly in the hands of the producers. Further discussions showed that the current difficulties for the CESSDA portal were broader: currently only a small part of the CESSDA members’ catalogues is visible on the CESSDA portal, and a new portal is needed that will be more flexible to different situations, including OS. Regarding these OS metadata in particular, the technical solutions should leave open the question of the national strategy for developing cooperation between the data archives and the NSIs, and should cover the two situations (i.e. NSIs having a data documentation and dissemination role, or relying on systematic collaboration with DAs for dissemination-related purposes).
With this objective in mind, WP8 had to explore possibilities and problems associated with harvesting OS metadata on the CESSDA portal, create a metadata model incorporating SDMX and DDI as well as any system-specific enrichment required to deliver extended portal functionality, inform documented workflows and data flows for the harvesting and management of OS metadata and define functional requirements for effective resource discovery.
A first workshop concluded that to make microdata discoverable, it would be necessary to include all metadata provided by NSIs, intermediary archives and potentially other sources. It was clear that, since SDMX is used to describe aggregate data and DDI is generally used for microdata, it was inadequate to provide models using a single standard. Therefore the object model for the portal includes both standards so that all the potentially available metadata can be included in the portal. The model is a logical view of the first-class objects needed to support search and discovery across metadata on official microdata from both NSIs & archives. Both aggregate and microdata sets are included in the model. The model was then refined and enriched, notably including the metadata models used for CIMES and MISSY. Further enrichments to the model include the controlled vocabularies that were agreed upon at the second workshop and a solution for uniquely identifying data sets and the related metadata. At this stage it was still a high-level model not fit for implementation. The implementation model required reviewing the user needs that will be included in the portal functional requirements.
In order to identify possible workflows and dataflows for harvesting metadata on official microdata from NSIs, Data Archives (DAs) and other sources, including DwB itself (i.e. CIMES and MISSY), it was clear that an understanding of how data are produced at NSIs was required. Surveys of NSI metadata creation and DA metadata dissemination were conducted, with additional case studies of the statistical data production workflows at 3 NSIs: Destatis, CNPS-INS and CBS. The Generic Statistical Information Model (GSIM) for NSIs and the Generic Statistical Business Process Model (GSBPM) were also considered. An architecture and infrastructure for acquiring metadata from NSIs, DAs and DwB is proposed. How and what metadata NSIs should publish, the protocols that could be used by NSIs and DAs, the management of multiple sources of metadata, and the identification of resources within the resource discovery portal are covered in detail.
The final step towards implementation, laying the foundations of a prototype, was to propose portal resource discovery functionality for a search/browse portal interface. This was based on a set of user stories collected by WP8 via telephone interviews, which describe the functional requirements for the user interface of the portal to support resource discovery of Official Statistics. These user stories combine more general descriptions of user scenarios (regarding their search strategy, the need for a quick overview, good documentation, comparability, information about procedures, user-generated content) with input from interviews conducted with researchers in the Netherlands, Spain, Romania and Germany, and user input gathered during an international focus group meeting. An additional chapter is dedicated to the subject of Virtual Research Environments (VRE). Although this topic is beyond resource discovery, the results of the interviews gave rise to the need to pay further attention to this additional functionality, which is also a component of the architecture proposed by WP4 for a European Remote Access Network (EuRAN). However, the report also underlines that PDF (and Microsoft Word) formatted documents are used by many NSIs to distribute information about studies and microdata sets, which is clearly an issue for search and discovery of the official microdata. Many NSIs disseminate information on their websites with the metadata formatted in HTML; however, it is not clear whether it is also possible to access the metadata using an automated machine-to-machine process.
As previously mentioned, the objective was both to prepare a more efficient CESSDA portal and to design ways to adapt to the current imbalance regarding OS metadata, thus avoiding silos between data from different domains. Strategies to remedy the situation are twofold: NSIs could either (1) adopt technology for data documentation and data dissemination from Data Archives and build sites that could be included in portals like the CESSDA portal; or (2) collaborate systematically with national data archives and use them for dissemination-related purposes. The DwB work on fostering cooperation between the NSIs and DAs at national level has been more aligned with this second strategy. Nevertheless, WP12 has developed technical solutions that could cover different situations and leave open the question of the national strategy. Besides, there is a need to rapidly bring more OS metadata into the CESSDA portal, which raises both short-term and long-term problems.
For the longer term, the DwB prototype for a CESSDA Resource Discovery Portal (RDP) aims to align with the solutions brought forward by the HLG initiative for the modernization of statistical production, and to follow up operational metadata models and other relevant products originating from that initiative. In the shorter term, for filling the system with relevant data even where there are doubts about providers' willingness to invest work in making data accessible for harvesting, databases like CIMES and MISSY, both developed by WP5, provide solutions that WP12 took into account. In fact, the WP12 work should be seen as building tools for a variety of problems in a complex and moving landscape with a substantial variety of data and technologies.
The team focused on 1) streamlining the harvesting framework with related tools to facilitate retrieval and ingestion of metadata in various formats from a variety of participating resource providers; 2) building a prototype Resource Discovery Services Interface to test both functional and non-functional problems; and 3) developing a roadmap for the future development of a complete integrated system beyond the proof-of-concept and prototypes developed so far within the DwB project.
D12.1 discusses the characteristics of a database system developed to store harvested metadata in different versions. Material collected through harvesting could be stored in relational databases, or as XML or RDF documents. For indexing and search problems the RDF version is regarded as the most powerful; technology for developing it has been implemented and procedures for automating processes have been installed. D12.2 mainly focuses on data quality enhancement and on tools and procedures for that purpose. D12.3 documents the work and the ideas behind the harvesting technology that has been developed and test-implemented. It further documents the transformation of the harvested material into raw RDF for enhanced usefulness, and the orchestration and automation of technical processes. D12.4 discusses resource discovery problems across a variety of distinguishing topics. The purpose of the portal is to provide mechanisms for both researchers and applications to search and retrieve metadata for the catalogued datasets. Applications interact with the system through a web services REST API. The end users access the portal through browser-based interfaces, which are themselves applications leveraging the underlying services. The deliverable discusses the problems of standardizing the metadata, as well as search, retrieval and presentation problems. The integration of the IECM catalogue on the censuses into the portal was also studied.
In the current state of play, the harvesting tool proposed as a prototype for a future new CESSDA portal (which is now being developed within the CESSDA workplan) will, by and large, rely on CIMES and MISSY and includes a function which enables the export of metadata in DDI3.2 format that can be read by the prototype. To test both functional and non-functional problems, a full-scale prototype portal interface has been developed. It integrates material read from CIMES to demonstrate a light version of a harmonizer. The prototype portal interface is available at http://dwb-dev.nsd.uib.no/
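As a rough illustration of what transforming harvested material into raw RDF can look like, the Python sketch below serialises one flat metadata record as N-Triples using Dublin Core term URIs. It is a simplified sketch under stated assumptions, not the WP12 implementation: the function names, the record layout, the example subject URI and the choice of Dublin Core as vocabulary are all assumptions made for this example.

```python
# Namespace of the Dublin Core Metadata Terms vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject_uri, record):
    """Turn a flat {term: value} record into one N-Triples line per field.

    Keys of `record` are assumed to be Dublin Core term names
    (for example "title" or "identifier"); values become plain literals.
    """
    lines = []
    for term, value in record.items():
        literal = escape_literal(str(value))
        lines.append(f'<{subject_uri}> <{DCTERMS}{term}> "{literal}" .')
    return lines


# Hypothetical harvested record and subject URI.
record = {
    "title": "German Microcensus 2007",
    "identifier": "mc-2007",
    "description": 'Annual "1% sample" household survey',
}
triples = to_ntriples("http://example.org/study/microcensus-2007", record)
for line in triples:
    print(line)
```

A production pipeline would more likely use an RDF library and a richer vocabulary than plain string formatting; the point here is only that each harvested field becomes one subject-predicate-object statement that can then be indexed and searched.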
FRONT OFFICE RESULT 5 - Proposal for a European Service Centre for Official Statistics (ESCOS)
Actually though dissemination is included in the work the National Statistical Institutes, Eurostat and other producers of official microdata have to ensure their main objective is to produce statistics. Furthermore, regarding specific services for the researchers that range from detailed and user friendly documentation translated in English to support for contextual metadata or routines for harmonization needed for comparative research, they frequently lack the resources and time. While an enhanced portal such as the CESSDA portal that could cover data from all domains, thus avoiding silos, is helpful for the researchers, there is also a need for specific services in the domain of official microdata. Besides, as mentioned before, in the current state of play there are limited possibilities that the OS producers will establish repositories that could supply content for a CESSDA portal. Such specific services for the researchers in the domain of official microdata would therefore also be a source for a CESSDA portal, as demonstrated currently by CIMES, MISSY and IECM.
Since the very start of the project, WP5 has been developing a concept for such services and has discussed which kinds of services it could provide on- and offline and how such a service center could be integrated into the CESSDA-ERIC. It has produced a report on a service center for official statistics, which was later revised. These reports were the product of extensive discussions and feedback from partners within the DwB project as well as the CESSDA and ESS communities, in line with the recommendations made by the EC reviewer. The revised report includes recommendations for additional services developed under DwB regarding access issues and cooperation.
The report developed a concept for a European Service Centre for Official Statistics which should function as a research infrastructure for European official statistics microdata. Such a center should ideally be established on the basis of the existing CESSDA network of European data archives and cooperate closely with the European Statistical System (ESS), which covers all the national statistical institutes coordinated by Eurostat. The underlying idea is that services provided by the ESCOS should benefit the research community by providing it with services tailored to its needs, as well as the ESS by relieving it of tasks which are not part of its core responsibilities. The primary objective of the ESCOS should be to promote the scientific use of European official statistics microdata by providing services for researchers, such as comprehensive metadata and data access infrastructures, and by working towards harmonization of data access for scientific purposes throughout Europe.
The report outlined the goals and objectives to be reached by a European Service Center for Official Statistics as well as the underlying motivation and scope of the services to be offered. It also detailed the specific tasks that should be carried out by a European Service Center for Official Statistics. First and foremost among these is the establishment of an online platform that provides researchers with structured metadata, accompanying documents and routines, as well as a platform for feedback, discussion and user input. CIMES, MISSY and IECM are seen as first steps towards such a service. Further envisioned services include the provision of training courses on the use of official microdata, the organization of scientific conferences for users of official microdata (such as those held by DwB) and the coordination of researcher accreditation and cross-border access, in line with the proposals discussed in the other DwB WPs regarding access issues. Furthermore, such a service centre could also serve as a mediator between the European research community and the ESS and could lobby for harmonization of data access conditions throughout Europe, thus maintaining and developing the umbrella for cooperation DwB has built. The report makes suggestions as to how such a service centre could be implemented, ideally as a subunit of the CESSDA-ERIC, and outlines stages for a stepwise implementation.
II - ACCESS BLOCK: IMMEDIATE TRANSNATIONAL ACCESS, TOOLS, SERVICES, ROADMAPS & LONG-TERM PROPOSALS
A) INTRODUCTORY STATEMENT: "There is potential for transnational access to official microdata, but very complex obstacles to overcome"
Access issues were at the core of the DwB project and were organized within a second building block. Here again, it is important to recall the context in which DwB was launched. The overall opportunities for researchers using individual official statistical datasets were increasing. In a few countries, Public Use Files were made accessible to the general public and could be downloaded from the NSI website, as is the case in France. Scientific Use Files, or de-facto anonymized files, presenting only a low risk for privacy, were made available in most EU countries. With the exception of a few countries where access could be provided by Data Archives within the framework of an agreement with their respective NSI, these files were distributed directly by the OS producers.
Above all, the last ten years have seen an increasing trend of modifications of the legal provisions that frame both the protection of personal data in general and the confidentiality of the individual information collected under the statistical law and public records, with derogatory status granted to research: under certain conditions (provided they have received previous authorization), researchers may securely access some sensitive data likely to allow individual respondents’ identification. Secure Research Data Centres have been set up, first on a local basis, compelling researchers to work on site, then at a distance, involving either remote execution (job submission) or remote access (in this case the researcher works from a secure remote desktop and may actually see the raw (de-identified) data while not being able to download them, and has the outputs checked for confidentiality before publication).
At the European level, in particular concerning Eurostat data, progress was less spectacular; however, under the 2009 European regulation, provision was made for the production of more anonymized files for the researchers. Data access had also become free of charge, which is greatly appreciated.
Overall, when DwB started, there was a general trend of progress towards more access to official microdata for the researchers, including highly-detailed ones through secure systems. Such progress in favour of more accessible datasets was generally triggered by the research community or by public authorities commissioning research projects based on official data. In several cases, for instance in France and the UK, Data Archives played a significant role to support this move.
However, a number of complex barriers remain, in particular concerning transnational access to confidential data. While transnational access was slowly improving, with more countries providing access to Scientific Use Files to non-resident researchers, access to the highly-detailed files (Secure Use Files) remained much more difficult for non-residents, generally requiring the researchers to travel on site.
The situation here varies widely from one country to the other, as evidenced by the state of the art drawn up by WP10 during the preparatory phase of CESSDA’s PPP, as well as the state of the art performed by DwB’s WP3. The diversity of procedures, schemes and practices is striking; under close scrutiny, each national situation turns out to be less favourable than expected: a limited number of available files in some countries, extremely lengthy accreditation procedures in others, heterogeneous rules on the level of anonymisation, and a high diversity of secure centre schemes. Some highly-sensitive data, typically fiscal, financial and medico-administrative data, remain mostly inaccessible or subject to very restrictive access at national level, and are therefore even more difficult to access transnationally. At European level, the accreditation procedure for the Eurostat SUF was criticized as too lengthy and poorly adapted to researchers’ needs. Files were also viewed as too heavily anonymized, and secure on-site access remained insignificant. Eurostat data remained largely under-utilized compared to datasets from academic sources, such as the European Social Survey.
Paradoxically, one consequence of the development of more open data policies is a growing complexity of the access process. Depending on the type of data they need, researchers face a diversity of access procedures, practices and schemes that pile up if they have to apply to different producers/providers, resulting in an extremely lengthy process.
The complexity obviously impacted transnational access to Secure Use Files for which more substantial obstacles remained when DwB started. In very few cases (e.g. France, Netherlands) remote transnational access may be granted for Secure Use Files, but in most cases, the non-resident researcher has to travel and work on site, or transnational access is simply impossible.
The situation is even more complex when a research team formed of researchers based in different countries would like to work together simultaneously on files stored in different countries and to merge data in order to run a single analysis. The transfer of secure files is generally deemed impossible for legal reasons; besides, secure facilities are poorly adapted to collective work.
Overall, when the DwB project started, the global situation regarding barriers to accessing data could be summarized as follows:
- A situation that remains uneven from one country to the other, namely concerning access to confidential data.
- Fragmented access: different points of access, different facilities and software, different types of access and procedures depending on the data required
- Substantial barriers, mainly understood as legal barriers, remaining for transnational access, particularly for remote access across borders and combining data
- Multiple accreditations required and no harmonisation
- Lack of harmonisation in terminology and practices regarding anonymisation
In order to assess DwB's contribution, let us first give a picture of the general landscape four years later, at the closure of the project, in order to point out the developments in the context that may have occurred, some positive, some negative. DwB was indeed actively involved in some of these developments, which we briefly mention here and detail further below.
We should first note a persistent development of legal frameworks favourable to official statistical data access, in particular concerning data likely to make indirect identification of statistical units possible, and more sensitive data such as fiscal, financial and health-related administrative data. We also note a growing openness of the legal framework to possibilities of data linkage, mostly concerning administrative data. On the European scene, the new regulation for researchers’ access to the Eurostat microdata, which was considered imminent when the project started, was finally implemented four years later. DwB took an active part in the discussions regarding this new legal arrangement, which is likely to improve the production of Public Use Files for the Eurostat microdata, also usable as Campus files for teaching purposes, and, in the long run, access to Secure Use Files. Campus files that were produced for the DwB training sessions by Eurostat with the agreement of about 14 NSIs certainly helped in the discussions. However, some issues remain unresolved so far: the accreditation process remains quite slow, with a 2-step procedure, including when Scientific Use Files are involved, as noted by the panel session with the researchers at the 2nd European Data Access Forum organised by DwB in March 2015. Current SUF for these Eurostat microdata remain too aggregated for many research projects, and at the closure of DwB there still seems to be a long way to go before the implementation of a European secure network based on national nodes that would allow distributed remote access to Secure Use Files, now permitted in the new European Regulation (which is greatly appreciated).
The "Open data" movement provided a quite positive context for DwB during the four years of its existence. Promoted by official bodies at both European and national level, it puts strong pressure on official data producers for more access. However, detailed individual data are not its primary concern; its main focus is on economic actors and citizens, not on the research community as such. Moreover, the open data movement, which deals with wider access to highly aggregated, anonymised data, may be misleading in some cases if it wrongly appears to solve the whole range of problems experienced by the research community in accessing highly detailed/sensitive microdata. In the future, the open data context may also call into question the frontier between a "researcher" and a regular "user".
Concerning access to secure transnational data, major progress was made with the appearance of the notion of "circle of trust" to serve as a basis for agreement around security and responsibility between actors, producers and providers belonging to different nations. This notion appears central to implementing a secure European network, one of DwB’s major focuses. It was simultaneously put into discussion within the OECD and within DwB by NSI partners who were taking part in both the DwB and OECD contexts. This notion of "circle of trust" can also be traced in discussions within the ESS Working Group on Statistical Confidentiality on the European regulation concerning access to secure data files for the European surveys coordinated by Eurostat.
Conversely during the same period, there was a growing and legitimate concern among citizens concerning privacy issues and the wider and more frequent use of all sorts of personal data of a confidential nature. At the European level, the current draft of a European Regulation for privacy protection designed to replace the 1995 EU Directive that had no enforcement power at the national level, may involve some less favourable provision for research that might impact current national legal provisions. It has been widely discussed within the various scientific communities (including CESSDA-AS).
In total, DwB has greatly benefited from the positive trends that were already visible beforehand and that kept strengthening over the 4 years of its existence. Official statistics operators have adopted a new attitude towards research: new solutions have been designed, and there is a growing awareness of the necessity to remove existing barriers at a time when opposing forces may be stronger. Altogether, secure data schemes have been developed during the period and some first elements of national (UK for administrative data) and transnational (Nordic countries) network projects have emerged, along the lines of the project envisioned by DwB, thus strengthening the proposal of a secure European network designed by DwB.
B) ACCESS BLOCK: DwB Main Results & Outputs
The “Access Block” crucially required finding common ground when it comes to removing obstacles to transnational access to confidential microdata. This entailed reaching agreements between the different stakeholders, balancing the researchers' needs with the necessary requirements for privacy protection ensured by the data holders and the data providers. The issues at stake included the legal frameworks and their interpretation, accreditation procedures and practices, and information security (WP3); access issues for building a transnational secure remote access network (WP4); and methodological issues regarding Statistical Disclosure Control (WP11). Concretely, immediate access to confidential microdata also had to be provided by 7 Research Data Centres from 4 countries to researchers selected within the Transnational Access activities (WP9 & 10). Lessons learnt, not only from the obstacles encountered but also from the cooperation links and trust built between the involved Research Data Centres (RDCs) to overcome these challenges, were also expected to be important contributions to this block and to the project as a whole; conversely, it was also envisioned to integrate, when possible, progress achieved through the project within these WPs in order to improve the facilities offered to the researchers.
As mentioned in the previous building block, DwB has produced tools and services that involve some immediate although partial improvements, likely to be further developed, as well as recommendations concerning standards and guidelines that may be re-appropriated by the various operators in order to facilitate data supply. Finally, DwB has designed a draft for a permanent organization that requires investments beyond the project limits, as part of a long-term cooperation between different communities, gathered under DwB’s umbrella. It is mainly WPs 3, 4, 9, 10 and 11 that have contributed to the results, with a lighter contribution from other WPs. Work along these axes also provided material to the proposal for a European Service Centre for Official Statistics.
ACCESS BLOCK RESULT 1 - Immediate Support to Transnational Access to Official Confidential Microdata While Building Cooperation between RDCs
While the other WPs were working towards creating the conditions for improved transnational access to confidential microdata by researchers within the European Research Area, which currently faces a number of obstacles, DwB also aimed at providing immediate support to the researchers, thus increasing the possibilities for cross-national research. Throughout the 4 years of the project, within the so-called Transnational Activities WPs (WP9 and 10), 7 Research Data Centre partners of the project from 4 countries - Germany (GESIS, IAB and DESTATIS), France (CASD), the UK (SDS) and the Netherlands (CBS) - that could allow transnational access to their confidential microdata provided support to the researchers for this access. Several of them were NSIs or NSAs, and 2 others were Data Archives (via agreement with their NSI), mostly providing access to official microdata from their respective country.
TNA partners not only worked toward increasing the research community's awareness, but also started alleviating the obstacles to transnational access and preparing better conditions for the future. An original feature of the project was therefore the decision to coordinate the RDCs' support with a view to facilitating comparative research projects requiring access to microdata held in different RDCs/countries (even if it is not possible within the current context to combine them and run a single analysis). Instead of having each RDC work alone, DwB set up 2 WPs gathering the different RDC partners, which are located in different countries and require different procedures for access. One WP (WP9) gathered those RDCs that require the researchers to travel on site, while the other (WP10) gathered those allowing remote access from a different country. Under WP9, the researchers had to travel to the country where the data were hosted in order to get onsite access, due to legal constraints, while WP10 covered remote access to RDCs allowed to provide transnational access in this mode.
All RDCs gave specific support to researchers consisting of (but not limited to): assistance with national accreditation; translation (into English only; e.g. application forms, files, documentation); data information, documentation and identification; training in the use of the RDC's facilities for accessing microdata; technical support (provision of software tools, etc.); support in using the data (no advice on software usage or on the research itself was provided); and output checking. Financial support was provided for travel and daily expenses in the case of onsite access (WP9), including in some cases the travel necessary for accreditation and enrolment before remote access could be provided under WP10, as well as for fees, if any (whether for WP9 or 10). In total, 8 calls were opened during the project, with 40 high-quality projects selected from all over Europe by a Users Selection Panel (USP), following an administrative check by the RDCs to ensure compliance of the projects with the data available. Some of the researchers have already published their research in academic journals.
In order to provide actual transnational access to selected users, all RDCs worked together to facilitate joint access to their installations, while each of them had to operate within its own legal framework and current practices regarding transnational access. They had to understand each other's practices and national constraints (technical, legal, organizational) in order to be able to design a joint harmonized selection procedure aligned with national requirements, to agree on a common set of eligibility and selection criteria, to draft a joint application form that would allow checking users' projects seamlessly across countries, etc. From the very beginning, this coordination activity turned out to be very challenging due to the situation regarding transnational access to official microdata, e.g. unharmonised procedures for access and lack of translation of the application forms and of the metadata. This coordination notably made it possible to start building concrete relations and mutual trust between the involved RDCs, which would be the basis for a future European remote access network, a main long-term objective of DwB. In itself, this is one of the most important - although intangible - results of DwB.
With 23 users' projects that actually started their research, and over 200 onsite access days and 1000 remote access days provided to selected researchers from 11 European countries, this activity has clearly demonstrated the actual need of the research communities for transnational access to official microdata. However, the number of projects was lower than expected. The reasons are manifold, and the DwB TNA experience allowed some useful lessons to be drawn for the future of transnational access to OS data. These are set out in detail in a specific report (together with recommendations), but notable reasons are:
(1) necessary infrastructures or functionalities, some of which were proposed as crucial components for a future EuRAN (WP4), are not available yet and require important further investments;
(2) applicants' lack of knowledge of the actual content and types of datasets available or of national access/accreditation conditions. In the meantime, TNA partners therefore intensified their efforts to increase the research community's awareness;
(3) a lot of universities have their own budget for travelling, because working with the world-best data for the specific question to be answered is quite common, and most researchers are looking for a research budget, whereas TNA activities as set out under the FP7 and carried out by DwB could only offer reimbursement for travel, stays and fees when applicable;
(4) in many cases data are accessible free of charge, therefore not necessitating financial support;
(5) the nature of cross-national comparisons has changed and mostly does not require travelling, but rather an infrastructure allowing researchers from different countries to work easily together and combine microdata.
Only the setting up of the European Remote Access Network that DwB has proposed, which requires further investment, will make transnational access fit the researchers' needs for collaborative work on data sources hosted in different RDCs, and thus make it more appealing.
ACCESS BLOCK RESULT 2 - Background Information on Access to Confidential Microdata from Users and Providers (some of it resulting in searchable tools made publicly available to facilitate users' search and best practices)
TNA WPs were a major source for understanding the obstacles and demonstrating concretely how progress towards more efficiency and harmonisation can be achieved. Some major findings can be highlighted: a common selection procedure works; dissemination channels are efficient and allow the most relevant partners to be reached (most of the important web pages in Europe had the call displayed in their news section); we now have a clearer view of the transnational users' needs and working methods; a strong collaboration network has been set up between the partner RDCs. This also resulted in achieving two core objectives: (1) carrying out high-quality research at a transnational level and (2) having a good sample of users for collecting feedback regarding further developments of the installations when it comes to working with microdata across borders. All in all, despite these difficulties, the TNA WPs have demonstrated that transnational access to official microdata for research purposes (notably for comparative studies) is actually possible with close coordination and political will/involvement. Besides, they have been a major resource for the whole access block and more generally for the whole project: the involved RDCs have co-ordinated and harmonised procedures and working documents (incl. application form, application and selection process, supporting researchers' accreditation in different countries) while providing useful case studies for other WPs (notably about the accreditation procedures).
The background information provided by the TNA WPs complemented a number of surveys and workshops organized by WP3 and WP4 to get a precise overview of the state of play in access to confidential microdata in the European Research Area at both national and European level, as well as the points of view of the RDCs and users regarding further improvements, notably for transnational access. Particularly notable were: 1) an online Survey Monkey survey of the NSIs' legal departments regarding their interpretation and practice for transnational access to confidential microdata; 2) a survey on current researcher accreditation arrangements, starting with information from NSIs' websites and complemented by information obtained directly, notably from Eastern NSIs in a dedicated workshop (Bucharest, 2012); 3) a consultation (telephone interviews and a specific workshop in Lausanne, 2014) of the NSIs or authorities in charge of accreditation, exploring the opportunity to speed up, simplify and harmonize the accreditation procedures; 4) an online survey conducted in 5 partner RDCs with a sample of their users, later complemented with 2 workshops with selected researchers in the CBS (Netherlands) and CASD (France) RDCs and several additional in-depth interviews; 5) a survey of the state of the art in RDCs offering remote access in Europe.
Results from some of these surveys, which were crucial for the whole DwB work, were also used to develop searchable tools made publicly available to facilitate users' search and best practices. On that basis, an interactive visualisation tool was developed, which may serve as a guide for all parties interested in the potential of the legal frameworks for transnational access to data across the ERA. The information collected on current researchers' accreditation arrangements in the European research area was structured into webpages made available online, also further integrated in CIMES for each country. The database contains country-based factsheets providing basic information (conditions, modes of access, timing, costs etc.) and links to all the relevant websites, application forms, pro-forma contracts and other online documentation provided by national institutions. They facilitate data users' search for information on accreditation in a comparable manner across countries, also allowing the national authorities to look for best practices.
Both tools provide immediate services to the communities. They allow getting an overview of existing situations and may ultimately serve as a basis for knowledge sharing and capacity building. However, these tools also require maintenance, regular updates and further developments beyond DwB. Resources will need to be devoted to this work and a consultation mechanism must be created to allow NSIs to communicate any change that should be reported regarding national situations.
ACCESS BLOCK RESULT 3: Sharing a Common Terminology
The Access Block crucially required finding common ground when it comes to removing obstacles to transnational access to confidential microdata. An important common output from the different Access Block WPs was the recognition that a shared glossary / taxonomy is essential to make progress in DwB and other supranational initiatives to improve access to microdata.
Early in the implementation phase of DwB, partners quickly discovered how difficult it was to understand each other in a field where the same term could be interpreted differently across countries and institutions or in cases where different expressions could refer to the same object. Such confusion in the limited context of DwB was also apparent in the wider and more institutional framework of discussions held within the ESS Working Group on Statistical Confidentiality (WGSC) and the OECD. This was particularly obvious on 4 items: the understanding of "confidential data", the distinction between the various types of data files made available to different communities including researchers, the existing types of secure data access and - finally - the very definition of a researcher.
The vagueness in the definition of "confidential data" - which could refer to raw data including direct identifiers (i.e. addresses, names), de-identified data, files including sensitive variables, or de-facto anonymised data with a low disclosure risk (Scientific Use Files) - had a strong impact on the definition of the different types of data files, whose designations also vary across countries. DwB suggested applying the term "confidential data" only to those data allowing direct or indirect identification of individuals and firms, thus also including de-identified data from which only direct identifiers were removed. Such data should be referred to as "Scientific Secure Use Files", which differentiates them from the "Scientific Use Files" (de-facto anonymised files) with a lower risk, and from the "Public Use Files" with much more aggregated variables. The latter can also be used as "Campus Use Files" for training purposes (since specifically-extracted files are only needed when there is no existing campus file). Adoption of this terminology, also discussed in parallel within the OECD and Eurostat context, facilitates mutual understanding, though difficulties remain when it comes to the different practices regarding the level of anonymisation for the de-facto anonymized files and the Public Use Files.
In-depth discussions also helped clarify the terminology used when referring to secure access modes, since preliminary talks demonstrated that the term "Remote Access" could cover secure systems allowing direct access to de-identified data that users could actually see, as well as access to data actually similar to "Scientific Use Files" (de-facto anonymised), or even - in several cases - "job submission" or "remote execution", where the researchers do not access the data directly. The discussions held within the WGSC and the content of the new EU Regulation on Data Access directly echo this clarification work on terminology that started within DwB.
Such clarifications provide a more solid basis for any negotiations on transnational agreements. Finding common ground for defining "who is a researcher" and "what is research purpose" is just as crucial, since access to confidential data rests on the accreditation of researchers. DwB demonstrated (see in particular the EDAF 1 specific session on that matter) that there was quite a strong difference between a definition based on strict academic affiliation and a definition based on criteria that could include studies and organizations with a wider scope, such as charities or international institutions. Countries differ in their criteria and the frontiers are very likely to change in the next few years, within the context of “open data”, which will clearly require further consideration.
ACCESS BLOCK RESULT 4: Demonstrating the Potential of Legal Frameworks for Transnational Access and the Notion of “Circle of Trust”
Under WP3, based on an audit of legal frameworks enabling access to microdata through publicly available sources, further consultation of the NSIs on different scenarios of access to microdata, and cross-cutting work with the OECD Expert Group for International Collaboration on Microdata, DwB has established that the national legal frameworks for research data access are generally more powerful than is commonly thought: differences between countries quite often result from differences in interpretation of the “silence of the law” when it comes to transnational access. On closer examination, the barrier is often the perceived inability to ensure security and prosecute for breach of confidentiality in a second country. An interactive tool (available on the DwB website) demonstrates and visualizes the differences in interpretation and practices based on the findings from the online survey with the national statistical authorities.
Based on this finding, and in parallel with the OECD discussions restricted to the NSs, the notion of “circle of trust” was outlined to refer to the need to create circumstances where different parties within and across borders can rely on each other. Mutual trust is needed for sharing microdata services, i.e. exchanging microdata or providing access to confidential microdata. When creating a “circle of trust”, each member joining the Circle should be accepted according to the same rules and conditions which are approved by all members. These would cover confidentiality rules and security requirements, but also competence and legal aspects. There would also be set preconditions for the institutions themselves or for technologies providing the access. The notion of “circle of trust” helps to define reliable rules for a mutual trustworthy cooperation between partners with different protection needs, due to different laws, data, or technical implementations. The concept of equivalence directly derives from the trust-building tasks initiated between partners. The idea behind it is that organisations that are working more or less in the same business, that have the same responsibilities, and share the same tasks should trust each other because they understand what the others do, how they do it, and why they do it; simply because the organisations, their requirements, their procedures and their legal frameworks (including penalties in case of breach of confidentiality) are equivalent. Focusing on Research Data Centres (RDC) that are made to give access to confidential research data, equivalence would help to easily evidence that the main tasks of such organisations – like secure data storage, disclosure control, access rules – are done in a similar, understandable and trustable way. Therefore, the concept of equivalence enables cooperation between organisations by building trust upon shared understanding and a basis for establishing mutual responsibilities. 
These results provide the framework for agreements to be established between RDCs across borders for further implementation of a European Remote Access Network.
ACCESS BLOCK RESULT 5: First Steps and Roadmap towards a Standard for Accreditation
A major obstacle for researchers, as demonstrated by the difficulties of the TNA WPs, lies in the diversity of the accreditation procedures they face, particularly when multiple accreditations are needed for a project requiring data sources from different countries. WP3 has obtained a global picture of accreditation procedures and practices across Europe and has been able to identify a number of commonalities across countries. Key results are encouraging: 1) most European countries do provide research access to their microdata, also allowing foreign (European) researchers to access their data, often under the same (or very similar) conditions as national researchers, but usually only upon registration with a research institution within the country concerned; 2) certain key basic principles are held in common, for example checking the non-profit research purpose; requiring a written application and signature of a contract; and ensuring some form of institutional backing; 3) cross-country differences usually concern actual practices and details rather than general principles; 4) existing bottlenecks revolve around the extent to which institutions, not just individual researchers, need official accreditation. This is an obstacle especially for transnational access, as it is more difficult to ascertain the suitability of a foreign institution than of a national one.
DwB has also shown that, while these issues require high-level negotiation and discussion, other gaps are easier to fill, in particular: lack or incompleteness of online information about accessible data and conditions for access; uneven availability of English-language translations. Based on these findings, WP3 identified a set of best practices in accreditation: availability of complete English translations of NSI websites; common terminology/logo/site structure to locate relevant information more easily; clarity and completeness of information on both general criteria and any special conditions (for example, for transnational access); clarity and completeness of information on how to apply (including costs and timing); standard application forms (rather than a more generic request for a written letter) with English translations, ideally downloadable from the web and allowing both online and email submission. Adopting these best practices is the first, short-term recommendation in the roadmap DwB has proposed towards a standard accreditation for transnational access to confidential microdata. It will also be crucial to maintain and update the database of accreditation procedures, as well as the visualisation tool on the interpretation of legal frameworks, as a driver for the adoption of best practices and harmonisation by the producers.
The mid-term recommendation is to promote an adoption of a standardised application form for transnational access. Following the findings of WP3 consultations of the NSIs on accreditation practice, and discussions at the DwB workshop in Lausanne, DwB has recommended a standardised application form for transnational access to official statistical data as a realistic mid-term step towards a convergence of accreditation practice. There is indeed a general consensus that the accreditation must be based on the project for which the data is requested, rather than solely on the requesting institution or researcher. The NSIs also agreed on the five key components that are considered to be at the core of a standardised application form: information on the researcher; place/institution where the data will be accessed / processed; requested datasets; research project; and justification. These key components must be addressed in the standardised application form. A significant number of NSIs have indicated they would be happy to collaborate for this work that could also benefit from the work done for the Eurostat microdata.
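As an illustration only, the five key components listed above can be pictured as a simple record with a completeness check. This is a minimal sketch, not the form proposed by DwB or agreed by the NSIs: the class name, field names and the `missing_components` helper are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AccessApplication:
    """Hypothetical record mirroring the five key components of the form."""
    researcher: str                 # information on the researcher
    access_location: str            # place/institution where data are accessed or processed
    requested_datasets: List[str]   # datasets requested
    research_project: str           # description of the research project
    justification: str              # why these data are needed for the project


def missing_components(app: AccessApplication) -> List[str]:
    """Return the names of the components that were left empty."""
    return [f.name for f in fields(app) if not getattr(app, f.name)]


# Example: an application with no datasets listed and no justification given.
draft = AccessApplication(
    researcher="A. Example, Example University",
    access_location="Accredited safe room",
    requested_datasets=[],
    research_project="Comparative labour-market study",
    justification="",
)
incomplete = missing_components(draft)
```

A single shared structure of this kind is what would let several NSIs review the same application without asking the researcher to re-enter the information in each national format.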
The long-term recommendation is to adopt a standard European model for accreditation and access to official microdata and to integrate a transnational accreditation system within the European Service Centre for Official Statistics (ESCOS) envisioned in WP5. Analysis of findings suggested that a standard European model for accreditation and access to official statistics is a realistic possibility. There is indeed a general consensus that the accreditation must be based on the project for which the data is requested, rather than solely on the requesting institution or researcher. The Eurostat procedure may provide a model, though views differ on its two-step procedure, which requires accrediting first the research institution and then the research project; researchers also perceive this procedure as ill-suited to the way research is organised, with teams spread across institutions and researchers increasingly moving from one institution to another. The NSIs also currently have limited appetite for delegating accreditation. While implementing the short- and mid-term accreditation recommendations would certainly improve the situation considerably for researchers with respect to transnational access to official microdata, other access difficulties would remain. Researchers would still have to go through a separate procedure with each NSI or other OS producer from which they are requesting data, even if a common application form were used. In the end, without a more centralized system, researchers could still have to wait long periods before finally obtaining the data they need for their comparative analyses. Deliverable 3.4 states that what is needed in Europe is a centralized system where researchers can discover and then apply for data from different NSIs all in one place, following the same procedures and using the same forms, through a European Service Centre.
This is a long-term perspective to be built progressively. NSIs could “opt-in”, adopting common application forms and accreditation rules as well as contractual procedures for speeding up the review and authorization in case of a project requiring data access to several RDCs and therefore multiple accreditations.
ACCESS BLOCK RESULT 6 – A global Vision for a European Remote Access Network
Though enormous progress has occurred with the development of remote access, each RDC has developed its own solution, so that researchers face different technologies and different equipment, with different requirements regarding the place of access. Though they can access confidential microdata hosted in a foreign RDC, they cannot work together from their own location with researchers located in other institutions/countries, nor combine data held in different RDCs to run a single analysis. Based on surveys and group discussions with researchers and RDCs, in-depth discussions between DwB partners and external feedback on a first EuRAN proposal, DwB WP4 delivered a detailed description of the architecture for a European Remote Access Network (EuRAN) that would balance the researchers' needs with the requirements of the RDCs and OS producers regarding the security they are in charge of. While the pure EuRAN network is the backbone for improving data access, the extensions (additional services to support transnational research projects) are the parts that add usability for the researchers. The corresponding reports deal with legal, organisational and technical issues. The Microdata Computation Centre (MiCoCe), a part of the EuRAN architecture that is essential for comparative projects requiring datasets from different Research Data Centres to be combined across borders, was further discussed in a specific workshop (Nuremberg, April 2014).
ACCESS BLOCK RESULT 7 - A First Concrete Step: Implementing a New Transnational Access Based on an RDC-in-RDC Approach between IAB (Germany) and UKDA (UK), To Be Later Extended to CASD (France) and CBS (Netherlands)
While proposing an infrastructure for future data access in Europe, DwB also started laying a concrete path towards such a EuRAN, providing both immediate improvements usable by researchers and lessons for the organisational backbone such a network would require. DwB has thus set up a first transnational access to IAB confidential microdata from a foreign RDC, starting with UKDA and to be further extended to CASD and CBS. Both IAB and UKDA require users to access from accredited data centres, while CASD and CBS would allow accessing the data directly from universities. In order to establish a connection from UKDA to IAB data servers, the technical and organisational aspects had to be taken into account and an agreement had to be discussed and signed. In-depth discussions between the 3 partners involved in this pilot approach were crucial for reaching common ground, and thus building mutual trust, on the necessary organisational arrangements, contracts, security requirements and specific workflows (the question of “who has to do what”) in order to make transnational data access possible, and ultimately to serve as a template for future network extensions. In particular, the inter-institutional contract that was negotiated had to be broad and flexible enough to accommodate the situation of all potential partners in Europe. Though the work involved was largely underestimated (notably the organisational aspects), the final results provide important lessons for the future deployment of a EuRAN across Europe. The contract to set up transnational access to IAB microdata from UKDA was signed between IAB and UKDA after long and tedious negotiations with the respective legal and IT departments of the two RDCs. Checking that the new service would not harm security was also crucial, and test environments were set up after completion and signature of the contract.
Though much later than initially planned, real access was provided in the last month of the project to research projects selected via the TNA WPs, which were waiting for this new opportunity; it will be maintained beyond DwB. The lessons from the long negotiations needed to agree on this RDC-in-RDC approach are important for future developments and projects aiming to build a real network, and the approach could be seen as a Proof of Concept of the “circle of trust”. Though CASD and CBS do not require such an RDC-in-RDC approach to give remote access to their microdata, they will serve as access points to IAB servers, thus resulting in a new level of organisational network and paving the way towards a EuRAN infrastructure.
ACCESS BLOCK RESULT 8 - A Proof-of-Concept for a True Network Approach
These solutions provide a concrete, immediate improvement in transnational access to confidential microdata for RDCs that require users to access from an accredited RDC, and they establish a first network of RDCs. However, they work in parallel: they do not let researchers access data located in different RDCs from a single point of access with the same equipment, nor do they allow researchers to work together and eventually combine data held in different RDCs, as proposed by the EuRAN. A second and complementary step was therefore to set up, in parallel, a Proof-of-Concept (PoC) for a true network approach. Based on the real deployment of an IT infrastructure, with 3 servers installed in 3 RDCs from 2 countries (DESTATIS and GESIS in Germany, GENES in France) and a central node, this demonstrated the feasibility and the potential of the network approach regarded as the most credible concept for a European Remote Access Network (EuRAN). The results of the PoC were demonstrated during the 2nd European Data Access Forum. Taken together with the work done to establish the connection between IAB and UKDA, this provides a solid basis for further developments that will ultimately lead to a fully functional EuRAN that will widen and enhance data access across the European Research Area.
ACCESS BLOCK RESULT 9 – Guidelines for Security Standard for Research Access Data Centres
Adoption of security standards will facilitate a large deployment of a future EuRAN. D3.3 "Research data centres and ISO27001 – a guide" provides an analysis of the key features of ISO27001 and translates them into desirable features of research data access facilities in the ESS and the ERA, for adoption in practice by future builders of research data centres. The key recommendations from this deliverable are as follows: 1) the adoption and implementation of information security risk management should follow a common model, developed through exchange of expertise and a transparent architecture; 2) the ‘circle of trust’ concept should be used to ensure that information security standards in RDCs become neither a race to the bottom nor gold-plating, but a common core infrastructure standard that is fit for its purpose; 3) the necessary expertise to build a shared information security standard, and the mechanisms for promulgating it, already exist in the ERA; 4) benchmarking and audit are essential to retain the confidence of data depositors; 5) shared information security standards and interoperability do not mean all RDCs are the same. The internationally recognized information security standard ISO27001 presents guidelines for implementing a satisfactory level of information security for remote access and research data centre solutions, and microdata access entities in the ERA, mainly NSIs and DAs, are advised to implement it. This will lead to harmonization and transparency with respect to information security, which will in turn lead to a “Circle of Trust” across borders. Without trust between institutions, bilateral agreements for remote access to data across borders will be difficult to achieve. If data confidentiality is convincingly secured, trust will be built both nationally and internationally.
ACCESS BLOCK RESULT 10 - Improved Methodology and Software for Managing Risks of Access to Detailed OS Data, Facilitated Production of Anonymized Files, “Safe” Tabular Data, Output Checking and Linkage for Database Integration
Developing methods for achieving the best ratio of utility against disclosure risk is important both for easing the production of anonymised files (Public Use Files and Scientific Use Files) and for facilitating and harmonizing the process of checking before releasing outputs produced by the researchers when working on confidential microdata. It is also increasingly important for linkage issues for which the researchers’ demand is high. Though there are a number of SDC techniques and methodology proposed and highly appreciated by the scientific community that deals with SDC, to allow their actual use by e.g. NSIs and archives, a user-friendly implementation was needed. The software packages resulting from the WP11 work are quite important in that perspective. These include a software tool for new masking techniques, which can be used to produce protected microdata like public use files, scientific use files and campus files; the software for cell suppression and CTA can be used by institutes that need to produce “safe” tabular data.
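To make the idea of “safe” tabular data more concrete, the following minimal sketch shows primary cell suppression under a minimum-frequency rule. It is not the WP11 software: the function name, the threshold of 3 and the choice to leave empty cells visible are assumptions made for illustration. Secondary suppression and CTA, which the packages address, are not covered here; without them a suppressed cell can often be recomputed from the published marginal totals.

```python
from typing import Dict, Optional


def primary_suppress(table: Dict[str, int], min_count: int = 3) -> Dict[str, Optional[int]]:
    """Hide cells whose count is positive but below min_count.

    Suppressed cells are returned as None. Both the threshold and the
    treatment of zero cells are illustrative assumptions, not a standard.
    """
    return {
        cell: (None if 0 < count < min_count else count)
        for cell, count in table.items()
    }


# Hypothetical counts of firms per region.
counts = {"North": 120, "South": 2, "East": 0, "West": 3}
safe_counts = primary_suppress(counts)
```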
Among other work and results, DwB WP11 also developed advanced record linkage methods for database integration and methods that “on the fly” take into account estimation of disclosure risk. Partners developed approaches to learn the parameters for distance-based record linkage. Experiments were done with some datasets. A set of alternative distances in the record linkage process were considered; in particular, weighted Euclidean distances, weighted order statistics, bilinear forms and Choquet integral based distances. A focus was the analysis of disclosure risk when several databases are protected using k-anonymity, and also when there are streaming data that have to be protected on the fly. On the first issue, some theoretical results have been obtained. The problem of reidentification in the context of information retrieval and querying databases (e.g. different queries on the same database) was also considered.
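The two notions used above can be sketched in a few lines. The first function estimates disclosure risk by distance-based record linkage with a weighted Euclidean distance: each protected record is linked to its nearest original record, and the share of correct links is taken as the risk. The second returns the k-anonymity level of a file for a chosen set of quasi-identifiers. This is a simplified sketch under stated assumptions, not the WP11 implementation: the function names and sample data are hypothetical, the weights are fixed by hand rather than learned, and the other distances mentioned (weighted order statistics, bilinear forms, Choquet integral) are not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def weighted_euclidean(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between two numeric records."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def linkage_rate(original: List[Sequence[float]], protected: List[Sequence[float]],
                 weights: Sequence[float]) -> float:
    """Share of protected records whose nearest original record is their true source.

    Assumes protected[i] was derived from original[i].
    """
    hits = 0
    for i, record in enumerate(protected):
        nearest = min(range(len(original)),
                      key=lambda j: weighted_euclidean(original[j], record, weights))
        hits += int(nearest == i)
    return hits / len(protected)


def k_anonymity_level(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Hypothetical (age, income) records; the third protected record is heavily perturbed.
original = [(30, 2000.0), (45, 3500.0), (60, 1500.0)]
protected = [(31, 2100.0), (44, 3400.0), (32, 1950.0)]
risk = linkage_rate(original, protected, weights=(1.0, 0.0001))

people = [
    {"age": "30-39", "region": "N"}, {"age": "30-39", "region": "N"},
    {"age": "40-49", "region": "S"}, {"age": "40-49", "region": "S"},
]
k = k_anonymity_level(people, ["age", "region"])
```

In this toy example two of the three protected records link back to their source, and every combination of quasi-identifiers is shared by two records.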
The Guidelines for Output Checking are directly targeted to RDCs: the outputs that researchers produce based on the data owned by RDCs must be checked for possible disclosure before they can be released from the secured environment of the RDCs. The document produced provides a series of guidelines on a general approach for minimizing disclosure risk of research outputs based on official microdata, which should ultimately reduce the burden of output checking on RDCs by allowing implementing an automatic disclosure control. It notably proposes a general classification of outputs according to their disclosure risk and type of statistics / analysis conducted to produce them; before discussing a "rule-of-thumb" model based on a set of clear and simple rules that should be applied to each class of output to minimize the disclosure risk as much as possible. Finally, it discusses organizational, legal and procedural aspects and requirements in order to implement the most efficient output checking process possible.
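The structure of such a rule-of-thumb model, in which each class of output has a simple rule and anything unclassified goes to a human checker, could be sketched as follows. The class names, the thresholds and the `check_output` function are hypothetical and chosen only for illustration; they are not the rules set out in the Guidelines.

```python
from typing import Dict

# Hypothetical minimum "support" per output class: the number of units behind
# the smallest cell of a table, or the residual degrees of freedom of a model.
MIN_SUPPORT: Dict[str, int] = {
    "frequency_table": 10,
    "regression": 10,
}


def check_output(output_class: str, support: int) -> str:
    """Return 'release', 'block', or 'manual review' for one research output."""
    threshold = MIN_SUPPORT.get(output_class)
    if threshold is None:
        # No simple rule exists for this class, so a human checker decides.
        return "manual review"
    return "release" if support >= threshold else "block"


decisions = [
    check_output("frequency_table", 12),
    check_output("frequency_table", 4),
    check_output("map", 100),
]
```

Keeping the rules in a plain lookup table is one way to let an RDC automate the routine cases while still routing unusual outputs to staff.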
Overall, under this access block, DwB has contributed to immediate progress by supporting transnational access to 7 RDCs in 4 countries and by opening remote access to IAB (Germany) from SDS/UKDA (UK), which will spare researchers the need to travel to access IAB confidential microdata, while concretely paving the path towards the European access infrastructure for confidential microdata that is needed. The work done on the analysis of the legal frameworks, the recommendations for information security and the guidelines for checking outputs provides a clearer basis for enhancing transnational access to confidential microdata, while the collaborative experience between 7 RDCs from 4 countries, the lessons gained from the Memorandum of Understanding signed by UKDA and IAB for setting up a transnational connection, and the Proof-of-Concept for the EuRAN envisioned by DwB are strong incentives for further developments.
Such an infrastructure will have to take into account the developing context DwB has identified with the need to bridge with other domains where transnational access to confidential data raises similar issues, such as in the health sector, the increasing importance of administrative data - that may be located in quite diverse institutions - and of big data, many of them raising confidentiality issues.
On the other hand, though the trend towards more open data and more access for research purpose to confidential microdata is still important, the growing number of confidential microdata available also increases legitimate anxiety regarding privacy protection. Attention should be paid to the Future European Regulation on Privacy Protection that currently includes some provisions for which the research community has expressed concern.
III - BUILDING AND ENLARGING COOPERATION
A) INTRODUCTORY STATEMENT: A Rather Limited Cooperation both at National and European Level between OS producers, Archives and The Research Community
From the very beginning, cooperation between the different communities interested in enhancing access to official microdata was explicitly meant to play a substantial role in the DwB project. Significant barriers to accessing official microdata, particularly when it comes to transnational access to confidential microdata, can only be removed through close cooperation with data producers. OS data producers, national statistical institutes in particular, play an essential part in the production and protection of highly sensitive individual data. However, though data dissemination is now clearly part of their tasks, their core activity is to produce statistics. As the demand for official microdata is increasing, more resources are needed, notably when it comes to secure systems for confidential microdata. Data archives, on the other hand, have been set up with the main purpose of archiving, documenting and disseminating data for researchers. They are particularly well equipped to structure metadata for use by researchers, and to accredit and support users. In Europe, they have built over 30 years a collaborative network that is in the process of being transformed into a European Research Infrastructure Consortium (ERIC) with a legal status involving the ministries of the different partner countries. However, at the time the DwB project started, the CESSDA catalogue did not include many official microdata. This reflected the situation of a large part of its member archives at national level, which were historically more focused on collecting data produced within the research community, particularly in the socio-political field. Only a few of them have built cooperation with the OS producers. As a result, the European data infrastructure for social sciences that researchers need for discovery and access does not cover a very large share of the microdata that are essential for many disciplines and research projects.
One consequence of this situation is that the rich resources of official microdata, both national and European, remain underused, which also entails a limited input from the research community to the quality of OS data, particularly at European level.
The situation was even worse when it came to transnational access to official national microdata, notably the highly-detailed ones requiring secure access systems. Transnational access to official microdata faces a number of problems ranging from discovery, harmonisation of procedures and legal frameworks to security issues. Cooperation and trust between the different stakeholders are critical for removing obstacles and building the European remote access network that would allow researchers to work collaboratively with data hosted in different places across borders.
No European platform existed to discuss how to alleviate these obstacles, or to build harmonized solutions and trust that will enhance the use of official microdata in the ERA. For the integrated European microdata (Eurostat microdata, ECB surveys), several researchers are members of the ESAC representing the users at European level; however, there was no direct cooperation between the European Statistical System including all NSIs and NSAs coordinated by Eurostat, CESSDA and the research community. For transnational access to the national OS resources not integrated at European level, the European Statistical System does not offer a framework as it deals only with the European integrated microdata.
Therefore a crucial feature and ambition of the DwB project was to build up such cooperation and European platform. This was not starting from scratch. In some countries, institutions in charge of official statistics and data archives had already set up highly efficient cooperation schemes to enhance the use of official statistics for research purposes; they would share in one way or another, workloads and costs induced by making data available to researchers, a type of cooperation which had also enhanced dialogue between data producers and users on data quality. DwB entailed two dimensions to build such cooperation at European level. The project itself was built on a direct partnership between institutions belonging to the different communities. This constitutes a first circle of cooperation. The second circle involves the whole communities. It entailed a set of events and activities with the central idea to broaden the dialogue, sharing knowledge of respective needs and requirements and to prepare the basis for a long-term and more stable cooperation between the emerging infrastructure of Data archives affiliated with CESSDA and the European Statistical System coordinated by Eurostat.
B) BUILDING AND ENLARGING COOPERATION: Results & Outputs
The work undertaken in this 3rd building block for building and enlarging cooperation had three main dimensions. The first one was related to the internal cooperation on which the DwB project has been built, which was crucial for proposing solutions and reaching agreements. The second one, through a set of events and activities, targeted the whole communities: the researchers who are the final users, the data producers, the data providers (whether archives, universities or producers) and the other stakeholders such as the research councils, ministries and authorities in charge of privacy protection and security. The third one, via intensive dissemination activities, aimed at conveying the main messages of the project more generally to a broader circle. Though WP6 (events and activities targeting specific stakeholders) and WP2 (dissemination activities) were strongly involved in these tasks, all WPs contributed to the overall objective, strengthening the internal cooperation between the direct partners and contributing to enlarging cooperation to others.
As part of the activities and events organised under WP6, the training sessions on European official microdata and the Users’ conferences targeted the researchers, also aiming at reinforcing the dialogue and cooperation between the users and the producers. The Staff Visits targeted the providers of secure access, the so-called Research Data Centres (RDCs), whether located in NSIs, NSAs, archives or universities, with the objective of building transnational cooperation between them for further development of a European Remote Access Network. The Regional workshops aimed particularly at fostering cooperation between the archives and their respective NSI and other OS producers at national level, a necessary basis for developing cooperation at European level; while the European Data Access Forum and its additional workshops aimed at building a European platform for discussions on access to official microdata within the ERA by involving at European level all national and European stakeholders, in particular CESSDA and Eurostat. An institutional map was developed and maintained throughout the project in order to better target the institutions and persons to be involved in the discussion and invited to the different events. It has been continuously enriched by the different WPs and reflects well the wide range of actors involved in one way or another in DwB activities and events.
All the activities proved to be highly successful and were praised by their respective target audiences. In some instances, they even contributed to noticeable improvements in the institutional landscape, as will be further elaborated and discussed in the "Impact" section of this report.
ENLARGING COOPERATION - RESULT 1: Involving the Research Community and Fostering Dialogue between Users and OS Producers
Involving the research community in the project was crucial as the researchers are the final users. Although the archives belong to the research community, they do not use the data. The universities and research institutions directly involved as partners in the DwB project were not a means of involving researchers as users, since they were participating in the project as either archives or research departments specialized in methodological research on SDC. It was therefore an important objective of DwB to involve researchers more directly as users. This was done in different ways. Specific activities and events were dedicated to the research community: the Training Sessions on the European official microdata aimed at raising the awareness of researchers about the rich resources of OS in the ERA, while the Users’ conferences had the objective of fostering dialogue between the users and the producers. The TNA WPs that provided support to researchers for transnational access to confidential microdata were also used to get feedback from them for the overall project. In addition, throughout the project, DwB aimed at involving the research community in the discussions, via online surveys or interviews, specific workshops and participation in the European Data Access Forums.
* Making Researchers More Aware of the Potential of OS Data: Training Sessions on the European Official Microdata *
Currently official microdata are underused in Europe. Making researchers more aware of these rich resources and how to access them was a key issue. A series of six training courses was organised to inform them about available data sources while providing hands-on training in the use of these microdata, particularly the integrated European ones (the Eurostat microdata) in order to foster their use.
The first training course was hosted in Mannheim in July 2012 and focused on the EU-LFS. The second training course, on EU-SILC, was held in Bucharest in February 2013. The third course, the second on EU-SILC, was hosted in Paris in February 2014 and focused on the longitudinal component. The fourth training course was hosted in Ljubljana and focused on the LFS. Due to the close cooperation between the Slovenian data service (ADP) and NSI (SORS), this course provided some additional insights into the national data production for the EU-LFS. The fifth training course was held in Barcelona in January 2015 and focused on the use of the IECM database, which harmonizes census microdata from Europe. The final course was hosted in Athens in February 2015 and focused on AES.
All courses used the same structure, divided into four main parts over three consecutive days: an introduction to the Data without Boundaries project; an overview of national and European official microdata and of accreditation procedures with a focus on Secure Use Files; an introduction to the dataset targeted during the course; and a practical computer training session that was carried out with special training datasets on the most commonly-used data analysis software tools (SPSS and/or STATA).
The objective of the practical training sessions was to inform participants about the basic structure of the dataset in question, to provide practical advice for data handling and preparation and to explore key variables and concepts. The targeted datasets were the Eurostat microdata and the integrated censuses provided by IECM.
For the Eurostat microdata, the training datasets for the practical sessions were drawn as subsamples of the full microdata, and the definition and production of these datasets were handled by Eurostat and GESIS. As a first step, Eurostat forwarded a request for data use in training courses to all NSIs. The training dataset for each course then included subsamples of the most recent survey wave for all countries that agreed to have their data used. It is notable that, although no campus files existed for these microdata, Eurostat and about 14 NSIs agreed to provide such training files for the purpose of the DwB training sessions. It is also notable that Eurostat participated in 5 training sessions to introduce the survey structure. As the IECM is public use, the training courses could use the full database.
In total, the training activity on European integrated microdata and censuses attracted 142 trainees from all European countries, many of them PhD students or postdocs. For all courses, demand increased over time and actually exceeded capacity. This clearly demonstrates researchers' needs and interest regarding the potential of such data for research purposes, as well as their regulatory and procedural frameworks. It also shows that Campus Files are increasingly needed to match such demand, and emphasizes the need to follow up on this activity, hopefully within the CESSDA workplan and its training centre.
* Fostering Dialogue between Researchers and OS Data Producers: the Users’ Conferences *
Providing an opportunity to showcase to the producers research findings obtained from official microdata is one of the more effective ways to promote dialogue between researchers and producers. It makes the latter more aware of the value of opening up access and helps build trust. Feedback from users is also an important way to improve the quality of the data produced. DwB supported 2 of these Users' Conferences, which focused on the European integrated microdata and offered a platform for researchers to present results based on these data and to discuss with Eurostat their needs in terms of available data, quality and procedures. Both were organised and hosted by GESIS, in Mannheim.
The first conference was held in March 2013. Researchers who use microdata were invited to submit an abstract through a Call for Papers, and the conference gathered researchers from various social science disciplines. The focus was on the use of the European Labour Force Survey (EU-LFS) and European Union Statistics on Income & Living Conditions (EU-SILC). Out of the 56 papers submitted, 32 were selected for their outstanding scientific quality. The papers were organised into nine topical sessions: Effects of Social Policies, Poverty and Deprivation, Standard and Non-Standard Employment, Quality of Life, At Risk of Poverty, Aspects of the Labour Market, Youth Employment, Gender and Work, Methodological Issues.
In addition to the selected papers, two keynote speakers were invited. Emilio di Meglio from Eurostat presented ways to estimate the variance of SILC-based indicators (joint work with Guillaume Osier).
He emphasized the importance of achieving reliable variance estimates because they are the basis for determining whether changes in policy relevant indicators between years are significant. In the second keynote speech, Michael Gebel asked whether deregulation of employment protection, as witnessed in many European countries over the last two decades, has helped to combat youth unemployment.
In the closing session two presentations introduced the participants to other data sources. The first one acquainted the audience with the surveys conducted regularly by the European Foundation for the Improvement of Living and Working Conditions (Eurofound), namely, the European Quality of Life Survey (EQLS), the European Working Conditions Survey (EWCS) and the European Company Survey (ECS). During the second presentation, Eurostat gave an overview of microdata sources other than EU-LFS and EU-SILC which Eurostat offers or will offer in the near future for research purposes. It was also explained that the regulation on which microdata dissemination was based had been revised and that, upon its publication, Eurostat would implement new procedures for processing data requests from users.
In total, 82 participants from 22 countries attended the conference.
The second Users' Conference organised within DwB took place on 05 - 06 March 2015. Whereas the first iteration focused on EU-SILC and EU-LFS data, it was decided that the second one should broaden this scope and allow for the presentation of research work based on most EU-integrated surveys (SILC, LFS, AES, SES, CIS, EHIS and HBS). Topics addressed included the impact of the crisis on employment and unemployment, quality of work, innovativeness of enterprises, labour migration, poverty and social exclusion, income inequality, household expenditure, adult education, population health. In addition to substantive issues, presentations focusing on methodological topics were also welcomed.
The 30 selected papers (out of 64 submitted) were distributed across nine topical sessions, namely: Migration, Employment & Family, Education, Policies, Labour Market, Income (2 sessions), Youth Employment and Methods. In addition, four keynote speakers were invited to present the following: the work on flow statistics in the LFS, which represents first steps towards a longitudinal design in the LFS; research on indicators used to measure material deprivation and the cross-country validity of such indicators; research on the labour market integration of migrants on the basis of EU-SILC and ESS; and an overview of research proposals submitted to Eurostat and information on Eurostat’s future plans for the dissemination of microdata.
In total, 92 participants from 24 countries attended the conference.
Attendance, the quality of the research papers and the high involvement of Eurostat clearly demonstrate the interest of both communities in this type of conference, which is expected to continue after DwB.
* Involving Researchers into Discussions of DwB Results *
Besides these regular activities and events, DwB worked continuously throughout the project to involve researchers through online surveys, telephone interviews and dedicated workshops about metadata needs (WP8 users’ stories), experience of remote access and needs regarding transnational access (WP4). In particular, 2 workshops with selected researchers of 2 RDCs (CASD, France and CBS, Netherlands) were organized to get researchers' feedback on the EuRAN proposal. In the introduction of the 1st European Data Access Forum, attended by all NSIs and Archives, a senior researcher (J. Grenet) described the experience of an international research project requiring the use of highly-detailed microdata across borders. Researchers supported by the TNA work packages were offered the possibility to give feedback on their experience of transnational access to national microdata in an introductory panel session at the second European Data Access Forum, together with other researchers who focused on their experience of using Eurostat microdata. Many other researchers were invited to these 2 EDAFs to participate in the overall discussion.
ENLARGING COOPERATION - RESULT 2: Building Cooperation between Research Data Centres (RDCs) Providing Secure Access to Confidential Microdata
A second important target was the Research Data Centres that provide access to confidential microdata, many of them operating within the NSIs or NSAs, others in archives of universities. While their number had increased in the years preceding the start of DwB, they mainly worked in isolation. Building cooperation between these RDCs was at the core of the project in the perspective of a future European Remote Access Network that would facilitate transnational access to OS and allow the researchers to work collaboratively with data sources across borders. Building on an initiative within the OECD context to set up an annual workshop of RDCs, DwB undertook various actions to foster discussions between the RDCs and build future cooperation.
* Direct Cooperation between Project Partner RDCs *
Several of these RDCs were direct project partners. In addition to their overall contribution to the project in the various WPs, 7 of them from 4 countries (France, Germany, UK and the Netherlands) directly cooperated within the TNA WPs to better support transnational access for the researchers selected through the different calls. As explained previously, all RDCs worked together to facilitate joint access to their installations. They had to understand each other's practices and national constraints (technical, legal, organizational), agree on a common set of eligibility and selection criteria, draft a joint application form that would allow users' projects to be checked seamlessly across countries, etc. This notably allowed the involved RDCs to start building concrete relations and mutual trust. In itself, this is one of the most important - although intangible - results of DwB.
A second direct cooperation resulted from the Pilot and the Proof of Concept for a European Remote Access Network built within WP4. This engaged in actual cooperation and capacity-building work the RDCs from UK (SDS/UKDA) and Germany (IAB) for the Pilot on the one hand, and the RDCs from France (CASD) and Germany (GESIS and DESTATIS) for the Proof-of-Concept on the other hand. These are expected to form the basis for further developments in the future.
* Workshops with RDCs *
Several dedicated workshops targeted all the RDCs within the ERA and also involved several non-European RDCs. This was notably the case with a workshop of RDCs co-organised by DwB and IAB as a satellite event of the 1st European Data Access Forum, to follow up on the initiative started within the OECD and foster exchanges of experience and discussions between the RDCs. Additional workshops, also organized by WP6 in cooperation with WP4, gathered many RDCs from numerous EU countries. The DwB proposal for a EuRAN was presented and discussed at an RDC workshop organised as a satellite event of the 2013 NTTS conference. The MiCoCE workshop in Nuremberg also gathered many RDCs to discuss specific issues regarding possible cooperation between RDCs that would allow researchers to combine data from different sources across borders.
* Staff Visits *
Finally, under WP6, specific activities were dedicated to promoting the use of remote access to confidential microdata and to fostering exchanges on practices between the RDCs. DwB organized staff visits aimed at contributing to the achievement of a European distributed remote access network; spreading the acquired experience of European RDCs providing remote access to confidential microdata; and inducing knowledge transfer. 8 RDCs from 6 European countries volunteered to host these visits, of which one was not involved in DwB (the HMRC). These were: Centre d’accès sécurisé aux données (CASD, FR), Institut für Arbeitsmarkt und Berufsforschung (IAB, DE), Centraal Bureau voor de Statistiek (CBS, NL), Statistical Office of the Republic of Slovenia (SORS, SI), Statistiska centralbyran (SCB, SE), HM Revenue & Customs (HMRC, UK), Office for national statistics (ONS, UK) and UK Data Archive (UKDA).
The RDCs involved worked together to build a common content framework for these visits that would help visitors comprehend the context in which the access solution was designed and implemented, the constraints to be taken into account, the challenges to be overcome, etc. Such aspects included the following: historical background of the RA implementation; national legal framework; strategic, technical and organisational directions taken; output checking and documentation processes; financial aspects such as investment and maintenance costs, fees for researchers, demos of the running solutions.
All in all, the 8 staff visits (actually more than initially envisioned) attracted 53 participants from 25 different institutions, of which 19 were non-DwB members, from 20 European countries, and thus spread beyond the DwB perimeter: Austria, Belgium, Bosnia, Croatia, Czech Republic, Estonia, Finland, France, Germany, Hungary, Ireland, Latvia, Lithuania, The Netherlands, Norway, Portugal, Slovenia, Sweden, Turkey and the United Kingdom. Visiting staff came from RDCs in different types of institutions: NSIs, archives and universities.
Overall, the staff visits received very positive feedback on their content and the way they were organised. Among the appreciated features, visitors praised the demos, the exchange of experiences in small international groups (Q&A), the concise and complete contextual and technical information, etc. As to possible improvements, it was commonly suggested that extra time be dedicated to presenting and discussing researchers’ point of view and experience as users of the implemented access solutions.
Evaluation (via a final survey) showed that this activity increased knowledge and awareness of possible Remote Access solutions; developed cooperation; and convinced some countries while providing inspiration to others. Besides, the staff visits allowed participants to confirm or reconsider internal strategic choices, gave directions for improving existing solutions and helped build networks between RDCs that will be useful for future developments beyond DwB.
Notably, the staff visits also proved very helpful and satisfying for the host RDCs, who praised the same items as the visitors; an interesting outcome that was not expected at the design stage of the task. In any case, participants stressed the importance of pursuing such initiatives.
ENLARGING COOPERATION - RESULT 3: Building Cooperation between Archives and NSIs at National and European Level
An essential goal of DwB was to promote cooperation and foster relationships between the European Statistical System (including National Statistical Institutes and related agencies in member states) and CESSDA (Consortium of European Social Science Data Archives) in order to build an efficient data infrastructure for social sciences in the ERA. To achieve this, DwB deemed it essential to work in parallel at different levels, supporting cooperation between National Statistical Institutes and data archives at national level as a necessary step to build cooperation at European level between CESSDA and the European Statistical System led by Eurostat. Here also, actions involved the direct project partners from the different communities and dedicated activities meant to enlarge this cooperation.
* Internal Cooperation between Archives and NSIs within DwB *
Though in principle built into the project through the involvement of partners from the archives and from the NSIs or NSAs, cooperation between those different communities had to be concretely built. Except for a few cases (particularly France and UK) where solid cooperation between the archive and its respective NSI pre-dated the project, little or no cooperation existed in the other countries. Moreover, while the members of each community had regular contacts and cooperation within their own community (CESSDA and the ESS), many of them had never had an occasion to meet and hold discussions with the project partners from the other community. Therefore, each WP purposely involved partners from each community so that they would work together throughout the project to share a common terminology, understand each other's needs, and reach agreements to build tools and proposals. These 4 years have thus allowed a solid new community to be built. Indirectly, this has helped foster cooperation outside DwB at national level between some project partners from the different communities.
* Regional workshops *
Regional workshops were meant to enlarge this circle by building more cooperation between the two communities at national level in all the EU and associated countries. European countries differ widely in this respect: while cooperation is strong and well-established in some, it is looser or sometimes embryonic in others, and some countries do not yet have an archive. It was therefore considered that DwB would be best positioned to facilitate transfer of knowledge across countries, showcasing national examples that would demonstrate the usefulness of archives, the benefits of collaboration, and the lessons learned from past experiences. As part of this effort, two Regional Workshops were organised within the project lifespan.
The first regional workshop on “Microdata Access in European Countries: Cooperation between National Statistical Institutes and Social Data Archives" was hosted by the Statistical Office of the Republic of Slovenia (SORS), Ljubljana, Slovenia on 24 and 25 April 2013 and was notably chaired throughout its sessions by the Director General of SORS.
Focusing on 15 countries mainly in the Eastern Europe area (Albania, Bosnia and Herzegovina, Bulgaria, Croatia, Czech Republic, Estonia, Hungary, Kosovo, Latvia, Lithuania, Poland, Romania, Serbia, Slovak Republic, Slovenia), this workshop aimed to bring together representatives of National Statistical Institutes and social science data archives to discuss current arrangements and conditions for researchers’ access to data, activities and initiatives of existing data archives, best practices, and potential opportunities for enhanced cooperation that may benefit all stakeholders. In particular, it was the opportunity to promote closer cooperation and a better sharing of the growing administrative and financial burden (e.g. for metadata production, anonymisation, accreditation, dissemination and support to researchers in general). Through a series of presentations of both general overviews and particular case studies, and a working group session on the threats and opportunities for enhanced collaboration, the workshop intended to raise participants’ awareness about these needs and related issues in the broader European context.
The workshop gathered over 50 representatives from National Statistical Institutes, data archives and research and higher education institutions of the targeted Eastern European countries. Interestingly, almost all of the contacted NSIs responded positively to the DwB invitation and attended the workshop.
In addition, it was the opportunity for representatives of the FP7-funded SERSCIDA (Support for Establishment of National/Regional Social Sciences Data Archives, http://www.serscida.eu/en/) project to give an overview of the project and its objectives, which were of particular interest in this region. The SERSCIDA project was active in countries in which there was no data archive and was part of the effort for consolidating CESSDA within Europe. It constituted an important partner in the joint effort to facilitate cooperation and transfer knowledge and experience from the European countries that had a long experience with data archives, to those that did not.
The second regional workshop (RW2) on “Microdata Access in European Countries: Cooperation between National Statistical Institutes and Social Data Archives" was hosted by the National Centre for Social Research (EKKE), Athens, Greece, on 16 and 17 October 2014. It was introduced by the Director General of the Greek National Statistical Institute and by the president of EKKE. The final panel session was also chaired by the Greek NSI.
While initial plans were to focus only on Southern European countries, it was decided to extend the invitation to countries located anywhere in Europe, in which there were no DwB partners. This was meant to maximize outreach and potential impact of the workshop, leveraging the role of DwB as a catalyst of enhanced cooperation efforts throughout Europe. In continuity with a policy already adopted for the first Regional Workshop, it was also decided to invite a limited number of additional countries to the extent that they could provide inspiring examples of durable and successful cooperation. As a result of these decisions, the countries invited were: Cyprus, Greece, Italy, Malta, Portugal, Spain (Southern European countries); Austria, Belgium, Iceland, Ireland, Luxembourg (countries with no DwB partners); Germany, the Netherlands and Switzerland (examples).
The workshop gathered 40 representatives of NSIs, DAs, Ministries of Research and research and higher education institutions of the targeted countries. Interestingly enough, the number of existing data archives for social sciences in this region is rather low; among the DAs that do exist, not all are currently CESSDA members, though some are in the process of negotiating membership and others were members in the past, and are still considered as important partners by CESSDA. It is for this reason that the invitation was extended to some representatives of universities and research institutions (Belgium for example) and other types of institutions (e.g. Ministry for Science, Research & Economy in the Austrian case). Over the two days, presentations focused on access to microdata for research and cooperation between statisticians and the research community, through a mix of general overviews on Europe (and the target region) and country-based case studies. With these contents, DwB intended to raise participants’ awareness about these needs and related issues in the broader European context. For that purpose, RW2 also included a working group session on the opportunities for enhanced collaboration benefitting from the recent potential boost of H2020.
Both regional workshops gave the NSI and the Archive of a given country the occasion to meet and hold discussions for the first time. This proved quite effective for further cooperation, as shown by the results of the 2 surveys conducted some time after the workshops to estimate their impact (see the Impact section). The format also proved quite fruitful, offering participants from both communities the opportunity to meet and discuss practices in an open and informal context.
* Towards a Formal Cooperation Agreement between the ESS and CESSDA *
These activities also provided a solid basis for discussing how to establish cooperation at European level involving the whole European Statistical System and the whole of CESSDA, with the objective of a formal agreement between Eurostat and CESSDA, now established as a legal entity under Norwegian law as CESSDA-AS, to be later transformed into an ERIC with European legal status. Throughout the project, there were regular discussions with Eurostat on the possible basis for such cooperation. A dedicated workshop was organised at Eurostat to present the first outputs of DwB to a wide audience of Eurostat staff from different departments. Eurostat continuously showed high interest in all DwB activities, participating in many workshops and conferences. The DwB coordinator was regularly invited to the annual Working Group on Statistical Confidentiality (WGSC), participating in the discussions on the revision of the Commission regulation for researchers' access to the European microdata (published in 2013) and other issues, conveying the archives and research perspective and presenting DwB outputs and proposals. As a result, the ESCOS proposal was presented by Eurostat at one of the DGINS conferences. Finally, a formal cooperation with CESSDA (when set up as an ERIC) was proposed by Eurostat and accepted by the Directors General in September 2015, a few months after the closure of the DwB project.
ENLARGING COOPERATION - RESULT 4: Building a Platform for Discussion at European Level: the European Data Access Forums (EDAF)
While the other activities and events specifically targeted one or two communities, the European Data Access Forums (EDAFs) aimed at involving all the stakeholders at European level in the discussion on access to official microdata. The EDAFs were meant to build a European platform for discussing solutions to overcome the barriers to research using government data, both national and European, across the ERA. The ambition was that the Forum would be established in the long term as a biennial European event gathering the different communities: the Data Archives, National Statistical Offices, other members of the European Statistical System and the research community, also involving in the discussions other important stakeholders such as research funding councils and the authorities in charge of privacy protection and data security.
With that in mind, DwB organised 2 EDAFs, hosted in Luxembourg in the Jean Monnet building of the European Commission, with strong support from Eurostat. All archives and NSIs, as well as many other institutions from the different communities - including researchers - were invited and attended these 2 forums. Current practices, new tools and developments at national and European level were at the core of the EDAF programmes, which mixed presentations, demos and panel sessions to foster common knowledge and exchanges on best practices, projects and cooperation between the different communities.
The 1st EDAF was held on 27 and 28 March 2012. This Forum focused mainly on the legal frameworks and the research accreditation process for transnational access, also covering other important issues such as technical and methodological issues about remote access for confidential microdata and metadata standards. It was the opportunity to present the first outputs of the DwB project, recent changes and projects regarding legal frameworks and accreditation in participating organizations and countries, challenges for researchers as well as to share information in the context of the ongoing revision of the European Commission Regulation (EC) N°831/2002 concerning access to European microdata. Posters, demos and workshops offered more possibility for presentations and discussions making this event a European Forum for data producers, data providers and data users.
The Forum gathered over 120 participants from the different communities in equal proportion, also involving in the discussions other important stakeholders such as the national data protection commissioners and research funding councils. Attendance included: most National Statistical Institutes and some National Statistical Administrations, Eurostat, Archives, Researchers, European and International Organisations & Institutions, Research Data Centres, Central Banks, Privacy Protection Authorities, Research Agencies & Authorities as well as other EU-funded projects.
The 2nd EDAF was held on 24 and 25 March 2015. For this second edition, the programme was enlarged to cover all aspects of access to official microdata, legal issues and procedures for access with a focus on transnational access to confidential microdata, methodological issues regarding statistical disclosure control, technical solutions for allowing access while protecting confidentiality, data discovery and metadata issues that are crucial particularly for trans-border access. The focus was on changes that had occurred over the past 4 years at both national and European level; with new developments regarding access to administrative records and registers such as tax data and medico-administrative data, remote access solutions, initiatives to build cooperation between Research Data Centres to facilitate distributed access to confidential microdata, discovery tools and new portals. One important dimension of the programme was also to make the bridge with current and parallel developments in the health sector, particularly regarding access to medico-administrative data and the building of health portals. The programme mixed major findings and proposals from DwB, including the ESCOS and the EuRAN, and experiences or new initiatives and projects from other actors. On the side of DwB, it notably allowed presenting the CIMES and MISSY tools that provide user-friendly metadata for national and European official microdata as well as demonstrating the Proof-of-concept for a European remote access network, based on real IT infrastructure set up by partners and a lively fictional scenario of a research project involving teams from Germany and France.
Interestingly, the first introductory session focused on the experience of cooperation between the 7 RDCs from 4 countries involved in providing transnational access over the 4 years of DwB and on the data users' experience. Two of the researchers who were selected and supported by DwB as part of the TNA activities for using Secure Use Files from national microdata participated in a panel session together with users of the Eurostat microdata. They presented their research and their experience in accessing and working with the data, also highlighting their difficulties and needs; this was very positively received and provided a basis for the overall discussion on future perspectives.
The EDAF 2 was introduced by the Eurostat Deputy Director, who chaired the first session, and concluded with a panel session chaired by the director of the Eurostat Methodology department. Most importantly, Eurostat put forward the idea that the EDAF should continue after DwB with support from Eurostat and CESSDA.
The event gathered over 150 persons from the different communities at stake, also involving in the discussion other important stakeholders such as research funding councils and representatives from Ministries for Research & Education. Attendance included: most National Statistical Institutes and some National Statistical Administrations of the ERA, Eurostat, Archives, CESSDA, Researchers, European and International Organisations & Institutions (notably IASSIST and ICPSR), Research Data Centres, as well as Research Agencies & Authorities.
DwB RESULTS - OVERALL CONCLUSION
Over those four years, the proposals for a European Service Centre for Official Statistics and a European Remote Access Network have progressively been perceived as the complementary components of an overarching infrastructure in cooperation with CESSDA, which would allow articulating and integrating the various results of DwB into a coherent whole to prepare the vision for the future of transnational access to confidential data.
Overall, DwB has provided immediate transnational access to confidential microdata, and has developed products and tools immediately usable, pilots and prototypes to serve for future developments, recommendations and standards on practices and procedures to be adopted as well as overall proposals requiring more investments for future developments. Their potential impact can be evaluated along different dimensions: direct or indirect, immediate or with a longer perspective. This includes impact for the different stakeholders interested in access to official microdata within the ERA as well as the broader socio-economic impact and wider societal implications of the project. The important and continuous dissemination activities all along the project have widely contributed also to increase the impact of the project results.
IMPACT 1 - More Immediate Access to Official Microdata Despite Current Obstacles
The main objective of the project was to increase research access to the rich resources of official microdata at both national and European level within the ERA, with a focus on transnational access to the so-called confidential microdata, which are currently underused. As obstacles are multiple and substantial, a large part of the project was dedicated to building tools, proposing standards and preparing proposals and agreements that will take time and investment to implement. Though much impact on access could not be expected in the short term, it is notable that the project contributed, either directly or indirectly, to an immediate increase in access to these resources. The TNA WPs supported a number of research projects that required transnational access to multiple sources of data hosted in 7 RDCs from 4 countries (France, Germany, the UK and the Netherlands). Though the amount of access actually provided was less than initially planned - due to the obstacles in the legal frameworks and their interpretation and to the type of financial support that DwB could provide within this type of project - it is expected that the cooperation set up for these WPs between the 7 RDCs will continue after the closure of the project. Indeed, the RDCs have agreed on the value of this cooperation, and a dedicated section of the DwB website was revamped with that in mind: while advertising that financial support is no longer possible, it provides information on the different RDCs' data sources and procedures, which the RDCs will keep updating.
Immediate and direct impact on transnational access also came from the opening of transnational access to IAB Secure Use Files from UKDA, with a running connection set up within the framework of the WP4 pilot for a European Remote Access Network (EuRAN), based on an RDC-in-RDC approach. This allows researchers in the UK to access IAB data without having to travel to Germany, thus facilitating and hopefully increasing transnational access to these longitudinal data, which link different administrative databases on employers and employees and are increasingly used for many cutting-edge comparative research projects. Researchers selected through the DwB calls were able to start using this running connection at the end of the project. The connection will be maintained, as IAB and UKDA have agreed on how to share the costs, and more researchers are therefore expected to use it after DwB. In the meantime, similar connections are now under discussion between IAB and CASD as well as between IAB and CBS, which should also concretely contribute to more transnational access after the official closure of DwB.
Indirect impact is also expected from the training sessions on the Eurostat microdata organised within the lifespan of DwB. A large share of the selected participants had research projects in preparation and should therefore be better able to access and use these data. All participants were also made more aware of the potential of other sources as well as of procedures for confidential microdata, which should also indirectly increase their actual use of official microdata within the ERA.
IMPACT 2 - Tools and Products Available for the Public, Directly Facilitating Discovery, Access and Use of Official Microdata
First, DwB has immediately facilitated OS data discovery across borders with two metadata bases now available to the public, thus concretely increasing the potential of these data for research.
CIMES offers a single point of access to basic metadata in English on a large set of the main national official microdata for EU and associated countries (currently more than 2000, together with information on the different types of files available, i.e. PUF, SUF, ScUF), as well as procedures and links to providers for more detailed metadata and for actually accessing the data. The CIMES system was made available for public use as of March 2015 and can be accessed at http://cimes.casd.eu. This immediately and concretely enhances data discoverability for researchers, a crucial issue for the transnational use of this rich source of knowledge.
MISSY was designed for facilitating researchers' use of the integrated European microdata (Eurostat) and now provides more structured and user-friendly metadata down to the variable level. The MISSY system went online in January 2015 and can be fully accessed by the public at http://www.gesis.org/missy/eu/missy-home. It currently includes the following integrated European microdata series: European Labour Force Survey, European Statistics on Income and Living Conditions, Structure of Earnings Survey, Community Innovation Statistics and Adult Education Survey. The impact is both on discovery and easier usage of microdata that should ultimately increase the use of the Eurostat microdata.
The setups and microdata tools (routines) provided through MISSY to complement these metadata make these European official microdata far more accessible and usable to researchers. The “set up files” and routines aid Eurostat data users with two types of services by (1) helping them to operationalize common social scientific concepts, scales or indicators or (2) assisting them in restructuring data files. Together, these services become a valuable tool to explore the contents of the data. This guarantees that users obtain a better sense of the contents and quality of data and have an easier time in doing their actual data analysis.
Both metadata bases and tools were presented to the researchers at the different training sessions. Currently GESIS, CNRS-RQ and CED plan to continue the work on the MISSY, CIMES and the IECM systems respectively; either supported by their own budgets, or via future co-operations and projects. One option is to integrate them in the CESSDA work programme. First steps have been undertaken at the time of this report. CESSDA has started evaluating the DwB outputs and discussing which of them should be maintained as CESSDA services in the future.
Two other DwB products are dedicated to the legal frameworks and the accreditation procedures and were made available for the public on the DwB website.
Factsheets on national legal frameworks and accreditation procedures for official national microdata provide complementary and harmonised information in English, available on the DwB website and integrated in CIMES. They notably increase the information available to users, also providing links to the national data providers' websites, pro forma contracts and application forms, thus ultimately facilitating researchers' work prior to accessing the data.
The Legal Gateways Visualisation Tool addresses the data producers and the authorities involved in legal frameworks and accreditation. It immediately provides a better understanding of countries' individual positions with respect to transnational access (of interest for researchers and other data providers / producers). It may also allow following up on evolutions in national situations over time and may have a possible peer-pressure and broker effect for national data providers.
The SDC software for new masking techniques developed under WP11 should enhance the production by OS producers of anonymised microdata, such as the Public Use Files, which are available to all and can also be used by researchers for data discovery and teaching. The software developed for cell suppression and CTA could ultimately be used at RDCs to protect tables produced by researchers, thus also contributing to the future development of a EuRAN. Some of these packages were included in the ARGUS software, a well-known package for applying different kinds of SDC techniques to tabular data and microdata. Besides, the binary files of all the software packages generated by the WP11 partners (RCTA package, ECTA package, cell suppression package, R package on synthetic data generation, record linkage package) are in open access, while the source code and documentation may be made available upon duly justified request (without granting any sub-licensing rights to the requesting party). These should serve as a basis for a general advancement of knowledge through direct use and crowd-sourcing. In addition, the advanced record linkage methods developed for database integration can be used to evaluate the worst-case scenario in data protection, and to integrate different databases using a supervised approach.
IMPACT 3 - Changes in Practices & Potential Adoption of Standards
A large part of the work undertaken through DwB was dedicated to discussing and finding agreement on standards and practices that would ease transnational access to official microdata in the ERA.
A number of deliverables provide recommendations, guidelines and roadmaps on the different dimensions involved in access to microdata: metadata standards, legal frameworks and their interpretation, accreditation procedures and practices, secure access systems, security norms, data linkage issues regarding anonymisation, and output checking for confidential microdata provided via secure systems.
Some are complemented by reports on researchers' experience and needs, such as the user stories used for the RCUK Data for Discovery workshop, which should help shape the future, notably by providing solid case studies for data providers' elicitation regarding data documentation and dissemination. In a similar way, the results of online surveys and workshops with researchers from RDCs on their experience and needs with remote access should serve as a guide on how best to balance researchers', providers' and producers' requirements regarding the use of confidential microdata, thus ultimately increasing the usage of official microdata across borders. The Swiss NSI, which is currently investigating the possible setting-up of an RDC, was for instance interested in such results.
All these reports were widely presented and discussed with NSIs, OS providers and other stakeholders on many occasions, at both DwB events (e.g. specific workshops on metadata or accreditation, the Regional Workshops gathering many NSIs, or the European Data Access Forums) and external workshops and conferences (European DDI User Conference, IASSIST conferences, WGSC Eurostat, UNECE/Eurostat workshops). It is therefore expected that they will serve as a reference for future improvements in the area, ultimately increasing the use of official microdata within the ERA. WP7 reports on metadata issues provide background information and highlights relevant to upcoming challenges in the field of metadata. They contribute to a general advancement of knowledge in the field of metadata standards by providing a solid basis for the ongoing discussion regarding the standards relevant to the documentation and dissemination of official datasets. WP8 outputs regarding OS data Object Models provide a set of general requirements for virtual research environments. The report on ISO27001 provides immediate guidance for setting up a security-compliant RDC, which will ultimately be a basis for building mutual trust between RDCs in the perspective of a EuRAN.
Immediate changes in practices can also be reported, some of them deriving from the tools described in the previous section.
Regarding metadata, CIMES and MISSY have made the OS data producers and data providers more aware of the need for more detailed, better structured and harmonized documentation in English. In some cases, they have also induced changes in practices. CIMES has been an incentive for the NSIs and OS data providers to put more information in English on their websites.
It is also hoped that the current unavailability of some study-level metadata for several Eurostat microdata series (CIS and SES), in particular national quality reports, will be remedied, since the documentation for these series is currently missing some details on their implementation in some countries. We believe that by exposing such documentation gaps within the MISSY database, we might lead NSIs and the relevant Eurostat departments to reconsider their clearing policy on these documents.
A workshop organised by DwB in Lausanne, and attended by all NSIs, allowed intensive discussion of the value of these databases for future cooperation. Some NSIs were also interested in the CIMES tool for their internal needs (e.g. ONS). Quite importantly, Eurostat has shown great interest in the use of MISSY, which is currently included in one of the items of the proposed collaboration between CESSDA-ERIC and the ESS in the Eurostat document approved by the NSIs' DGs (DGINS Conference in Lisbon) in September 2015.
Another indirect impact can be seen in the fact that the Data Documentation Initiative Alliance, which is the governing body of DDI, the most important metadata standard for microdata in the social sciences, is taking a keen interest in the work within DwB and particularly WP5. In fact, the work on the MISSY system by Thomas Bosch plays an important role in the development of the DDI-RDF Discovery Vocabulary.
Regarding the legal frameworks and their interpretation as well as the accreditation procedures and practices, the visualisation tool on the legal frameworks and their interpretation regarding transnational data access, together with the factsheets on national accreditation procedures, provides useful information to the producers and the authorities in charge of research accreditation on the state of play in the different European countries, which is an incentive for more harmonisation and better practices. These tools create healthy peer pressure and incentives for the OS data producers and providers to better document (in English) their accreditation and data access procedures. They may also provide a set of best practices that help data providers realize that data access for research purposes (including for foreign researchers) is actually feasible and beneficial insofar as it may increase their impact on society. Therefore, an important indirect impact of both tools is that they may serve as guides for adopting best practices in transnational data access for research purposes. An immediate example of a change in practice in that domain directly linked to the DwB work is the case of Hungary (presented during EDAF 2), which adopted the DwB standard when changing its accreditation procedure and could potentially serve as a model for other countries.
In the same area, some DwB activities have indirectly driven changes in practices. The TNA WPs were a first experience for the 7 RDCs in discussing the common adoption of harmonized procedures and forms, driven by the needs of the common calls for research projects requiring access to their confidential microdata. Of notable interest is the indirect impact this activity had on the French accreditation procedure itself: the authority in charge of researchers' accreditation (Comité du secret statistique) agreed to adapt and speed up the procedure for projects selected by DwB on the grounds that these research projects had first been reviewed by a Selection Panel (USP) and the different data providers, which is a concrete example of trust building. When the DwB coordinator was invited to present the DwB proposal on transnational accreditation, further discussion indicated possible future developments toward bilateral agreements for projects requiring accreditation in two countries.
The training sessions, which required campus files that were not actually available for the Eurostat microdata, provided an interesting use case for the indirect impact of the DwB activities. Eurostat and 14 NSIs agreed to provide such teaching files, which were discussed and designed for the practical sessions of the course. This has fed into the discussion within the ESS about the production of Public Files that should soon be available for the LFS and SILC (cross-sectional only at the moment), also usable for teaching purposes.
DwB's impact on the general discussions of various audiences regarding transnational access to official microdata, notably at the European and international level among the ESS and the OECD, also includes an important contribution to a common terminology regarding anonymisation, types of files and secure systems. This is crucial to facilitate international discussions and mandatory for contractual negotiations, as demonstrated by the example of the UKDA/IAB MoU for setting up transnational access to IAB microdata from UKDA. The work on a common terminology, which was conducted internally for the needs of the activity undertaken within DwB, was conveyed successfully within OECD and Eurostat circles by several DwB (NSI) partners. The same applies to the important notion of a “circle of trust”, which should serve as a basis for international agreements regarding (transnational) access and for which the IAB/UKDA template agreement set up within DwB can serve as a Proof-of-Concept, also usable for further developments with other RDCs.
IMPACT 4 - From a Starting Community to an Advanced Community: Keeping the Momentum
All in all, there was consensus that one of the most important - yet intangible - achievements of DwB has been its capacity to act as a major broker between its stakeholder communities (data producers, data providers, users & research communities, funding bodies), internally as well as externally. Through its balanced partnership involving representatives of NSIs and Data Archives and the range of activities designed to foster cooperation, DwB has managed to promote dialogue and build bridges between and within these communities: starting from a fragmented community of practitioners, it has contributed - either directly or indirectly, from an internal circle of cooperation to an external one - to establishing a higher degree of coordination and networking, which are the conditions for building an advanced community.
As already noted, the very composition of the DwB partnership has played an important and direct role in bridging the data producer and data provider communities: by involving data archives and NSIs or NSAs directly in the work carried out in the project, it has in many instances triggered or fostered discussion and cooperation at national level. Even where cooperation already existed between the National Statistical Institute and the Data Archive, further integration of their links was explored by DwB partners as well as non-DwB partners in parallel with the DwB work and events (notably during the Regional Workshops and the EDAFs). In some cases, DwB was the occasion for these partners to give a joint presentation on their cooperation, which was seen as a very positive experience.
This direct impact of DwB can also be measured by the expansion of this internal circle of cooperation to a broader one. The EDAFs and Regional Workshops, whose primary objective was to enlarge cooperation across Europe but also to trigger discussion at national level, were attended by all NSIs and archives. They are regarded as informal platforms for discussion and major brokers between the communities regarding data access for research purposes. They even triggered negotiation / discussion between DAs and NSIs in some countries. For instance, the first EDAF allowed several NSIs and DAs to meet for the first time and start exploring alternative ways to share their tasks and responsibilities for data access provision to researchers in order to make the best of their respective budgetary constraints. The DwB PI was later asked to help facilitate this incipient dialogue, which can be regarded as an indirect achievement.
Finally, the direct actions of DwB to promote enlarged dialogue between the two major European communities, namely the European Statistical System led by Eurostat and CESSDA, have contributed to the general consensus and will in both communities to formally explore ways toward greater cooperation. A formal and indirect output in this respect is the discussion currently being prepared on a future collaboration (which could possibly take the form of a Memorandum of Understanding) between the ESS and CESSDA (once the latter becomes an ERIC, which is expected in 2016), presented by Eurostat at the last DGINS Conference and approved by the Directors General of the participating NSIs.
Similarly, the positive impact of the DwB actions on the community of RDCs has expanded beyond the DwB circle alone. Directly, the work undertaken to coordinate the transnational access activities of DwB allowed mutual trust to be built between the 7 RDCs involved. They all realized the need to collaborate closely with the ultimate aim of fostering transnational access to their data. They expressed their will to keep gaining (1) reputational benefits by sharing knowledge & experience, and (2) a demonstrated impact on their user communities.
Along the same lines, the DwB staff visits complemented the emerging cooperation links between RDCs. This activity supported the ongoing trend toward the creation and development of RDCs and possible underlying RA solutions by providing an informal platform for in-depth discussions between practitioners about data provision for research.
As time went on, an increasing number of RDCs expressed their will to participate (either as hosts or visitors) and to maintain this activity beyond DwB as an informal network of RDCs for knowledge exchange and capacity building. This is an unexpected - yet positive - and indirect impact of the DwB activity that may result in greater incentives for continuing the work in e.g. a follow-up project for a network of RDCs working toward a European Remote Access system.
However, such cooperation links are, by definition, intangible and fragile: they need to be maintained and - if possible - institutionalized / formalised to remain sustainable. All parties involved - either project partners or external stakeholders - agreed that keeping the momentum of this achievement in the relatively short term is of the utmost importance, as this would then allow devising the best way to maintain this umbrella in the longer run and so maximize its impact on the whole ERA. Such an impact would ultimately lead to more data accessibility and discoverability for research purposes, and thus to more of the excellent science that is crucially needed for policy evaluation and for a better understanding of the societal challenges ahead.
IMPACT 5 - Laying the Foundations of the Infrastructure Needed for Transnational Access to Confidential Data
DwB has paved the way for the necessary developments toward the infrastructure needed for transnational access to confidential data by demonstrating its feasibility in a realistic horizon.
The documentation provided through the MISSY and CIMES systems (on EU integrated microdata and national data respectively) as well as the harmonized country-based factsheets on the national accreditation and access procedures are the first components of what could become a European Service Centre for Official Statistics.
Such an ESCOS, ideally set up as a sub-unit of CESSDA, would also provide the front office for a European Remote Access Network, envisioned as an IT and organisational backbone that would allow researchers to work with the confidential data seamlessly across borders. DwB has demonstrated the feasibility of both aspects of such a system: the pilot that led to a running RA solution between Germany and the UK shows the organisational feasibility, while the Proof-of-Concept with a real IT implementation of 3 nodes and a central node set up between three RDCs of Germany and France has demonstrated the IT feasibility of a real network approach.
Besides, the strong cooperation links and mutual trust that have been built between the data provider and data producer communities across countries are the cornerstone that is crucially needed for the actual implementation of such an infrastructure in the future.
IMPACT 6 - Socio-economic Impacts
The success of DwB should also be measured with regard to its socio-economic impact and its wider societal implication.
By promoting enhanced cooperation across countries and communities, DwB has outlined a potential win-win pan-European situation. By widening access to their data, Data Producers would get more studies based on their data, thus enhancing their impact on society. Data Providers, on the other hand, could analyse their national structures in light of international knowledge while exploring the transferability of processes, legal agreements, workflows, IT know-how and solutions. In-depth reflection on a better distribution of tasks, responsibilities and, consequently, budget between both communities at national and transnational level, as advocated by DwB, would allow them to provide enhanced services to data users while optimizing their economic models, which have often faced budget cuts over the past years.
Its added value for innovation should not be underestimated either, insofar as DwB has made an important contribution towards a brand new innovative system for trans-border access to secured data, which will require IT development and further negotiations on its organizational aspects. Its proposals for a European Service Centre for OS data users and a European Remote Access Network, as well as the first steps made in that respect by implementing a running RDC-in-RDC RA solution and demonstrating the feasibility of a network approach, may potentially put Europe at the cutting edge: the ERA may become a laboratory for such a secure distributed network system for access to confidential microdata, one that would transcend today's national borders while meeting the requirements for high-level security and providing the guarantees required by the data producers and providers.
By promoting such a system, which has been welcomed by the various audiences over the project lifespan, but also by providing immediate transnational access to national official microdata, DwB has laid the path toward excellent science, which increasingly requires transnational access to this rich resource of official microdata, particularly the highly detailed ones. This would ultimately contribute to the Knowledge Society advocated in the Vision 2020 of the EU, for which an important issue is the availability of data (regulated in the case of confidential data) for enlightened decision-making and responses to a number of societal challenges.
DISSEMINATION AND PUBLIC OUTREACH ACTIVITIES
Overall, DwB's external dissemination policy is regarded as highly effective, with: one book, 61 publications in peer-reviewed journals or proceedings of international events, 120 other types of dissemination activities (with 7 flyers / factsheets, 7 posters, 14 conferences/joint sessions or satellite events to international events/workshops organized, 92 presentations made during international events).
Among these, WP11, which is dedicated to research activities, was unsurprisingly particularly productive in disseminating its results. Based on reports that were part of the deliverables, over 15 presentations were given at international scientific conferences. Moreover, 23 accepted publications in proceedings and peer-reviewed scientific journals were written.
It should also be noted that DwB had a broad international coverage, with annual participation in all international conferences of importance in the field of data access (IASSIST conferences, EDDI conferences, NTTS conferences, UNECE HLG BAS workshops, joint UNECESTAT workshops, Q-annual events, UNECE SDS workshops, PSD conferences, etc); while handing over its main messages to a well-balanced audience, including scientific communities, policy makers and the civil society as a whole.
Besides, particular care was taken to keep close contact with senior-level stakeholders and to present DwB outputs & proposals to them. Over its whole duration, DwB has been considered a valid interlocutor by external stakeholders, notably policy makers and European bodies: DwB representatives were regularly invited to discuss and present their findings at a number of strategic events (Eurostat's WGSC & Secure Data Exchange Expert workshop, OECD Expert Group for International Collaboration on Microdata Access, DGINS Conference, European Research Infrastructures for the Humanities & Social Sciences conference, CESSDA Service Providers Forum, etc.).
Finally, particular attention was paid throughout the project to coordinating the work undertaken within DwB with similar actions, such as other FP7 projects (e.g. DASISH, EuroREACH, ENGAGE, EUDAT, EuroRIs-Net, InGRID, SERSCIDA) or ESSnet initiatives (e.g. ESSnet DARA, ESSnet on SDC), in order to avoid advocating different solutions to similar issues. For instance, the DwB PI was a member of the Scientific Advisory Board of the FP7-funded InGRID project, notably so that it could benefit from the lessons learnt within DwB and allow cross-fertilisation.
After those four years, DwB has gained international recognition for its positive impact on the discussion held in the field of official microdata access. On many occasions, representatives of the stakeholder communities even suggested registering the "DwB brand" as a possible communication vehicle to ensure a continuum in the actions that may be maintained beyond DwB (EDAF, Staff Visits, Training Sessions, etc.).
Beyond the anecdote, this is a revealing detail of the concerns raised regarding the sustainability and handing-over of the project's main outcomes. Keeping the momentum is crucial for a long-term implementation perspective, which raises legacy issues and will in any case require further developments.
The overall proposals for a European Service Centre for Official Statistics (ESCOS) and a European Remote Access Network (EuRAN) provide the framework for such a long-term vision. The ESCOS, envisioned as the global specific service for researchers in the field of OS, ideally as a CESSDA sub-unit, should also provide the front office for the EuRAN that would work as a backbone for European research by allowing researchers to work together with confidential data sources across Europe. Besides, these may be of strong interest to other communities handling other data sources and facing the same challenges (such as the health sector).
First steps have been made within DwB, including pilots and prototypes. The EuRAN Pilot has made it possible to set up an RDC-in-RDC secured transnational connection between Germany and the UK, based on an MoU setting out the underlying organizational principles. In addition, a Proof-of-Concept has demonstrated the IT feasibility of a transnational RA system based on a network approach. Besides, the roadmap and pilot for a CESSDA Resource Discovery Portal should ultimately avoid information silos and improve data discoverability.
The question of their maintenance or further developments is therefore crucial. Some should hopefully be handed over to CESSDA and become part of its work plan. Others require further important investments for their construction and must include non-CESSDA actors - since a large share of RDCs are hosted by OS producers and there is also a need to bridge with other sectors, particularly the health sector that has similar needs for a EuRAN - as part of follow-up projects for instance. Whatever the case, maintaining an overall umbrella for the community built within DwB will be crucial in that prospect, to ensure a successful and sustainable implementation.
- Project Scientific Coordinator: Roxane SILBERMAN (CNRS, email@example.com)
- Project Manager: Tanguy LIBES (CNRS, firstname.lastname@example.org)