Project ID: 654142
Funded under: H2020-EU.

Periodic Reporting for period 1 - EGI-Engage (Engaging the EGI Community towards an Open Science Commons)

Reporting period: 2015-03-01 to 2016-02-29

Summary of the context and overall objectives of the project

The EGI-Engage project (Engaging the EGI Community towards an Open Science Commons) started in March 2015, co-funded by the European Commission for 30 months, as a collaborative effort involving more than 60 institutions in over 30 countries.

Project mission
EGI-Engage aims to accelerate the implementation of the Open Science Commons by expanding the capabilities of a European backbone of federated services for compute, storage, data, communication, knowledge and expertise, complementing community-specific capabilities.

Project objectives
• Objective 1: Ensure the continued coordination of the EGI Community in strategy and policy development, engagement, technical user support and operations of the federated infrastructure in Europe and worldwide.
• Objective 2: Evolve the EGI Solutions, related business models and access policies for different target groups aiming at an increased sustainability of these outside of project funding. The solutions will be offered to large and medium size RIs, small research communities, the long-tail of science, education, industry and SMEs.
• Objective 3: Offer and expand an e-Infrastructure Commons solution.
• Objective 4: Prototype an open data platform and contribute to the implementation of the European Big Data Value.
• Objective 5: Promote the adoption of the current EGI services and extend them with new capabilities through user co-development.

Project summary
Over the last decade, EGI, the European e-Infrastructure, has built a distributed computing and data infrastructure to support multi-disciplinary science. This e-Infrastructure has since delivered unprecedented data analysis capability to over 21,000 researchers from many disciplines by federating more than 350 data and compute centres worldwide. EGI builds on the European and national investments and relies on the expertise of - a not-for-profit foundation that provides coordination to the EGI Community, including user groups, participants in the EGI Council, and the other collaborating partners.
The mission of EGI-Engage is to accelerate the implementation of the Open Science Commons vision, where researchers from all disciplines have easy and open access to the innovative digital services, data, knowledge and expertise they need for collaborative and excellent research. The Open Science Commons is grounded on three pillars: the e-Infrastructure Commons, an ecosystem of services that constitute the foundation layer of distributed infrastructures; the Open Data Commons, where observations, results and applications are increasingly available for scientific research and for anyone to use and reuse; and the Knowledge Commons, in which communities have shared ownership of knowledge, participate in the co-development of software and are technically supported to exploit state-of-the-art digital services.
EGI-Engage expands the capabilities offered to scientists (e.g. improved cloud or data services) and the spectrum of its user base by engaging with large Research Infrastructures (RIs), the long-tail of science and industry/SMEs. The main engagement instrument is through a network of eight Competence Centres, where National Grid Initiatives (NGIs), user communities, technology and service providers join forces to collect requirements, integrate community-specific applications into state-of-the-art services, foster interoperability across e-Infrastructures, and evolve services through a user-centric development model. The project also coordinates the NGI efforts to support the long-tail of science by developing ad hoc access policies and by providing services and resources that will lower barriers and learning curves.
EGI-Engage will broaden the adoption of a federated identity management, will extend accounting to include new services and types of resources, and will provide tools for Service Level Agreements (SLA), service discovery and allocation in a federated environment. The EGI Federated Cloud and its operations are evolving to provide IaaS, PaaS and SaaS, and the HPC capacity and capabilities will be expanded by federating the access to distributed accelerated computing co-processors. Publication, use and reuse of open data will be facilitated.
EGI-Engage is evolving solutions and their related business models with approaches targeted at each user group for improved sustainability and integration with other infrastructures in Europe and worldwide. The project develops business relationships with industry and SMEs and provides an innovation space where general purpose compute and data services can be offered to develop big data technologies, applications and foster reuse of research data. The technical input to standards, policy and procedure developments, software and service innovation, business model innovation and know-how produced by the project will be offered to user groups, Research Infrastructures, industry/SMEs, service providers, funding agencies and decision/policy makers.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

"This section presents the work performed and the main outcomes achieved by project objective.

OBJECTIVE 1. Ensure the continued coordination of the EGI Community in strategy and policy development, engagement, technical user support and operations of the federated infrastructure in Europe and worldwide

- Strategy. The new EGI strategy was adopted by the EGI Council in May 2015. It proposed a new vision, mission and strategic goals, with objectives organised in five strategic themes: 1.) Engage and support user communities; 2.) Design, develop and deploy solutions; 3.) Serve, support and improve live (in production) services; 4.) Influence policies; and 5.) Achieve sustainable future.
- Governance. The EGI governance, formalized in 2010 with the creation of the EGI Foundation and the related statutes, was reviewed, and requirements for improvement were collected. This resulted in the approval of a new governance model and of new statutes.
- The Open Science Commons vision, extensively discussed at the EGI Conference, May 2015, was endorsed in May 2015 by the European Council in the conclusions on "open, data-intensive and networked research".
- European Open Science Cloud. EGI participated in the EC consultation on the EOSC by providing input on the main challenges for Open Science and the ERA.
- Operations coordination. SA1 coordinated the daily running of services in the EGI Federation, comprising 325 centres in 58 countries and 5 integrated e-Infrastructures worldwide. In the first quarter of 2016, the capacity federated in EGI exceeded the thresholds of 650,000 logical CPUs (a 23.6% yearly relative increase, compared to 8.13% in the previous year) and 500 PB of online and near-line storage federated worldwide. From January 2015 to January 2016, 343,200 Virtual Machines were instantiated in the Federated Cloud (a 79% yearly increase, with 2.31 million hours of CPU wall time consumed). The Federated Cloud, with its 21 providers of which one is commercial, is a new platform that started its production activities in May 2014.
- International user community. In February (Q1 2016) the estimated number of active users exceeded 46,000, of which 66% are from natural sciences, 6.5% from medical and health sciences (an expanding sector) and 6.3% from engineering and technology. The average job rate exceeded 1.6 million jobs/day for the first time since the beginning of production activities in 2004, and the overall amount of CPU hours increased by 26.4% in the first year of the project.
- Security. The Security Incident Handling Procedure was updated and approved by the EGI OMB. This work exploited new possibilities in incident response required by the newly integrated technologies, primarily cloud IaaS. The EGI Security Threat Risk assessment has been updated with a focus on the EGI Federated Cloud and the changing EGI environment.

Objective 2 (O2). Evolve the EGI Solutions, related business models and access policies for different target groups aiming at an increased sustainability of these outside of project funding. The solutions will be offered to large and medium size RIs, small research communities, the long-tail of science, education, industry and SMEs.

- A business engagement programme that outlines the areas and benefits for collaboration was formalised. Through the collective efforts of each partner, there are currently ~60 private organisations in a dedicated contact database, ranging from SMEs to large enterprises, in roles such as technology providers, brokers and consumers.
- Thematic solutions. EGI will develop business agreements with external thematic platform operators to enrich the current set of general-purpose solutions. The purpose of this programme is to provide greater visibility to Scientific Gateways, Virtual Research Environments, Virtual Labs and other products that provide indispensable added-value services specific to a given research group.
- Service portfolio management. The Services and Solutions Board (SSB) was defined, approved and established by the Council as a support structure to strengthen the governance of the EGI service and solution portfolios and to increase effectiveness.
- EGI marketplace. The concept of the EGI marketplace was defined, together with scenarios for allocating capacity to research communities in collaborations with pilot user communities.
- Certification. FitSM, the lightweight standard for federated service management, has now been added.
- Data Hub. The business model of a new service, the Data Hub, is being defined (see Objective 4 for more information). JRA2 is the activity that technically contributes to this objective through extension of EGI services with the Open Data Platform, which will enable all communities, both large and small, to easily access, share and publish open data.
- The High-Throughput Data Analysis solution was advanced through 8 new releases of the Unified Middleware Distribution (UMD).
- Federated Operations. The EGI Federation is constantly evolving, both in terms of services provided and in terms of service management processes and tools for federated IT service management across the federation. The EGI Federation is managed through a set of 17 security policies, 5 security procedures and 23 operational procedures, which are constantly reviewed and evolved over time.
- User community outreach and technical support. Tasks of the SA2 activity (SA2.1 Training and SA2.2 Technical User Support) helped EGI deepen existing collaborations and establish collaborations with new communities. The results are 27 collaborations with RIs/FETs, 11 projects/communities and numerous ‘long-tail users’, of which some represent research performing institutes, research labs, or individual scientists.

Objective 3 (O3). Offer and expand an e-Infrastructure Commons solution

- The evolution of the EGI e-Infrastructure Commons solution was ensured with the coordination of the innovation activities that concern the evolution of the EGI Core Infrastructure Platform, which is responsible for delivering the tools and services necessary to implement the federation fabric. The platform includes services for user Authentication and Authorization, the service configuration registry, accounting, monitoring and service level reporting and other component services required by federated service management processes and activities.
- New AAI architecture and prototyping of technical solutions. The requirements gathered from new research communities and the collaboration of the user groups engaged in the AARC project allowed the definition of a new AAI architecture for EGI following the AARC blueprint.

Objective 4 (O4). Prototype an open data platform and contribute to the implementation of the European Big Data Value

- Data Hub. The business model of a new service, the EGI Data Hub, is being defined. The service leverages the Open Data Platform being prototyped in WP4 and addresses the needs of researchers who require access to large volumes of third-party open data for downstream analysis. It offers users caching, replication and on-demand access to core resources, and offers data providers the possibility to offload computation and data access to EGI, while still retaining full control of access and being informed about utilization.
- Open Data Platform use cases and the architecture were defined, and the preparation of the ODP prototype started. The development was focused on extending the ODP underlying technology – Onedata – with essential features for open data management including extended metadata support, snapshot creation and grouping of files into collections, to enable creation of the Open Data Platform prototype by M20.

Objective 5 (O5). Promote the adoption of the current EGI services and extend them with new capabilities through user co-development

- Four editions of Inspired were published in April, July and October 2015 and, more recently, January 2016.
- During PY1 the WP6 activity established 8 Competence Centres (CCs) linked to 8 Research Infrastructures/communities: BBMRI, DARIAH, EISCAT-3D, ELIXIR, EPOS, MoBrain/INSTRUCT, LifeWatch, and disaster mitigation.
- Service Level Agreements. A new framework consisting of processes and documentation was put in place to ensure the establishment of Service Level Agreements between user communities (VOs) and providers of the EGI Federation as soon as testing activities are successfully completed and a new user community is ready to start the preparatory stage to become an active user group. The SLA framework is a ground-breaking activity that revolutionizes the processes of EGI in supporting resource-bound user groups.
- An open system of integrated e-Infrastructures and RIs. In Q1 2016, international research collaborations are supported through a worldwide open infrastructure which includes 6 international e-Infrastructures fully integrated in the EGI federation (in Africa-Arabia, the Asia-Pacific region, China, India, Canada and Latin America) and 7 research infrastructures of European and/or international relevance, many of which are either ESFRI projects or ESFRI landmarks in the ESFRI roadmap 2016. The integrated RIs already producing accounting data are: INSTRUCT via WeNMR, BBMRI, IceCube, CTA, KM3NeT, LOFAR and LSST.
- RIs in testing and development stage. Various RIs, mainly supported via EGI-Engage competence centres or other support initiatives of the EGI Community, are currently in the testing and development stage; these are expected to progress with their service pre-production activities in PY2: DARIAH (ESFRI landmark), EISCAT-3D (ESFRI project), ELIXIR (ESFRI landmark), ELI (ESFRI landmark), EMSO (ESFRI landmark), EPOS (ESFRI project), LifeWatch (ESFRI landmark) and SKA (ESFRI landmark). For some of these, like SKA, the distributed computing and data infrastructure architectures are still to be defined, and the collaboration at this early stage aims at sharing processes, activities, best practices and solutions that have been proven to be mature and scalable by other international collaborations."

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

KPI No./Description/Value PY1/Target PY1
KPI.1.JRA2.OpenData /Number of open research datasets that can be published, discovered, used and reused by EGI applications/tools /5 /0

The Open Data Platform is not yet released; however, testing activities are in progress with various datasets. For biodiversity and LifeWatch, the open data from the Spanish GBIF node is available. For genetics, testing activities concerned the ASTD database, while for genomics the reference resources used are: Ensembl, Ensembl Human Genome and Ensembl Homo sapiens sequence indexes.
In the reporting period, the main focus of JRA2.1 has been on development of the Open Data Platform prototype, based on the requirements and use cases collected within the M4.1 milestone report. The main work was oriented towards solving issues and adding new functionalities for the communities already involved in the evaluation and testing during the first 6 months of the project.

KPI No. Description Value PY1 Target PY1
KPI.2.SA1.Integration /Number of RIs and e-Infrastructures integrated with EGI/15/9
EGI has MoUs with the following resource providers integrated with the EGI production infrastructure: Asia Pacific, Africa and Asia, Latin America, China, India and Ukraine. Open Science Grid in the US and Compute Canada have interoperation agreements and a collaboration in place aiming at increasing support to international research communities.
The following Virtual Organizations associated to the respective user communities generated accounting data: (1) INSTRUCT, (2), (3) icecube, (4), (5), (6) lofar, (7) LSST, (8) virgo, and (9)
The RIs that are in preparatory stage are: DARIAH (ESFRI landmark), EISCAT-3D (ESFRI project), ELI (landmark), ELIXIR (landmark), EMSO (landmark), EPOS (project), and SKA (landmark). Among these RIs, the EMSO-DEV project is preparing a collaboration agreement with EGI-Engage.

KPI No. Description Value PY1 Target PY1
KPI.3.SA1.Software /Number of new registered software items (applications for HTC computing) and VM appliances /19 and 62 /50 and 50
During PY1, 19 software items in the form of new ported applications for the HTC solution have been added by communities to the AppDB library, while a total of 62 new cloud virtual appliances were registered. The fast growing number of new virtual appliances for cloud is proportional to the number of new use cases that were supported by the user support team and is an indication of the increasing interest in the federated cloud services.

KPI No. Description Value PY1 Target PY1
KPI.4.SA1.Cloud /Number of providers offering compute and storage capacity accessible through open standard interfaces/21/25
At the end of February 2016 there were 21 certified RCs in the FedCloud, providing storage and/or computing cloud services. 3 additional RCs are under certification at the time of writing. The ELIXIR compute platform and its federation through OCCI servers are still under testing. During EGI-Engage PY1, work was done to simplify the procedure for cloud integration: documentation for resource providers was restructured and a virtual appliance that contains almost all the tools for FedCloud integration was prepared. A further expansion of the EGI Federated Cloud is expected with the integration of the LifeWatch and BBMRI cloud resources.

KPI No. Description Value PY1 Target PY1
KPI.5.SA2.Users /Number of researchers served by EGI/46 250/40 000
o 11,000 robot users (the value is approximate; there are 157 robot certificates registered in 51 VOs)
o 35,250 active users with a personal X.509 certificate
The number of users does not include those whose access is mediated by platforms hosted in the EGI Federated Cloud, which are responsible for providing access to the underlying EGI services. The number of robot certificate users is probably a conservative estimate, given that robot certificate users generated approximately 40% of the jobs on the production infrastructure, consuming approx. 32% of the used capacity. The long-tail platform will enter production in PY2, and additional users are expected to come in from there.
[XSEDE, 2014]: 7500 user accounts and 0.82 Mjobs reported in QR3 2014 (1.6 M jobs per day executed on average in EGI)
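The KPI.5 value can be cross-checked from the two user counts listed above. The following is a minimal Python sketch; the variable names are illustrative and not part of the report, and the robot-user figure is the approximate value given in the text.

```python
# Figures taken from the KPI.5 breakdown; variable names are illustrative.
robot_users = 11_000          # approximate, per the report
personal_cert_users = 35_250  # active users with a personal X.509 certificate
target_py1 = 40_000           # PY1 target for KPI.5

total_users = robot_users + personal_cert_users

# Share of robot users in the total, for comparison with the roughly 40% of
# jobs that the report attributes to robot certificate users.
robot_share_pct = round(100 * robot_users / total_users, 1)

print(f"Total researchers served: {total_users}")
print(f"Target met: {total_users >= target_py1}")
print(f"Robot users as share of total: {robot_share_pct}%")
```

Under these figures, robot users make up a noticeably smaller share of the user count than of the job count, which is in line with the remark that the robot-user number is probably conservative.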

KPI No. Description Value PY1 Target PY1
KPI.6.JRA1.AAI /Number of users adopting federated IdP /0 /0
First communities adopting federated IdP should appear in PY2. In PY1 the activity was focused on design and integration, and has started pilots with selected communities: ELIXIR and DARIAH.

KPI No. Description Value PY1 Target PY1
KPI.7.SA2.Users /Number of new research communities served /18 /20
15 new international VOs and 3 national VOs (in Sweden, Romania and France) were registered in PY1. The VOs that are directly related to support activities funded in EGI-Engage are: training (SA2.1), DIRAC (SA2.2), access (JRA2), DARIAH (SA2.6) and ELIXIR (SA2.3). The other VOs represent research projects and communities supported by other national/international collaborations.

KPI No. Description Value PY1 Target PY1
KPI.8.SA1.Users /Number of VO SLAs established /3 /4
At the moment there are 3 finalized VO SLAs (DRIHM, WeNMR and BILS). SA1 is supporting the sites in implementing full technical support for the communities. Another 3 SLAs are close to finalization. The additional SLAs in preparation are intended to support the long tail of science in nanotechnology, Earth Observation data exploitation with Terradue platforms, the life science long tail of science, the Human Brain Project and the EXTraS research project. The EMSO Research Infrastructure will follow after an MoU is finalized.

KPI No. Description Value PY1 Target PY1
KPI.9.NA2.Communication /Number of scientific publications supported by EGI /791 /Does not apply
This is the number of scientific research papers reported to be published with the support of EGI's affiliated NGIs and VOs in 2015, as reported by the NGIs and the VOs. It is an underestimation since not all NGIs/VOs collect this information at a local level, and/or the collection at a local level depends on input (self-reporting) from scientists. Some entries may be duplicated, if reported by multiple VOs/NGIs. We are changing the EGI acknowledgement policy so that papers can be more easily tracked via the OpenAIRE services.
Comparison: 507 publications in PRACE in 2013

KPI No. Description Value PY1 Target PY1
KPI.10.NA2.Communication /Number of relevant authorities informed of the policy paper on procurement /0 /5
The activity formally starts at M12 and the report will be available at M24. The target for M12 should therefore be 0, the same as the measured value.

KPI No. Description Value PY1 Target PY1
KPI.11.SA1.Users /User satisfaction /4 over 5 /4
Two communities were involved so far in the feedback survey (MoBrain, which expressed a user satisfaction of 4.5/5, and BILS, with a score of 4.0/5). In PY2 all the communities with finalized SLAs will be involved in the feedback survey, as well as the users engaged through the long tail of science services.

KPI No. Description Value PY1 Target PY1
KPI.12.NA2.Industry /Number of services, demonstrators and project ideas running on EGI for SMEs and industry /36 /2
This is the number of business case entries in the dedicated requirements tracker tool (RT), derived from almost 60 industry-related contacts generated over the first project year, of which 34 are SMEs (~60% of the total contacts).
[PRACE, 2013]: PRACE can report 10 success stories of SMEs from 6 different countries benefiting not only from PRACE HPC resources but more importantly, from the know-how in the PRACE centres.

KPI No. Description Value PY1 Target PY1
KPI.13.SA2.Support /Number of delivered knowledge transfer events /21 /15
This includes: 12 tutorials at EGI conference and forum, 3 webinars, 3 federated cloud tutorials (with hands-on activities), 1 bioinformatics course with CHIPSTER on EGI Federated Cloud, 1 EGI intro tutorial at ENVRIplus week and 1 security tutorial during ISGC2015.

KPI No. Description Value PY1 Target PY1
KPI.14.SA1.Size /Number of compute resources available to international research communities and long tail of science /651 748 /NA
There has been an increase of about 20% during PY1. The main increases are still visible in the HTC infrastructure, which is also larger than the cloud infrastructure.

KPI No. Description Value PY1 Target PY1
KPI.15.SA1.Size /Amount of storage available to international research communities and long tail of science (disk and tape) [PB] /disk: 264.18 PB, tape: 239.8 PB /NA
Storage (+5%) did not increase as much as the cores in the last year, but the year before the start of the project saw a consistent increase in disk (+29%), which is probably correlated with the reduction in storage capacity investments in PY1.

KPI No. Description Value PY1 Target PY1
KPI.16.SA2.Support /Number of international support cases (for/with RIs, projects, industry) /38 /30
The SA2.1 (Training) and SA2.2 (Technical user support) helped EGI deepen existing collaborations and establish collaborations with new communities: with 27 RIs/FETs, 11 projects/communities and numerous ‘long-tail users’ (some representing institutes, some research labs, or individual scientists). The spread of the 38 RI-FET-community engagement cases (27+11) of PY1 across science disciplines is the following:
o Biological and medical sciences (incl. biodiversity, ecosystems): 47.4% (18/38)
o Earth sciences: 21.1% (8/38)
o Physics (incl. astro, particle, laser, etc.): 21.1% (8/38)
o Digital humanities (incl. languages): 5.3% (2/38)
o Agriculture: 2.6% (1/38)
o Nanotechnology: 2.6% (1/38)
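
The discipline shares can be recomputed from the case counts in the list above. The following is a minimal Python sketch; the dictionary and helper names are illustrative and not part of the report.

```python
# Engagement cases per discipline, as listed for KPI.16.
# The dictionary and function names are illustrative only.
cases = {
    "Biological and medical sciences": 18,
    "Earth sciences": 8,
    "Physics": 8,
    "Digital humanities": 2,
    "Agriculture": 1,
    "Nanotechnology": 1,
}

total = sum(cases.values())


def share(count, total):
    """Return the percentage share of `count` in `total`, to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share(count, total) for name, count in cases.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}% ({cases[name]}/{total})")
```

Rounding each share to one decimal place keeps equal case counts on equal percentages, which makes the breakdown easier to check against the 38 reported cases.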

KPI No. Description Value PY1 Target PY1
KPI.17.SA1.Size /Number of compute resources available to the long tail of science /16 144 /300
At the moment the long tail of science (LTOS) VO is supported by 4 providers. Their resources are not dedicated to long tail of science users; instead, the LTOS VO has been configured to access them via opportunistic usage (resources are shared with other VOs). The amount of resources available for consumption at any given time depends on the fluctuations in the computing workload.

