CORDIS - EU research results

Science PlAtform Cloud Infrastructure for Outsize Usage Scenarios

Periodic Reporting for period 1 - SPACIOUS (Science PlAtform Cloud Infrastructure for Outsize Usage Scenarios)

Reporting period: 2024-01-01 to 2025-06-30

The SPACIOUS project aims to boost the scientific exploitation of ESA mission data by offering a new computational framework for astrophysical research that requires Big Data and Data Mining technologies. SPACIOUS will raise the competitiveness of the EU scientific community by increasing its Big Data and Data Mining expertise and by providing easier access to these techniques.
To reach this goal SPACIOUS aims to combine existing technologies in a new Data Mining environment to enable the analysis of ESA missions’ data up to a level which is not possible without a dedicated infrastructure. The ambition of SPACIOUS is to open this platform to the community, while at the same time exploiting the data by developing astrophysical research, in collaboration with external scientific teams.
The main idea is to facilitate and open the analysis of ESA archives to the scientific community in new ways. SPACIOUS will make a difference by providing access to the data and, at the same time, the infrastructure, tools and methods to analyse these data. SPACIOUS will be the tool enabling Big Data analysis of ESA data products. Furthermore, we expect SPACIOUS to influence how data are handled in space science in the future.
A key part of the project is to produce scientific value (publications, enhanced data products and knowledge). We will tackle major scientific research data problems (challenges) whose analysis will be, per se, a key result of SPACIOUS. Four of these challenges will be defined internally (based on the exploitation of the massive Gaia and Euclid datasets) and used to drive the design requirements of the system, serving as test cases and technology demonstrators of the Big Data Analytics Framework (BDAF). In a second phase we will open the system to the community through an Open Call for proposals and will select the most relevant submitted challenges, both scientifically and technically, for implementation.
The first milestone achieved was the Kick-Off Meeting, held in Barcelona in January 2024 (https://indico.icc.ub.edu/e/SPACIOUSKOM). The External Advisory Board (EAB) and the Resource Allocation Committee (RAC) have been appointed. A close collaboration with the ESA DataLabs project (https://datalabs.esa.int/) is ongoing. All personnel have been hired. For the Commercial Cloud developments, the requirements have been defined and the resources procured; the resources are already available.
An initial implementation of the SPACIOUS platform (BDAF) is ready and is being deployed at UK platforms, BSC and Google Cloud. The WP2 development scientists are testing it and ensuring that its characteristics and performance match the needs of the scientific users. The four internal challenges are being revised and adapted to the BDAF philosophy. Challenges #1 and #2 are ready and being tested (M5, M9). The Open Call to the community has just been issued (M11).
WP3 has secured, and can now guarantee, the provision of resources at BSC based on a preliminary requirements collection. From April 2024 to April 2027, storage will be 512 TB in 2024, 768 TB in 2025 and 1024 TB in 2026, and computing will consist of 5 virtual machines with flexible configuration, deployed independently for SPACIOUS using OpenStack. Access to and usage of the resources are renewable. The design of the platform has resulted in a production OpenStack cluster where multiple Apache Spark Workers, one Spark Master, and Zeppelin nodes can be deployed in a flexible way. BSC has also implemented a separate test environment and a series of tools for collaborative software development. Datasets for testing are ready at the platform (M3) and the environment is set up and ready for testing (M4). The Request Tracker system at BSC will be used as the official channel to collect detailed requirements for running tests and use cases.
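The storage schedule above can be written down as a small lookup, which makes the year-on-year growth explicit. This is only an illustrative sketch: the allocation figures come from the report, while the function names and the 20% headroom margin are assumptions introduced here and are not part of the SPACIOUS or BSC tooling.

```python
# Storage allocation at BSC by calendar year, in TB (figures from the report).
STORAGE_TB = {2024: 512, 2025: 768, 2026: 1024}


def yearly_increments(allocation):
    """Return the storage added in each year relative to the previous one."""
    years = sorted(allocation)
    return {curr: allocation[curr] - allocation[prev]
            for prev, curr in zip(years, years[1:])}


def fits(dataset_tb, year, allocation=STORAGE_TB, headroom=0.2):
    """Check whether a dataset fits in a given year's allocation.

    `headroom` is a hypothetical safety margin (fraction of the allocation
    kept free for intermediate products); it is an assumption, not a
    project policy.
    """
    usable_tb = allocation[year] * (1.0 - headroom)
    return dataset_tb <= usable_tb


if __name__ == "__main__":
    print(yearly_increments(STORAGE_TB))
    print(fits(600, 2024), fits(600, 2025))
```

The allocation grows by the same amount each year, so a use case that does not fit in the first year may fit in a later one.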
WP4 has reviewed the existing GDAF prototype and redesigned it using containerized components deployed with Kubernetes. The new system includes an improved Jupyter-based notebook interface, additional parallelization APIs, and support for both batch HPC and interactive exploration. Within this WP we have also investigated off-the-shelf components for authentication, account management, and workflow definition, while enhancing the software stack with mission-specific tools such as CLOE and Cosmosis (for Euclid) and GaiaXPy (for Gaia). We have researched backend mass storage solutions, including CephFS and directly attached SSDs, and integrated them into pods. A Kubernetes cluster (Kubeflow/Kubeconfig) has been configured and deployed on several instances of private academic OpenStack Clouds and is now in testing (M6, M7, M8, M12).
WP5 has been developing and adapting tools for integration in the BDAF platform. GUASOM: a training code and visualisation tool that enables the processing of large volumes of data through an unsupervised clustering process using Self-Organising Maps (SOMs); GANDALF: a training code and visualisation tool designed for disentangling data structures, based on autoencoders and featuring an interactive graphical interface; and BDAVIS (Big DAta Visualisation System): an enhancement of the Gaia Archive Visualisation System, designed to provide interactive visualisation capabilities for large datasets. BDAVIS enables data exploration both via a web browser and within a Python notebook. Tutorials and examples have been prepared and are being adapted for the platform.
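For readers unfamiliar with Self-Organising Maps, the unsupervised clustering idea can be sketched in a few lines of standard-library Python. This is a generic, minimal SOM on toy two-dimensional data, not the GUASOM implementation; the function names, grid size and learning-rate schedule are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to sample x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2
                                    for w, v in zip(weights[node], x)))


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    # Pull the node towards the sample, weighted by grid distance.
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def make_toy_data(seed=1, n_per_cluster=20):
    """Two well-separated toy clusters inside the unit square."""
    rng = random.Random(seed)
    data = []
    for cx, cy in [(0.1, 0.1), (0.9, 0.9)]:
        for _ in range(n_per_cluster):
            data.append([cx + rng.uniform(-0.05, 0.05),
                         cy + rng.uniform(-0.05, 0.05)])
    return data


if __name__ == "__main__":
    som = train_som(make_toy_data())
    print(best_matching_unit(som, [0.1, 0.1]),
          best_matching_unit(som, [0.9, 0.9]))
```

After training, samples from different clusters are assigned to different map nodes, which is what makes the map usable for grouping and visualising unlabelled data. A production tool working on large catalogues would distribute this computation rather than loop over samples in pure Python.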
WP6 has delivered the first Data Management Plan (DMP) as D6.1, together with a generic metadata schema to be applied to all SPACIOUS documents and research products.
ESA currently provides access to space mission data, but a growing number of users are unable to process and analyse all these data on their own computers. This will become especially critical in the coming years for the Gaia and Euclid missions, whose published data will reach sizes that will be difficult, if not impossible, to handle with current approaches. SPACIOUS will make a difference by providing access to the data and, at the same time, easy access to the infrastructure, tools and methods to analyse them.
This will allow a wider part of the community to carry out extensive analysis of the ESA Space Science Archives, increasing the scientific impact of its missions. We expect SPACIOUS to be the tool enabling widespread Big Data analysis of ESA data products and to influence how data are handled in space science in the future.
SPACIOUS Kick off meeting group photo
SPACIOUS logo