Extreme and Sustainable Graph Processing for Urgent Societal Challenges in Europe

Periodic Reporting for period 1 - Graph-Massivizer (Extreme and Sustainable Graph Processing for Urgent Societal Challenges in Europe)

Reporting period: 2023-01-01 to 2024-06-30

The Graph-Massivizer project researches and develops a high-performance, scalable, gender-neutral, secure, and sustainable platform for multilingual information processing and reasoning based on the massive representation of extreme data (as general, knowledge and property graphs). These graphs integrate patterns and store interlinked descriptions of objects, events, situations, concepts, and semantics. Graph-Massivizer addresses the any-volume graph challenge by supporting up to billions of vertices and trillions of edges. It tackles the velocity graph challenge of dynamically changing topologies and proposes a novel viridescence graph challenge for sustainable processing at scale. Graph-Massivizer’s support for extreme data extends existing graph processing capabilities by orders of magnitude for at least one “v”-characteristic in four relevant use cases.

The project delivers the Graph-Massivizer toolkit of five open-source software (OSS) tools and FAIR graph datasets covering the sustainable lifecycle of processing extreme data as massive graphs. The tools focus holistically on (1) usability (starting from extreme multilingual data ingestion and massive graph creation), (2) automated intelligence (through analytics and reasoning), (3) performance modelling, and (4) environmental sustainability tradeoffs supported by credible data-driven evidence, (5) across HPC systems and the computing continuum. Automated operation, based on the emerging serverless computing paradigm and protected by state-of-the-art cybersecurity measures, enables experienced and novice stakeholders from large and small organisations to capitalise on extreme data through massive graph programming and processing.
The Graph-Massivizer project researches and develops a holistic neuro-symbolic AI reasoning and inference platform for sustainable graph processing of extreme data based on five integrated tools.

Graph-Inceptor adopts knowledge graphs based on the Resource Description Framework (RDF) to symbolically represent extreme data according to a formal semantic ontological specification (based on the Web Ontology Language (OWL)) of categories, properties, and relations between concepts or entities. It combines the RDF Mapping Language (RML) with the formal ontology to convert heterogeneous streams of time-series data into massive graphs with billions of nodes and trillions of edges, materialised in an underlying graph database such as GraphDB for RDF graphs. For performance and compatibility, it can also create smaller in-memory virtual graphs, for example, integrating existing relational tabular databases and translating declarative SQL into semantic SPARQL queries for symbolic question answering.
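
To illustrate the materialisation step, the following minimal sketch lifts a small, hypothetical time-series stream into an RDF graph with the Python rdflib library and answers a symbolic SPARQL query over it. The ex: ontology namespace and property names are invented for the example; they are not the project's actual OWL ontology or RML mappings.

```python
# Minimal sketch: lifting a heterogeneous time-series stream into an RDF graph
# and querying it symbolically. Ontology terms below are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/ontology/")  # hypothetical ontology namespace

# Sensor readings, e.g. parsed from heterogeneous CSV/JSON streams (toy values).
readings = [
    {"sensor": "s1", "ts": "2024-03-01T10:00:00", "value": 21.5},
    {"sensor": "s1", "ts": "2024-03-01T10:01:00", "value": 22.1},
    {"sensor": "s2", "ts": "2024-03-01T10:00:00", "value": 19.8},
]

g = Graph()
g.bind("ex", EX)
for i, r in enumerate(readings):
    obs = EX[f"observation/{i}"]
    g.add((obs, RDF.type, EX.Observation))                    # class from the ontology
    g.add((obs, EX.observedBy, EX[f"sensor/{r['sensor']}"]))  # link to the sensor entity
    g.add((obs, EX.resultTime, Literal(r["ts"], datatype=XSD.dateTime)))
    g.add((obs, EX.hasValue, Literal(r["value"], datatype=XSD.double)))

# Symbolic question answering over the materialised graph.
query = """
PREFIX ex: <http://example.org/ontology/>
SELECT ?sensor (AVG(?v) AS ?avg) WHERE {
  ?obs a ex:Observation ; ex:observedBy ?sensor ; ex:hasValue ?v .
} GROUP BY ?sensor
"""
for row in g.query(query):
    print(row.sensor, float(row.avg))
```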

Graph-Scrutinizer researches approximate and neural reasoning methods on top of the symbolic KG representation of the data, defined and encapsulated as basic graph operations (BGOs), to derive insights. BGO categories include classic graph creation methods (e.g. load, store, conversion), time-series-to-graph methods (e.g. natural visibility, quantile, optimal partition), traversal algorithms (e.g. depth- and breadth-first search, shortest-path search), computation of node properties (e.g. clustering coefficient, rank, degree, betweenness centrality), discovery of topological properties (e.g. diameter, hop-plot), sampling (e.g. node-, edge-, traversal-based) and summarisation algorithms, and neural network inference. For the latter, it employs graph neural networks (GNNs) to generate personalised graph embeddings for each use-case problem that consider not only the structure but also the nodes’ and edges’ semantic properties (as labels), for training advanced ML inference methods with improved accuracy compared to traditional methods operating on tabular data.
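
To make the BGO notion concrete, the sketch below implements one of the time-series-to-graph methods named above, the natural visibility graph, and applies one node-property BGO, betweenness centrality. It uses networkx rather than the project's optimised kernels, and the toy series is invented.

```python
# Minimal sketch of two BGO categories: time-series-to-graph conversion
# (natural visibility graph) and node-property computation (betweenness).
import networkx as nx

def natural_visibility_graph(series):
    """Connect samples (a, b) iff every intermediate sample lies strictly below
    the straight line between them (the natural visibility criterion)."""
    g = nx.Graph()
    g.add_nodes_from(range(len(series)))
    for a in range(len(series)):
        for b in range(a + 1, len(series)):
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                g.add_edge(a, b)
    return g

ts = [0.87, 0.49, 0.36, 0.83, 0.87, 0.49, 0.36, 0.83]  # toy time series
g = natural_visibility_graph(ts)
centrality = nx.betweenness_centrality(g)               # node-property BGO
print(sorted(centrality.items(), key=lambda kv: -kv[1])[:3])
```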

Graph-Optimizer designs a BGO repository containing categorised BGO implementations optimised for multiple shared- and distributed-memory platforms (e.g. CPU, GPU, FPGA). It further defines analytical symbolic models that predict the performance and energy consumption of BGOs, calibrated with microbenchmarks to the characteristics of the heterogeneous input graphs and hardware. The calibrated symbolic models identify the best-performing BGO implementation for a given context.
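
A minimal sketch of the calibration idea follows, assuming a simple linear cost model T ≈ b0 + b1·|V| + b2·|E| per implementation; the project's actual symbolic models and microbenchmark data are richer, and all numbers below are illustrative.

```python
# Calibrate a per-implementation cost model from microbenchmark runs, then pick
# the predicted fastest implementation for a target graph (illustrative data).
import numpy as np

# Hypothetical microbenchmarks: (|V|, |E|, measured runtime in seconds).
benchmarks = {
    "bfs_cpu": [(1e4, 5e4, 0.08), (1e5, 5e5, 0.74), (1e6, 5e6, 7.90)],
    "bfs_gpu": [(1e4, 5e4, 0.21), (1e5, 5e5, 0.36), (1e6, 5e6, 1.95)],
}

def calibrate(samples):
    A = np.array([[1.0, v, e] for v, e, _ in samples])
    t = np.array([runtime for _, _, runtime in samples])
    coeffs, *_ = np.linalg.lstsq(A, t, rcond=None)  # least-squares calibration
    return coeffs

models = {impl: calibrate(s) for impl, s in benchmarks.items()}

def predict(impl, n_vertices, n_edges):
    b0, b1, b2 = models[impl]
    return b0 + b1 * n_vertices + b2 * n_edges

# Choose the best-performing implementation for a 2M-vertex / 10M-edge graph.
target = (2e6, 1e7)
best = min(models, key=lambda impl: predict(impl, *target))
print(best, predict(best, *target))
```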

Graph-Greenifier scales graph analytics to the data-centre level and beyond by combining open operational traces (e.g. SURF) with its own measurements and data from the European Network of Transmission System Operators for Electricity (ENTSO-E). It evaluates and labels existing BGO workloads using sustainability metrics related to the effective use of green energy, greenhouse gas emissions, and total operational cost.
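
The sketch below shows, in strongly simplified form, how a BGO workload could be labelled with such metrics by combining measured energy per interval with an ENTSO-E-style carbon-intensity signal; the interval values and the cost constant are invented and do not reflect Graph-Greenifier's actual model.

```python
# Simplified sustainability labelling of a BGO workload (illustrative values).
from dataclasses import dataclass

@dataclass
class Interval:
    energy_kwh: float            # energy measured for the workload in this interval
    carbon_gco2_per_kwh: float   # grid carbon intensity (ENTSO-E-style signal)
    green_share: float           # renewable fraction of the energy mix (0..1)

def sustainability_label(intervals, price_eur_per_kwh=0.25):
    energy = sum(i.energy_kwh for i in intervals)
    emissions = sum(i.energy_kwh * i.carbon_gco2_per_kwh for i in intervals)
    green = sum(i.energy_kwh * i.green_share for i in intervals) / energy
    return {
        "energy_kwh": energy,
        "emissions_gco2": emissions,
        "green_energy_share": green,
        "operational_cost_eur": energy * price_eur_per_kwh,
    }

workload = [Interval(1.2, 320.0, 0.35), Interval(0.9, 180.0, 0.62)]
print(sustainability_label(workload))
```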

Graph-Choreographer employs a graph orchestration engine that specifies advanced data analytics as BGO workflow compositions and selects the “best” implementations from the repository for serverless deployment, scheduling, and execution on the computing continuum with optimised time and energy tradeoffs. Graph-Choreographer federates the infrastructure resources across the seven distributed Graph-Massivizer sites described in Section 1.4, applying state-of-the-art cybersecurity measures.
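
A minimal sketch of the selection step is given below, assuming per-implementation runtime and energy predictions (e.g. from Graph-Optimizer) and a weighted-sum tradeoff objective; the workflow, candidate set and numbers are hypothetical, and the serverless deployment itself is omitted.

```python
# Compose a BGO workflow and pick, per step, the implementation that minimises
# a weighted time/energy objective (hypothetical candidates and costs).
workflow = ["ingest", "visibility_graph", "betweenness", "summarise"]

# Candidate implementations per BGO: predicted (runtime s, energy J).
candidates = {
    "ingest":           {"cpu": (4.0, 180.0)},
    "visibility_graph": {"cpu": (9.0, 400.0), "gpu": (2.5, 520.0)},
    "betweenness":      {"cpu": (30.0, 1300.0), "gpu": (6.0, 900.0), "fpga": (8.0, 450.0)},
    "summarise":        {"cpu": (1.0, 40.0)},
}

def choose(impls, alpha=0.5):
    """Weighted sum of normalised runtime and energy; alpha tunes the tradeoff."""
    t_max = max(t for t, _ in impls.values())
    e_max = max(e for _, e in impls.values())
    score = lambda te: alpha * te[0] / t_max + (1 - alpha) * te[1] / e_max
    return min(impls, key=lambda name: score(impls[name]))

plan = {bgo: choose(candidates[bgo], alpha=0.6) for bgo in workflow}
print(plan)  # e.g. {'ingest': 'cpu', 'visibility_graph': 'gpu', ...}
```
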
The project released FAIR graph datasets, including:
- SemOpenAlex: an RDF knowledge graph with over 30 billion triples on scientific publications and their associated entities, such as authors, institutions, journals, and concepts;
- “M100 ExaData”: the largest scientific open data-centre dataset, capturing operational metrics of the Marconi 100 supercomputer at CINECA.

The toolkit is demonstrated in four use cases:
- SYNTHVERSE: Synthetic financial data generation for extreme volumes of stocks and commodity futures, adaptable to additional financial securities such as options, bonds, exchange-traded funds, mutual funds, and currencies
- Crystal Ball: Analysis of company-related events from past data, identification of patterns in common sequences, and prediction of the most likely following events by matching against them
- BOSCH: Integration of traditional expert knowledge with sensor data for quality monitoring in manufacturing, combining KGs with time-series sensor data models to enhance explainability, accuracy, and flexibility in quality predictions, and provisioning of expert insights and real-time measurements for superior quality control
- GRAAFE: Continuous prediction of compute node failures in a high-performance computing system based on an anomaly prediction model that leverages the nodes’ physical layout integrated into the monitoring system with a continuous graph neural network deployment pipeline

The Graph-Massivizer toolkit extends the integrated metaphactory platform to operate on extreme amounts of streaming data with improved on-demand graph creation, intelligent graph analytics, optimised hardware configuration, workload prediction, and sustainable green orchestration of large BGO workflows at scale.