CORDIS - EU research results

Safety Mechanisms for Artificial General Intelligence (AGI)

Objective

Artificial General Intelligence (AGI) represents AI systems with human-level cognitive abilities, capable of understanding, learning, and applying knowledge across a wide range of tasks and domains. While AGI holds immense potential to revolutionize industries, its imminent arrival also poses significant threats to society. Without proper safety mechanisms, AGI could cause unintended harm, be misused by malicious actors, or act autonomously in unpredictable and dangerous ways.

Our ambitious goal is to pioneer AGI safety by introducing a new paradigm grounded in cybersecurity principles. Current safety mechanisms, such as safeguards and alignment training, are proactive: they serve only as the first line of defense and are insufficient for the complex, autonomous nature of AGI. Stronger, more explicit mechanisms are essential to handle AGI use cases and mitigate their inherent risks.

The new paradigm employs a layered approach: beyond proactive safety, we propose two additional protective layers. These layers form the novel domains of active and reactive safety, both built upon a foundation of adversarial robustness. Active safety mechanisms, such as fail-safes, enable us to detect and explicitly correct harmful thoughts produced by the AGI in real time, ensuring continuous and safe operation while allowing auditing when necessary. Reactive safety mechanisms, such as kill switches, serve as a last line of defense to contain or neutralize an AGI when all other measures fail. We also propose research into making these safety mechanisms immutable, so that adversaries cannot bypass them.
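The layering described above can be pictured with a small toy sketch. It is only an illustration under stated assumptions, not the project's design: the class `LayeredGuard`, the two checker callables, and the `max_corrections` threshold are all hypothetical names introduced here, and real detection of harmful reasoning would be far more involved than the string checks used in the usage note.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    COMPLETED = "completed"   # no layer intervened
    REFUSED = "refused"       # proactive layer blocked the request up front
    CORRECTED = "corrected"   # active layer intervened, operation continued
    HALTED = "halted"         # reactive layer (kill switch) stopped the system


@dataclass
class LayeredGuard:
    # Both callables are hypothetical stand-ins supplied by the caller.
    proactive_filter: Callable[[str], bool]  # True if the request may proceed
    active_monitor: Callable[[str], bool]    # True if a step looks harmful
    max_corrections: int = 2                 # assumed kill-switch threshold
    audit_log: List[str] = field(default_factory=list)
    halted: bool = False

    def run(self, request: str, steps: List[str]) -> Outcome:
        # Reactive layer: once halted, the system stays halted.
        if self.halted:
            return Outcome.HALTED
        # Proactive layer: first line of defense, applied before any step runs.
        if not self.proactive_filter(request):
            self.audit_log.append(f"refused: {request}")
            return Outcome.REFUSED
        corrections = 0
        for step in steps:
            # Active layer: inspect each intermediate step as it happens.
            if self.active_monitor(step):
                corrections += 1
                self.audit_log.append(f"corrected: {step}")
                # Reactive layer: too many interventions trips the kill switch.
                if corrections > self.max_corrections:
                    self.halted = True
                    self.audit_log.append("kill switch engaged")
                    return Outcome.HALTED
        return Outcome.CORRECTED if corrections else Outcome.COMPLETED
```

For example, `LayeredGuard(lambda r: "forbidden" not in r, lambda s: "harm" in s, max_corrections=1)` refuses a request containing "forbidden", records one correction for a single flagged step, and halts permanently once a second step is flagged in the same run. The audit log stands in for the auditing capability mentioned above.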

Our preliminary data shows that these mechanisms are feasible and have high potential to outperform existing AI safety approaches. By fundamentally rethinking AI safety for the AGI era, this research aims to ensure we have robust safety mechanisms in place before AGI becomes a reality, while also enhancing the security and reliability of current AI systems in the interim.

Fields of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

HORIZON-ERC - HORIZON ERC Grants

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

ERC-2025-STG

Host institution

BEN-GURION UNIVERSITY OF THE NEGEV

Net EU contribution

Net EU financial contribution: the sum of money that the participant receives, minus the EU contribution passed on to its linked third parties. It accounts for how the EU financial contribution is distributed between the direct beneficiaries of the project and other types of participants, such as third-party participants.

€ 1 625 000,00

Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

€ 1 625 000,00

Beneficiaries (1)
