CORDIS - EU research results

Privacy Protection and Auditing for Foundation Models

Objective

Novel foundation models (FMs) like GPT, LLaMA, and Stable Diffusion are achieving exceptional performance across diverse tasks, generating high-quality text, images, and audio, and driving industry innovations. This progress stems from a shift in the machine learning paradigm: instead of training task-specific models on curated datasets, FMs are first pretrained on vast, uncurated data to become strong general-purpose models, then adapted on smaller, domain-specific datasets for specific tasks.

However, FMs leak information from their training data. For example, recent studies reveal that they can re-create individual data points from their pretraining and adaptation datasets. This poses serious privacy risks when private data is involved. Preventing exposure requires developing methods that preserve privacy throughout the FM lifecycle, from pretraining to deployment. To achieve this, our project will identify sources of privacy leakage, provide privacy guarantees over both pretraining and adaptation, and audit FMs to detect privacy violations. Doing so requires overcoming three major challenges: the limited understanding of privacy risks in FM pretraining, the lack of formal joint privacy guarantees for pretraining and adaptation, and the ineffectiveness of current privacy auditing methods.

The solution that we propose will establish a novel theoretical framework for privacy guarantees in FMs under the pretrain-adapt paradigm. Our fundamental innovations rely on the insight that, due to complex interdependencies between pretraining and adaptation data, different data points require individual levels of protection to prevent leakage. Advancing methods for identifying, achieving, and accounting for such individual guarantees will enable us to formally bound privacy leakage over both training stages and to detect violations. These innovations will allow society to benefit from technological advancements through FMs without compromising individuals' privacy.

Funding Scheme

HORIZON-ERC - HORIZON ERC Grants

Call for proposal

ERC-2025-STG

Host institution

CISPA - HELMHOLTZ-ZENTRUM FUR INFORMATIONSSICHERHEIT GGMBH
Net EU contribution

€ 1 499 973,00
Address
STUHLSATZENHAUS 5
66123 SAARBRUCKEN
Germany

Region
Saarland Saarland Regionalverband Saarbrücken
Activity type
Research Organisations
Total cost

€ 1 499 973,00

Beneficiaries (1)