Periodic Reporting for period 2 - PrivacyForDataAI (A privacy layer to power all research and AI workflows)
Reporting period: 2025-01-01 to 2025-12-31
Key objectives include:
* Enabling access to sensitive data for research and innovation by eliminating privacy-related barriers.
* Automating privacy protection by applying Differential Privacy (DP) principles to all data processing.
* Providing synthetic data versions to facilitate exploration and analysis without compromising the confidentiality of the original information.
* Supporting a broad range of analyses, from statistical analysis to machine learning and AI, while seamlessly integrating with existing workflows.
* Contributing to EU priorities in citizen rights protection and economic development, boosting trust in technology and technical progress.
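The first two objectives, applying DP to data processing and providing synthetic versions of the data, can be illustrated together with a deliberately simple technique. The sketch below adds Laplace noise to a per-category histogram and then samples synthetic records from the noisy histogram. It is an illustration under stated assumptions only:

* It is not Sarus's transformer-based synthetic-data model.
* The function names are hypothetical.
* It assumes add/remove-one-record neighbouring datasets and one record per individual.

```python
import numpy as np

def laplace_histogram(values, categories, epsilon, rng):
    """Release per-category counts with Laplace noise (hypothetical helper).

    Adding or removing one record changes exactly one count by 1, so the L1
    sensitivity of the histogram is 1 and the Laplace scale is 1 / epsilon.
    """
    counts = np.array([sum(v == c for v in values) for c in categories], dtype=float)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(categories))
    # Clipping negative counts is post-processing and does not weaken the DP guarantee.
    return np.clip(noisy, 0.0, None)

def sample_synthetic(categories, noisy_counts, n, rng):
    """Draw n synthetic records from the noisy histogram (post-processing only)."""
    total = noisy_counts.sum()
    if total == 0:
        # Every noisy count was clipped to zero: fall back to a uniform distribution.
        probs = np.full(len(categories), 1.0 / len(categories))
    else:
        probs = noisy_counts / total
    return list(rng.choice(categories, size=n, p=probs))

# Usage: a toy categorical column, privatised once and then sampled freely.
rng = np.random.default_rng(0)
cats = ["18-30", "31-50", "51+"]
ages = ["18-30", "31-50", "51+", "31-50", "18-30", "31-50"]
noisy = laplace_histogram(ages, cats, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(cats, noisy, n=10, rng=rng)
```

Only the histogram release consumes privacy budget. Any number of synthetic records can be sampled afterwards without further privacy loss, because sampling only post-processes the DP output.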
Main technical achievements include:
* Unstructured Type Support: Sarus now supports free-text columns, using pre-trained small language models.
* Flexibility for New Types: Sarus's transformer-based synthetic data (SD) model makes it easy to support new data types.
* DP-LLM-FT Module: Sarus has built a DP-LLM-FT module that fine-tunes large language models (LLMs) with DP guarantees, without exposing the training data.
* DP-RAG Module: A DP-RAG module that answers retrieval-augmented generation (RAG) queries with DP guarantees has been built and open-sourced.
* Backbone for data manipulation: The data-manipulation backbone has been fully delivered. It maintains user tracking, performs recursive DP compilation, supports pushing results to external tables, and includes performance improvements.
* Qrlew: Sarus has developed Qrlew, an open-source tool that rewrites SQL queries so that their results satisfy DP.
* Advanced Types: Handling of value ranges and sets of possible values has been improved. Types are inferred from the data, then validated and, if needed, modified by the user. These types are propagated through SQL transforms.
* Docker Support: Sarus supports pre-validated, cross-language, Docker-based computations. Any computation, written in any language, can be encapsulated as a Docker image and composed with other Sarus operations.
* An automated approach to privacy protection: By automatically applying DP to all data processing, Sarus considerably reduces the risk of data leakage and the compliance workload.
* A comprehensive solution for structured and unstructured data: Sarus's ability to handle diverse data types, including free text and multi-table databases, significantly broadens its potential use cases.
* Improved usability for data scientists: Sarus tools and SDKs allow data scientists to work with sensitive data using their usual methods and libraries.
* An open-source tool for the community: The open-sourcing of Qrlew and other resources promotes transparency and collaboration in data privacy.
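The Advanced Types and Qrlew items describe value ranges that are inferred, validated and propagated through SQL transforms. Those ranges are what allow a DP query to be calibrated: a known range for a column bounds how much one row can change an aggregate. The sketch below is written under stated assumptions only:

* It is not Qrlew's API; `Interval`, its methods and `dp_sum` are hypothetical.
* It assumes each user contributes at most one row, with add/remove-one-row neighbours.
* It propagates an interval type through a few column transforms, then uses the resulting bounds to set the Laplace noise for a `SUM`.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class Interval:
    """Hypothetical range type: every value of the column lies in [lo, hi]."""
    lo: float
    hi: float

    def add_const(self, c: float) -> "Interval":
        # x + c shifts both endpoints.
        return Interval(self.lo + c, self.hi + c)

    def mul_const(self, c: float) -> "Interval":
        # x * c may flip the endpoints when c is negative.
        a, b = self.lo * c, self.hi * c
        return Interval(min(a, b), max(a, b))

    def clip(self, lo: float, hi: float) -> "Interval":
        # Clipping is monotone, so the image of [self.lo, self.hi] is
        # [clip(self.lo), clip(self.hi)].
        return Interval(min(max(self.lo, lo), hi), min(max(self.hi, lo), hi))

def dp_sum(values, bounds: Interval, epsilon: float, rng) -> float:
    """Laplace mechanism for SUM, assuming one row per user (add/remove neighbours).

    A single row contributes a value in [lo, hi], so adding or removing it
    changes the sum by at most max(|lo|, |hi|).
    """
    sensitivity = max(abs(bounds.lo), abs(bounds.hi))
    # Enforce the declared bounds so the sensitivity claim actually holds.
    clipped = np.clip(np.asarray(values, dtype=float), bounds.lo, bounds.hi)
    return float(clipped.sum() + rng.laplace(0.0, sensitivity / epsilon))

# Usage: a range inferred from data and validated by the user, propagated
# through something like SELECT LEAST(GREATEST(income, 0), 150000) / 1000.
income = Interval(0.0, 200_000.0)
income_k = income.clip(0.0, 150_000.0).mul_const(0.001)
rng = np.random.default_rng(0)
noisy_total = dp_sum([12.5, 40.0, 150.0], income_k, epsilon=1.0, rng=rng)
```

Tight ranges directly reduce noise: narrowing a column's range before aggregating lowers the sensitivity and therefore the Laplace scale. This is one reason it matters that users can validate and adjust inferred types. Real systems also bound each user's total contribution across rows, which this sketch does not show.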
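The DP-LLM-FT item does not describe how its DP guarantee is obtained. The standard technique for training models with DP is DP-SGD: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, then take the step. The sketch below shows one DP-SGD update on a toy linear model in NumPy, under these assumptions:

* It is an illustration of DP-SGD, not a description of Sarus's module.
* The names `dp_sgd_step` and `squared_loss_grads` are hypothetical.
* The privacy accounting that converts `noise_multiplier`, the sampling rate and the number of steps into an epsilon is not shown.

```python
import numpy as np

def squared_loss_grads(weights, X, y):
    """Per-example gradients of (x . w - y)^2 for a linear model (toy model only)."""
    residuals = X @ weights - y               # shape (batch,)
    return 2.0 * residuals[:, None] * X       # shape (batch, dim)

def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average, step."""
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down any gradient whose L2 norm exceeds clip_norm; leave the others unchanged.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * factors
    # Noise std is proportional to clip_norm, the maximum contribution of one example.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return weights - lr * noisy_mean

# Usage: a few noisy steps on random toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(100):
    g = squared_loss_grads(w, X, y)
    w = dp_sgd_step(w, g, clip_norm=1.0, noise_multiplier=1.1, lr=0.1, rng=rng)
```

Clipping bounds the influence of any single training example on each update. The calibrated noise then hides that bounded influence, which is what keeps individual training examples from being exposed through the fine-tuned weights.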
Key needs for continued adoption and success of Sarus include:
* Continued research: Further research is needed to improve DP algorithms and the efficiency of data processing.
* Demonstration and validation: It is essential to continue to demonstrate the value of Sarus through concrete use cases and validate its compliance with regulations.
* Market access and funding: Further efforts are needed to establish business partnerships and secure funding to support the growth of Sarus.
* Regulatory and standards support: It is important to work with data protection authorities to establish clear regulatory and standards frameworks for the use of solutions such as Sarus.