
Sharing and Automation for Privacy Preserving Attack Neutralization

Periodic Reporting for period 1 - SAPPAN (Sharing and Automation for Privacy Preserving Attack Neutralization)

Reporting period: 2019-05-01 to 2020-10-31

The SAPPAN project aims to enable efficient protection of modern ICT infrastructures via advanced data acquisition, threat analysis, and privacy-aware sharing and distribution of threat intelligence that dynamically supports human operators in response and recovery actions. SAPPAN will develop a collaborative, federated, and scalable attack detection approach that supports response activities and allows for timely responses to newly emerging threats at different privacy levels. The project has a strong focus on improving cybersecurity for international public institutions and multinational companies in Europe. Improved threat detection, response, and recovery techniques will help these institutions and companies grow stronger and compete more effectively in the European market and beyond. End-users such as human analysts in Security Operation Centres (SOCs) can adopt SAPPAN solutions that offer fewer false alerts, advanced visualization, and automated support for response actions to mitigate cyberattacks. Furthermore, small and medium-sized enterprises (SMEs) and other organizations can use the SAPPAN platform to share data, machine learning models, and threat intelligence for better detection of attacks and a corresponding response. The overall objective of SAPPAN is to develop a platform for privacy-preserving data sharing, attack detection, and automation of response and recovery, utilizing advanced data analysis, machine learning, privacy-enhancing technologies, and visualization techniques.
In reporting period 1, the main results so far are:
WP1: bi-weekly meetings, 3 general meetings, 2 steering committee meetings, quality management plan, data management plan, administrative deliverables, advisory board appointment, internal review procedure.
WP2: collaborated with domain experts and refined high-impact response and recovery use-cases, developed the SAPPAN architecture and its functional specifications, collected privacy and visualization requirements, proposed an evaluation methodology for the SAPPAN platform with KPIs.
WP3: created cybersecurity datasets containing both endpoint data and network traffic for several types of attacks; provided proof-of-concept implementations for Syslog and Netflow for fast and scalable cybersecurity data processing; developed high-precision deep learning algorithms for Domain Generation Algorithm (DGA) and phishing detection, together with a visualization tool supporting the detection algorithms; provided an approach to large-scale endpoint behaviour profiling, including a tool for visual analysis of large numbers of host profiles; developed an approach to URL abstraction that allows for anonymous data transit for the purposes of machine learning.
WP4: developed a formal methodology for modelling cybersecurity response and recovery actions and a demonstration tool to capture response and recovery actions from human experts, evaluated an incident similarity model that recommends incident response and recovery actions to human experts, proposed a response and recovery framework with automated response and recovery actions for thresholding systems, defined a corresponding entity model for analytical provenance.
WP5: initial results indicating that federated learning can be beneficial for the accuracy of DGA classifiers.
WP6: designed a mock-up GUI from the perspective of an organizational SOC, implemented a reconfigurable card-based dashboard using Elastic Search, set up a provenance tracking DB, implemented the SAPPAN MISP connector.
WP7: SAPPAN website, hosted the NG-SOC workshop with the SOCCRATES project at the ARES conference, in-house and public presentations regarding SAPPAN, dissemination through the Cyberwatching project hub, 5 peer-reviewed publications.
Local Anomaly and Intrusion Detection:
We conducted research on multiple high-impact use-cases for local detection in SAPPAN, namely DGA detection, phishing detection, host and application profiling, and anomaly detection based on these profiles. For DGA and phishing detection, we successfully evaluated a novel deep neural network trained to detect DGA and phishing domains with higher precision than contemporary approaches. We were even able to determine the malware families that generated the DGA domains. We also provided a tool for large-scale host behavioral profiling, including a tool for visual exploration of the host profiles. Our current efforts focus on using process mining approaches to model application behavior, and on approaches to detection based on host and network behavioral profiling.

Privacy preserving anomaly/intrusion detection:
Our current effort focuses on the possibilities of globally training a DGA classifier with multiple parties. However, either the locally private data sets or the models are at risk of being disclosed by the latest inference attacks. We expect to mitigate or impede known inference attacks in our sharing scenarios while preserving the utility of the classifier, so that sharing remains beneficial with sufficient privacy guarantees.

Federated Threat Detection:
We planned to extend the capabilities of the newly developed local detection methods from WP3 by sharing local knowledge. Up until now, we have focused on the DGA and phishing detection context, where we planned and conducted first experiments with the goal of increasing detection capabilities through federation, e.g. using federated learning. Results show that including training data from different organizations can increase the capabilities of our classifiers.
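
To illustrate what federation can mean in this setting, the following is a minimal sketch of federated averaging, in which each organization trains locally and only model weights are combined. It is an assumption for illustration, not the method used in SAPPAN: the function name `federated_average`, the two-element weight vectors, and the sample counts are all hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Combine local model weights into one global weight vector.

    Each update is (weights, sample_count). Parties with more local
    training samples contribute proportionally more to the average.
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("need at least one training sample across all parties")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all parties must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Hypothetical weight vectors from three organizations (illustrative values only).
updates = [
    ([0.2, -1.0], 100),
    ([0.4, -0.8], 300),
    ([0.0, -1.2], 100),
]
global_weights = federated_average(updates)
```

In a full training loop, the averaged weights would be sent back to every party for another round of local training; raw training data never leaves the organization that holds it.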

Automating and Sharing Incident Response Handling:
While publicly available incident handling playbooks exist, they are too general and abstract. What we envision are more detailed and formalized playbooks that are machine-readable and, ideally, machine-understandable. Our current effort revolves around formalizing incident handling playbooks using Semantic Media Wiki and gathering vocabularies. Our vision is that sharing formalized response information will enable organizations to react to an attack swiftly and effectively.

Handling security alerts is time-consuming and requires significant expertise, yet many alerts generated by security monitoring systems are false alarms, i.e. mistakes of those systems. Recognizing such false alerts and discarding them quickly is highly important for effective and efficient SOC operations. We have implemented prototypes for evaluating the similarity of complex and context-rich alerts, which enable SOC personnel to easily find closely resembling ones. Our goal is to turn the prototypes into automated methods for identifying false alerts and reliably recommending incident handling actions to security analysts and incident response professionals.
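
As a rough illustration of alert similarity, the sketch below ranks past alerts by the Jaccard similarity of their attribute/value pairs. This is a deliberately simple stand-in chosen for illustration, not the similarity model implemented in the SAPPAN prototypes; the alert fields (`signature`, `src_host`, `dst_port`) and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]


def jaccard_similarity(a: Alert, b: Alert) -> float:
    """Share of attribute/value pairs that two alerts have in common."""
    items_a = set(a.items())
    items_b = set(b.items())
    union = items_a | items_b
    if not union:
        return 1.0  # two empty alerts are treated as identical
    return len(items_a & items_b) / len(union)


def most_similar(query: Alert, history: List[Alert], top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) of the top_k past alerts closest to the query."""
    scored = [(jaccard_similarity(query, past), i) for i, past in enumerate(history)]
    # Highest score first; ties are broken by the older (lower) index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical alerts (illustrative field names and values only).
history = [
    {"signature": "dga_domain", "src_host": "ws-17", "dst_port": "53"},
    {"signature": "dga_domain", "src_host": "ws-02", "dst_port": "53"},
    {"signature": "phishing_url", "src_host": "ws-40", "dst_port": "443"},
]
query = {"signature": "dga_domain", "src_host": "ws-17", "dst_port": "53"}
ranking = most_similar(query, history, top_k=2)
```

An analyst could then inspect how the closest past alerts were resolved, for example whether they were dismissed as false alerts, before deciding how to handle the new one.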

At this time, the work on the dashboard is mostly preparatory, laying the groundwork for future improvements on the state of the art. We laid the foundations for tracking analytical provenance, which we envision will help in authoring and refining playbooks in the future. We also envision our visualisations providing greater transparency with regard to the uncertainty that exists in machine learning models and related data.
SAPPAN Concept