Periodic Reporting for period 1 - SAPPAN (Sharing and Automation for Privacy Preserving Attack Neutralization)
Reporting period: 2019-05-01 to 2020-10-31
WP1: bi-weekly meetings, 3 general meetings, 2 steering committee meetings, quality management plan, data management plan, administrative deliverables, advisory board appointment, internal review procedure.
WP2: collaborated with domain experts and refined high-impact response and recovery use-cases, developed the SAPPAN architecture and its functional specifications, collected privacy and visualization requirements, and proposed an evaluation methodology for the SAPPAN platform with KPIs.
WP3: created cybersecurity datasets containing both endpoint data and network traffic for several types of attacks; provided proof-of-concept implementations for fast and scalable processing of Syslog and NetFlow cybersecurity data; developed high-precision deep-learning algorithms for Domain Generation Algorithm (DGA) and phishing detection, together with a visualization tool supporting the detection algorithms; provided an approach to large-scale endpoint behaviour profiling, including a tool for visual analysis of large numbers of host profiles; and developed an approach to URL abstraction that allows anonymous data transit for machine learning purposes.
WP4: developed a formal methodology for modelling cybersecurity response and recovery actions and a demonstration tool to capture response and recovery actions from human experts, evaluated an incident similarity model that recommends incident response and recovery actions to human experts, proposed a response and recovery framework with automated response and recovery actions for thresholding systems, and defined a corresponding entity model for analytical provenance.
WP5: obtained initial results showing that federated learning can improve the accuracy of DGA classifiers.
WP6: designed a mock-up GUI from the perspective of an organizational SOC, implemented a reconfigurable card-based dashboard using Elasticsearch, set up a provenance-tracking database, and implemented the SAPPAN MISP connector.
WP7: set up the SAPPAN website, co-hosted the NG-SOC workshop with the SOCCRATES project at the ARES conference, gave in-house and public presentations on SAPPAN, disseminated results through the Cyberwatching project hub, and produced 5 peer-reviewed publications.
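The WP3 URL abstraction approach is only named in this report, not specified. The sketch below is a hypothetical illustration, not SAPPAN's method: it assumes that "abstraction" means keeping the scheme and top-level domain while replacing every other host label and path segment with a coarse shape token (character classes plus a length bucket), so a model can still learn structural patterns without the raw URL leaving the organization. The functions `shape` and `abstract_url` and the token format are invented for illustration.

```python
from urllib.parse import urlsplit


def shape(token: str) -> str:
    """Hypothetical shape token: character classes present plus a length bucket."""
    classes = ""
    if any(c.isalpha() for c in token):
        classes += "a"  # contains letters
    if any(c.isdigit() for c in token):
        classes += "d"  # contains digits
    if any(not c.isalnum() for c in token):
        classes += "p"  # contains punctuation or other symbols
    n = len(token)
    bucket = "S" if n <= 4 else ("M" if n <= 12 else "L")  # hide the exact length
    return f"{classes}{bucket}"


def abstract_url(url: str) -> str:
    """Hypothetical abstraction: keep scheme and TLD, replace other tokens by shapes."""
    parts = urlsplit(url)
    labels = [label for label in (parts.hostname or "").split(".") if label]
    tld = labels.pop() if len(labels) > 1 else ""
    host = ".".join(shape(label) for label in labels)
    if tld:
        host = f"{host}.{tld}"
    path = "/".join(shape(seg) for seg in parts.path.split("/") if seg)
    # Only the number of query parameters is kept, not their names or values.
    n_params = len(parts.query.split("&")) if parts.query else 0
    return f"{parts.scheme}://{host}/{path}?q={n_params}"


print(abstract_url("https://login.example.com/account/verify123?id=7&x=1"))
```

With these assumptions, the example URL is reduced to `https://aM.aM.com/aM/adM?q=2`: the hostname, path words, and parameter values are gone, but the structure that a classifier might rely on survives.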
We conducted research on multiple high-impact use-cases for local detection in SAPPAN, namely DGA detection, phishing detection, host and application profiling, and anomaly detection based on these profiles. For DGA and phishing detection, we successfully evaluated a novel deep neural network trained to detect DGA and phishing domains with higher precision than contemporary approaches. We were also able to identify the malware families that generated the DGA domains. In addition, we provided a tool for large-scale host behavioral profiling, including a tool for visual exploration of the host profiles. Our current efforts focus on using process mining approaches to model application behavior, and on detection approaches based on host and network behavioral profiling.
Privacy-preserving anomaly/intrusion detection:
Our current effort focuses on ways to train a DGA classifier globally across multiple parties. However, recent inference attacks threaten to disclose either the parties' private local data sets or their models. We aim to mitigate or impede known inference attacks in our sharing scenarios while preserving the utility of the classifier, so that sharing remains beneficial and offers sufficient privacy guarantees.
Federated Threat Detection:
Our plan is to extend the capabilities of the newly developed local detection methods from WP3 by sharing local knowledge. So far, we have focused on DGA and phishing detection, where we designed and conducted first experiments aimed at increasing detection capabilities through federation, e.g. federated learning. The results show that including training data from different organizations can improve the detection performance of our classifiers.
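The report does not describe the federation scheme beyond "e.g. using federated learning". As an assumption, the following minimal sketch shows federated averaging (FedAvg): each organization trains on its own data starting from the shared model, and only the resulting weight vectors, weighted by local sample counts, are averaged. To stay self-contained it swaps the deep neural network described above for a plain logistic-regression model over two hypothetical numeric domain features; the data sets are toy values, not SAPPAN data.

```python
import math
from typing import List, Tuple

# (feature vector, label) with label 1 = DGA domain, 0 = benign.
# The numeric features are hypothetical placeholders (e.g. name entropy, digit ratio).
Dataset = List[Tuple[List[float], int]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], x: List[float]) -> float:
    # zip() stops at len(x), so the last weight is used only as the bias term.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def local_train(global_w: List[float], data: Dataset, lr: float = 0.1, epochs: int = 20) -> List[float]:
    """One organization's local update: SGD on the logistic loss, starting from the global model."""
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, x) - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w


def fed_avg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighted by each client's sample count (FedAvg)."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_training(clients: List[Dataset], n_features: int, rounds: int = 10) -> List[float]:
    global_w = [0.0] * (n_features + 1)
    for _ in range(rounds):
        # Only weight vectors leave each organization; raw training data stays local.
        updates = [(local_train(global_w, data), len(data)) for data in clients]
        global_w = fed_avg(updates)
    return global_w


# Two organizations with small toy data sets (illustrative values only).
org_a: Dataset = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
org_b: Dataset = [([0.1, 0.0], 0), ([0.9, 1.0], 1)]
model = federated_training([org_a, org_b], n_features=2)
print(predict_proba(model, [0.95, 0.9]) > 0.5)
```

Weighting by sample count means an organization with more labelled domains has proportionally more influence on the shared model. Note that the exchanged weights are exactly what the inference attacks mentioned in the privacy paragraph above target, so plain FedAvg alone gives no formal privacy guarantee.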
Automating and Sharing Incident Response Handling:
Publicly available incident handling playbooks exist, but they are too general and abstract. We envision more detailed, formalized playbooks that are machine-readable and, ideally, machine-understandable. Our current effort revolves around formalizing incident handling playbooks using Semantic MediaWiki and collecting the relevant vocabularies. Our vision is that sharing formalized response information will enable organizations to react to attacks swiftly and effectively.
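To illustrate what "machine-readable" could mean in practice, the following hypothetical sketch encodes a playbook as typed steps drawn from a controlled vocabulary, derives a valid execution order from the declared step dependencies, and serializes the playbook to JSON for sharing. The vocabulary terms, field names, and example playbook are invented for illustration; they are not SAPPAN's Semantic MediaWiki schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical controlled vocabulary of response actions. A shared vocabulary is
# what would let another organization's tooling interpret each step unambiguously.
ACTIONS = {"isolate_host", "block_domain", "reset_credentials", "collect_memory_dump", "notify"}


@dataclass
class Step:
    action: str  # must be a term from ACTIONS
    target: str  # entity type the action applies to, e.g. "host" or "domain"
    depends_on: List[int] = field(default_factory=list)  # indices of prerequisite steps

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


@dataclass
class Playbook:
    name: str
    trigger: str  # incident type that activates the playbook
    steps: List[Step]

    def execution_order(self) -> List[int]:
        """Order step indices so every step follows its prerequisites (Kahn's algorithm)."""
        indegree = [len(s.depends_on) for s in self.steps]
        dependents = {i: [] for i in range(len(self.steps))}
        for i, s in enumerate(self.steps):
            for d in s.depends_on:
                dependents[d].append(i)
        ready = [i for i, deg in enumerate(indegree) if deg == 0]
        order = []
        while ready:
            i = ready.pop(0)
            order.append(i)
            for j in dependents[i]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
        if len(order) != len(self.steps):
            raise ValueError("playbook steps contain a dependency cycle")
        return order

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Hypothetical example playbook for a host beaconing to DGA domains.
example = Playbook(
    name="dga-beaconing",
    trigger="dga_traffic_detected",
    steps=[
        Step("isolate_host", "host"),
        Step("block_domain", "domain"),
        Step("collect_memory_dump", "host", depends_on=[0]),
        Step("notify", "soc_team", depends_on=[1, 2]),
    ],
)
print(example.execution_order())
print(example.to_json())
```

Rejecting unknown actions at construction time is one way to keep shared playbooks within the agreed vocabulary; a real deployment would more likely validate against an ontology than a hard-coded set.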
Handling security alerts is time-consuming and requires significant expertise, yet many alerts generated by security monitoring systems are false positives, i.e. mistakes of those systems. Recognizing such false alerts and discarding them quickly is essential for effective and efficient SOC operations. We have implemented prototypes for evaluating the similarity of complex, context-rich alerts, which enable SOC personnel to easily find closely resembling alerts. Our goal is to turn these prototypes into automated methods that identify false alerts and reliably recommend incident handling actions to security analysts and incident response professionals.
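The similarity model itself is not described in this report. As a stand-in, the hypothetical sketch below represents each alert as a set of attribute-value pairs, scores pairs of alerts with an attribute-weighted Jaccard similarity, and flags a new alert as a likely false positive when most of its sufficiently similar past alerts were labelled as such. Attribute names, weights, the threshold, and the labels are all assumptions.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # attribute name -> value

# Hypothetical attribute weights: the detector signature counts more than the port.
WEIGHTS = {"signature": 3.0, "src_net": 2.0, "dst_host": 2.0, "dst_port": 1.0}


def _weight(pairs) -> float:
    # Attributes missing from WEIGHTS default to weight 1.0.
    return sum(WEIGHTS.get(name, 1.0) for name, _ in pairs)


def similarity(a: Alert, b: Alert) -> float:
    """Attribute-weighted Jaccard similarity over (attribute, value) pairs."""
    pa, pb = set(a.items()), set(b.items())
    union = _weight(pa | pb)
    return _weight(pa & pb) / union if union else 0.0


def most_similar(new: Alert, history: List[Tuple[Alert, bool]], k: int = 3) -> List[Tuple[float, bool]]:
    """Return (score, was_false_positive) for the k most similar past alerts."""
    scored = [(similarity(new, old), is_fp) for old, is_fp in history]
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]


def likely_false_positive(new: Alert, history: List[Tuple[Alert, bool]], k: int = 3, threshold: float = 0.5) -> bool:
    """Flag the alert if most of its close (score >= threshold) neighbours were false positives."""
    close = [is_fp for score, is_fp in most_similar(new, history, k) if score >= threshold]
    return bool(close) and sum(close) > len(close) / 2


past = [
    ({"signature": "ET SCAN", "src_net": "10.0.0.0/24", "dst_port": "23"}, True),
    ({"signature": "ET TROJAN", "dst_host": "db01"}, False),
]
new_alert = {"signature": "ET SCAN", "src_net": "10.0.0.0/24", "dst_port": "22"}
print(most_similar(new_alert, past))
print(likely_false_positive(new_alert, past))
```

Here the new alert shares the signature and source network with a past false positive (score 5/7) and nothing with the other alert, so it is flagged. A rule like this only supports the analyst's decision; discarding alerts automatically would require the reliability the paragraph above sets as a goal.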
At this stage, the work on the dashboard is mostly preparatory, aimed at future improvements over the state of the art. We laid the foundations for tracking analytical provenance, which we expect to help in authoring and/or refining playbooks. In the future, we envision our visualisations providing greater transparency regarding the uncertainty inherent in machine learning models and related data.