CORDIS - EU research results

A Formal Rule-Processing Engine for Privacy-Respecting Forensic Investigation

Final Report Summary - PRIVACY4FORENSICS (A Formal Rule-Processing Engine for Privacy-Respecting Forensic Investigation)

Forensic investigations can breach an individual's right to privacy, which is especially problematic when that individual is later found to be innocent.

Hence, the main goal of this project was to identify best practice for carrying out privacy-respecting forensic investigations and to develop tools that support such investigations.

This research project identified and focussed on three major techniques for privacy-respecting digital investigation, namely: (i) sequential release of information, (ii) use of formal rule-processing methods, and (iii) use of machine learning. These three techniques, together with the research carried out on each, are summarised below:

Sequential release of Personally Identifiable Information (PII) is based on the prior knowledge of the investigator. In this technique, an investigator is able to see PII only if she can build a direct link between existing evidence and the requested data (i.e. using Link Analysis modelling). The main drawbacks of this technique are that it entirely ignores the permissions given in a search/seizure warrant and that it risks missing important data relating to independent criminal activities. Moreover, the extra effort an investigator needs to prove connections between existing evidence and the required data means that this technique is not suitable for real-world implementation.
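As a rough illustration of the link-based release idea described above, the following minimal sketch treats evidence items as nodes in a graph and releases a requested item only if it is linked to evidence the investigator already holds. The function name `is_linked`, the `max_hops` parameter and the example graph are hypothetical and are not taken from the project's tooling.

```python
from collections import deque


def is_linked(graph, known_evidence, requested, max_hops=1):
    """Return True if `requested` is reachable from any item of known
    evidence within `max_hops` links (max_hops=1 means a direct link)."""
    frontier = deque((item, 0) for item in known_evidence)
    seen = set(known_evidence)
    while frontier:
        node, depth = frontier.popleft()
        if node == requested:
            return True
        if depth == max_hops:
            continue  # do not follow links beyond the allowed distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return False


# Hypothetical link graph: each key links to the items in its list.
links = {
    "suspect_email": ["contact_a", "invoice_17"],
    "contact_a": ["contact_a_phone"],
    "contact_b": ["contact_b_phone"],
}

known = {"suspect_email"}
print(is_linked(links, known, "contact_a"))                    # direct link
print(is_linked(links, known, "contact_a_phone"))              # two hops away
print(is_linked(links, known, "contact_a_phone", max_hops=2))  # allowed if hops are relaxed
print(is_linked(links, known, "contact_b_phone"))              # unrelated data
```

The sketch also makes the drawback visible: data about `contact_b` is never released, even if it relates to an independent offence, and nothing in the check consults the warrant.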
Formal rule-processing models aim to capture the permissions given in a warrant and to determine whether an analyst's request to access PII should be granted. We have built a rule-processing engine that receives requests, checks whether they match the permissions given in a warrant, and allows or disallows the investigator's request accordingly. However, the results of our survey of a number of Law Enforcement Agencies (LEAs) in the UK and EU who tested the engine were not satisfactory. They commented that such an engine is of limited use in real-world investigations, as it is difficult for an investigator to formulate requests specific enough to be processed by the system. Moreover, the time and effort required to translate a request into its formal notation significantly limit its usage.
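The request-checking step can be sketched as follows. This is only an illustrative model under simple assumptions: the `Permission` and `Request` classes, their fields and the matching rule are hypothetical, and the formal notation actually used in the project is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Permission:
    """One permission granted by a warrant (hypothetical structure)."""
    subject: str
    data_type: str
    start: date
    end: date


@dataclass(frozen=True)
class Request:
    """An investigator's request for one item of data (hypothetical structure)."""
    subject: str
    data_type: str
    created: date  # creation date of the requested item


def is_allowed(request, warrant):
    """Grant the request only if at least one warrant permission covers it."""
    return any(
        p.subject == request.subject
        and p.data_type == request.data_type
        and p.start <= request.created <= p.end
        for p in warrant
    )


# Hypothetical warrant: e-mails of one subject during January 2017.
warrant = [Permission("subject_1", "email", date(2017, 1, 1), date(2017, 1, 31))]

print(is_allowed(Request("subject_1", "email", date(2017, 1, 15)), warrant))
print(is_allowed(Request("subject_1", "photo", date(2017, 1, 15)), warrant))
print(is_allowed(Request("subject_1", "email", date(2017, 3, 1)), warrant))
```

Even in this simplified form, every request has to name a subject, a data type and a date precisely, which hints at the usability problem reported by the LEAs.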
Machine learning is used to conduct an initial assessment of the raw data, identify potential evidence and present only relevant evidence to the examiner. This technique was also identified as the preferred option in our survey. It not only protects privacy by reducing the chance of an investigator accessing irrelevant data, but may also significantly reduce investigation time. Given the expertise of the host organisation in artificial intelligence, the researchers invested considerable time and effort in investigating the suitability of this technique. We have built datasets of malicious and benign content suitable for Windows, Android, OSX, and Internet of Things (IoT) forensic investigation. The research then studied the suitability of different machine learning classification algorithms and deep learning techniques for detecting remnants of malicious activities. Our results indicated that Random Forests can accurately (98.3% accuracy) classify malicious Windows programs (Ransomware). On Android, SVM offered the most accurate classification (95.2%), followed by Random Forests (93.7%), in detecting malicious Android applications using criteria such as app permissions and application source code. On OSX, the J48 decision tree offered the best accuracy (96.62%), followed by our own Weighted-RBFSVM algorithm (91% accuracy). We have also conducted extensive research on the application of AI techniques in IoT forensics. Our proposed Two-layer Dimension Reduction and Two-Tier Classification (TDTC) algorithm outperformed all major classifiers, namely Naive Bayes, Random Forest, SVM, and J48 Decision Trees, in detecting malicious IoT applications and achieved 84.82% accuracy. Using patterns of energy consumption as a feature, KNN could detect malicious IoT applications within 10 seconds of launch with 83.70% accuracy. We have also investigated the performance of deep learning algorithms for detecting malicious IoT applications.
A two-layer LSTM implementation significantly outperformed other classification techniques and achieved 98.18% accuracy in detecting unseen malicious IoT applications. Moreover, our own deep-learning-based IoT malware detection approach achieved 99.86% accuracy. Our results show the suitability of machine learning for privacy-respecting digital investigation in different environments.
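To make the classification step more concrete, the sketch below shows a minimal k-nearest-neighbours (KNN) classifier of the kind mentioned above for energy-consumption features, together with the accuracy measure used to compare classifiers. The feature values are made-up toy numbers chosen purely for illustration; they are not project data, and the feature choice (mean and peak power draw) is an assumption.

```python
import math
from collections import Counter


def knn_predict(train, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, features, k) == label for features, label in test)
    return hits / len(test)


# Made-up toy features (mean power draw, peak power draw); illustrative only.
train = [
    ((0.20, 0.35), "benign"),
    ((0.25, 0.40), "benign"),
    ((0.22, 0.30), "benign"),
    ((0.80, 0.95), "malicious"),
    ((0.75, 0.90), "malicious"),
    ((0.85, 0.99), "malicious"),
]
test = [
    ((0.21, 0.33), "benign"),
    ((0.78, 0.92), "malicious"),
]

print(knn_predict(train, (0.79, 0.93)))
print(accuracy(train, test))
```

In a privacy-respecting workflow, only items that a classifier of this kind flags as malicious would be presented to the examiner, so benign and irrelevant data stays unseen.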

During the project, the research fellow received good support from the host institute, ranging from an extensive network of contacts to access to first-class researchers in the field. Regular meetings with the scientist in charge of the project, Prof. Sunil Vadera, significantly shaped the project and its development. The School's overall support for the development of Cyber Security made it possible to disseminate the results of this project to a wider audience. During the 24 months of the project, the outputs included a book, 20 journal papers, 11 book chapters, 3 technical reports, and 4 conference papers. In addition, we have built 4 forensic research datasets and developed 3 tools to support privacy-respecting investigation in IoT, OSX, and Android environments. We have also built a privacy-respecting threat hunting and Ransomware family detection tool for Windows Ransomware.