
Applying Machine Learning to Cyber Risk Analysis and Mitigation

Periodic Reporting for period 1 - MALAGA (Applying Machine Learning to Cyber Risk Analysis and Mitigation)

Reporting period: 2019-09-01 to 2021-08-31

The main research problem was to find a model that could predict new cyber risks in the emerging area of connected and autonomous vehicles. The vehicle industry's move towards smart vehicles has changed the risks of the traditional transportation environment. Because of the hardware and software embedded in them, connected and smart vehicles can face any of the risks known in the cybersecurity domain. For example, the Android/iOS entertainment systems in new smart cars mean that any cyber risk predicted for these operating systems is now also a cyber risk for connected and smart vehicles. A general cyber risk prediction system can therefore also predict the cyber risks of smart vehicles. Some cyber risks, however, are specific to smart vehicles. Cyber risk prediction models should therefore be examined to see whether they can be trained to perform better on the cyber risks of connected and autonomous vehicles. In our MSCA-IF research project, we addressed both problems: finding general cyber risk classification and prediction models, and finding cyber risk prediction models for connected and autonomous vehicles.

Our models facilitate the entry of self-driving cars into the market by predicting their cyber risks. No car can enter the transportation cycle without insurance, and insurers cannot insure new vehicles without recognising the risks these vehicles may face. One of our models, published in a peer-reviewed journal, identifies cyber risks and quantifies them. This helps the insurer to insure the car and paves the way for it to enter the transportation cycle. Ultimately, self-driving cars will contribute to the sustainable development of societies by making transportation easier and cheaper for all, by reducing the number of accidents, and by helping to optimise travel time and fuel consumption through intelligent public fleets. Our project is in line with the United Nations Sustainable Development Plan.

The overall objective of the research was to develop cyber risk prediction models using machine learning algorithms. The first objective was to detect hazards, threats, top events, and consequences, and then to classify and quantify the different risks so that they could be used in insurance underwriting. The second objective was to build a fully data-driven machine learning prediction model. After achieving these objectives and investigating the predicted cyber risks, we concentrated on phishing attacks and defined a third objective: to build a machine learning based model to detect phishing attacks.
The work was performed using CRISP-DM (Cross-Industry Standard Process for Data Mining), a comprehensive six-phase methodology for data mining and machine learning projects. The six phases are Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment. The first three months covered the Business Understanding, Data Understanding, and Data Preparation phases. In these phases, the Marie Curie research fellow finalised his studies on cyber risk in the Emerging Risk Group at UL and studied the relevant datasets used in cyber risk research. The researcher also planned to attend relevant high-ranked conferences and workshops in the EU and some other countries to disseminate and exploit the results and to build collaborations. The emergence of COVID-19 replaced those plans with remote working from home. The COVID-19 regulations changed the work packages to the extent that most of the work focused on algorithmic studies of cyber risk prediction models and on publishing journal papers.

First, a model was built to identify the hazards, threats, top events, and consequences of cyber risks. This model went a step further by quantifying its results and classifying cyber risks according to their impact and frequency scores. Then a second model was built to predict future cyber risks. This model applied advanced machine learning and text mining algorithms to the Common Vulnerabilities and Exposures dataset. Based on the results of that model, phishing attacks were investigated further, and a feature selection based model was developed to identify phishing attacks. The first model was published in a peer-reviewed journal. The second model received a minor revision and is in the publication process. The phishing model has been submitted to another journal and is under review.
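Classifying risks by impact and frequency scores can be illustrated with a small risk-matrix sketch. This is not the published model: the 1 to 5 scales, the multiplicative score, the band thresholds, and the example values below are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyberRisk:
    name: str
    impact: int     # assumed scale: 1 (negligible) to 5 (severe)
    frequency: int  # assumed scale: 1 (rare) to 5 (frequent)

    @property
    def score(self) -> int:
        # Assumed scoring rule: a simple impact x frequency product.
        return self.impact * self.frequency


def classify(risk: CyberRisk, low_max: int = 6, medium_max: int = 14) -> str:
    """Map a score to a band; the thresholds are hypothetical."""
    if risk.score <= low_max:
        return "low"
    if risk.score <= medium_max:
        return "medium"
    return "high"


# Illustrative values only, not results from the project.
risks = [
    CyberRisk("phishing", impact=3, frequency=5),
    CyberRisk("infotainment OS exploit", impact=4, frequency=2),
    CyberRisk("sensor spoofing", impact=5, frequency=1),
]

# Rank from highest to lowest score, as an underwriter might review them.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for risk in ranked:
    print(f"{risk.name}: score={risk.score}, band={classify(risk)}")
```

A quantified, ranked list of this kind is the sort of output that could feed insurance underwriting, although real scoring would rest on observed loss and incident data.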
The cyber risk classification model, published in a peer-reviewed journal, goes beyond the state of the art by combining the advantages of new cyber risk models and recent data with the comprehensive bow-tie model and fault/event trees.

Our other model, intended for cyber risk prediction, goes beyond the state of the art by using a fully data-driven, real-time approach to predict future cyber risks. It also scores and ranks the future cyber risks. The model is called CyRiPred and is fully automated. It connects to the Common Vulnerabilities and Exposures feed and uses text mining algorithms to extract cyber risk topics from the descriptions of vulnerabilities and exposures. To the best of our knowledge, this methodology is new and automates the prediction process end to end for the first time. The highlights of the work beyond previous and current research and applications are: (1) a data-driven cyber risk prediction model; (2) an explicit topic extraction algorithm; (3) a cybersecurity categorisation method based on the explicit topic extraction.
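The idea of extracting topics from vulnerability descriptions and ranking them can be sketched as follows. This is a minimal illustration and not the CyRiPred algorithm: it assumes that "explicit" topic extraction means matching descriptions against a predefined vocabulary, and the topic names, phrases, and sample descriptions are invented for the example rather than taken from the CVE feed.

```python
from collections import Counter

# Hypothetical topic vocabulary: topic name -> phrases that signal it.
TOPIC_PHRASES = {
    "memory corruption": ["buffer overflow", "use after free"],
    "injection": ["sql injection", "command injection"],
    "denial of service": ["denial of service"],
    "remote code execution": ["remote code execution"],
}


def extract_topics(description, topic_phrases):
    """Return the set of topics whose phrases occur in the description."""
    text = description.lower()
    return {
        topic
        for topic, phrases in topic_phrases.items()
        if any(phrase in text for phrase in phrases)
    }


def rank_topics(descriptions, topic_phrases):
    """Count how many descriptions mention each topic, most frequent first."""
    counts = Counter()
    for description in descriptions:
        counts.update(extract_topics(description, topic_phrases))
    return counts.most_common()


# Made-up descriptions written in the style of vulnerability entries.
descriptions = [
    "Buffer overflow in the media parser allows remote code execution.",
    "SQL injection in the login form allows attackers to read the database.",
    "A stack-based buffer overflow in the Bluetooth stack causes denial of service.",
]

ranking = rank_topics(descriptions, TOPIC_PHRASES)
for topic, count in ranking:
    print(f"{topic}: {count}")
```

In a real-time setting, the same counting would be repeated per time window over newly published entries, so that rising topics can be scored and ranked.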

The importance of phishing attacks was shown by our previous research and is also noted in the literature. This led us to choose phishing prediction as our third cyber risk classification/prediction model. To the best of our knowledge, we developed the first feature selection model that uses network structures to distinguish between legitimate and phishing platforms. The phishing feature selection model goes beyond the state of the art in the following ways: (1) first use of network structure for analysing phishing data; (2) use of community detection for analysing phishing data; (3) use of scale-free analysis from social networks for analysing phishing data; (4) proper selection of features, by means of network models, for use in intelligent phishing detection models.
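One way to picture network-based feature selection is to treat features as nodes, link strongly correlated features, and keep one representative per group. The sketch below is an assumption-laden illustration, not the project's method: it uses Pearson correlation to build the graph, uses connected components as a crude stand-in for community detection, and runs on a toy dataset whose feature names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def feature_graph(columns, threshold=0.8):
    """Link two features when |correlation| reaches the assumed threshold."""
    graph = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Connected components, standing in for real community detection."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def select_features(columns, labels, threshold=0.8):
    """Keep, per group, the feature most correlated with the label."""
    groups = connected_groups(feature_graph(columns, threshold))
    return sorted(
        max(group, key=lambda f: abs(pearson(columns[f], labels)))
        for group in groups
    )


# Toy data: six sites, label 1 = phishing, 0 = legitimate (hypothetical).
columns = {
    "url_length": [10, 12, 50, 55, 11, 60],
    "num_dots": [1, 1, 5, 6, 1, 6],
    "has_https": [1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 1, 1, 0, 1]

groups = connected_groups(feature_graph(columns))
selected = select_features(columns, labels)
print(groups)
print(selected)
```

Here the two URL-shape features end up in one group because they carry nearly the same information, so only one of them is kept. A fuller analysis would replace connected components with a proper community detection algorithm and inspect the degree distribution of the graph.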

Forecasting cyber risks has many social and economic consequences. By providing a cyber risk quantification model and two cyber risk forecasting models, our research supports the following positive impacts:
(1) Facilitating the entry of emerging technology in self-driving cars by reducing cybersecurity threats;
(2) Enabling car insurance by quantifying cyber risk;
(3) Facilitating transportation for everyone;
(4) Safer transportation by facilitating the entry of self-driving cars;
(5) Securing cyberspace by anticipating cyber risks before the cyber event and thus improving the e-business environment;
(6) Expanding job opportunities for more people by securing the e-business environment through cyber risk prediction;
(7) Facilitating the advent of emerging connected and autonomous vehicle technology, which will lead to optimised shipment networks and changes in logistics.
Graphical abstracts for three different models that are published or under publication.