Periodic Reporting for period 1 - MALAGA (Applying Machine Learning to Cyber Risk Analysis and Mitigation)
Reporting period: 2019-09-01 to 2021-08-31
The models we present facilitate the entry of self-driving cars into the market by predicting their cyber risks. No car can enter the transportation cycle without insurance, and insurers cannot insure new vehicles without understanding the risks these vehicles may face. One of our models, published in a peer-reviewed journal, identifies cyber risks and quantifies them. This enables insurers to cover the car and paves the way for it to enter the transportation cycle. Ultimately, self-driving cars will contribute to the sustainable development of societies by making transportation easier and cheaper for all, by reducing the number of accidents, and by helping to optimize travel time and fuel consumption through intelligent public fleets. Our project is in line with the United Nations Sustainable Development Plan.
The overall objective of the research was to develop cyber risk prediction models using machine learning algorithms. The first objective was to detect hazards, threats, top events, and consequences, and then to classify and quantify the different risks so that they could be used in insurance underwriting. The second objective was to build a fully data-driven machine learning prediction model. After achieving these objectives and examining the predicted cyber risks, we concentrated on phishing attacks and defined a third objective: to build a machine-learning-based model to detect phishing attacks.
First, a model was built to identify all the hazards, threats, top events, and consequences of cyber risks. This model went a step further and quantified its results, classifying cyber risks by their impact and frequency scores. Then a second model was built to predict future cyber risks. This model applied advanced machine learning and text mining algorithms to the Common Vulnerabilities and Exposures dataset. Based on the results of that model, phishing attacks were investigated further, and a feature-selection-based model was developed to identify phishing attacks. The first model was published in a peer-reviewed journal. The second model received a minor revision and is in the publication process. The phishing model has been submitted to another journal and is under review.
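To illustrate how classifying risks by impact and frequency scores can work, the following is a minimal sketch and not the published model. The 1 to 5 scales, the multiplicative score, the class thresholds, and the example risks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyberRisk:
    name: str
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)
    frequency: int   # assumed scale: 1 (rare) to 5 (frequent)


def risk_score(risk: CyberRisk) -> int:
    """Combine impact and frequency into one score (assumed: simple product)."""
    return risk.impact * risk.frequency


def risk_class(risk: CyberRisk) -> str:
    """Map a score to a class using hypothetical thresholds."""
    score = risk_score(risk)
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def rank_risks(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=risk_score, reverse=True)


# Hypothetical example risks; the numbers are illustrative, not project results.
EXAMPLE_RISKS = [
    CyberRisk("infotainment malware", impact=2, frequency=2),
    CyberRisk("phishing", impact=4, frequency=5),
    CyberRisk("sensor spoofing", impact=5, frequency=2),
]

for risk in rank_risks(EXAMPLE_RISKS):
    print(f"{risk.name}: score={risk_score(risk)}, class={risk_class(risk)}")
```

A ranked and classified list of this kind is the sort of input an underwriter could use to compare risks, although a real model would calibrate the scales and thresholds against data.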
Our second model, intended for cyber risk prediction, goes beyond the state of the art by using a fully real-time, data-driven approach to predict future cyber risks. It also scores and ranks the predicted risks. The model is called CyRiPred and is fully automated. It connects to the Common Vulnerabilities and Exposures feed and uses text mining algorithms to extract cyber risk topics from the descriptions of vulnerabilities and exposures. To the best of our knowledge, this methodology is new and automates the prediction process end to end for the first time. The highlights of the work beyond previous and current research and applications are: (1) a data-driven cyber risk prediction model; (2) an explicit topic extraction algorithm; (3) a cybersecurity categorisation method based on the explicit topic extraction.
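The idea of extracting risk topics from vulnerability descriptions can be sketched as follows. This is not the CyRiPred algorithm: it assumes that "explicit" topic extraction can be approximated by matching description tokens against a predefined keyword dictionary. The topic names, keywords, and sample descriptions are hypothetical and are not real Common Vulnerabilities and Exposures records.

```python
import re
from collections import Counter

# Hypothetical topic dictionary: topic name -> keywords that signal it.
TOPIC_KEYWORDS = {
    "memory corruption": {"overflow", "buffer"},
    "injection": {"injection"},
    "phishing": {"phishing", "spoofing"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_topics(description):
    """Return the topics whose keywords appear in one description."""
    tokens = tokenize(description)
    return [topic for topic, words in TOPIC_KEYWORDS.items() if words & tokens]


def rank_topics(descriptions):
    """Count how many descriptions mention each topic, most frequent first."""
    counts = Counter()
    for description in descriptions:
        counts.update(extract_topics(description))
    return counts.most_common()


# Made-up descriptions written for this example only.
SAMPLE_DESCRIPTIONS = [
    "Buffer overflow in the parser allows remote attackers to execute code.",
    "SQL injection in the login form allows attackers to read the database.",
    "Heap overflow when processing crafted images.",
]

print(rank_topics(SAMPLE_DESCRIPTIONS))
```

Counting topics over a time window of the feed gives a simple frequency signal; a forecasting step would then model how these counts evolve, which this sketch does not attempt.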
The importance of phishing attacks was shown by our previous research and is also noted in the literature. This led us to phishing prediction as our third cyber risk classification and prediction model. To the best of our knowledge, we are the first to develop a feature selection model that uses network structures to distinguish between legitimate and phishing platforms. The work goes beyond the state of the art for phishing feature selection in the following ways: (1) first use of network structure for analysing phishing data; (2) use of community detection for analysing phishing data; (3) use of scale-free analysis from social networks for analysing phishing data; (4) network-based feature selection suited to intelligent phishing detection models.
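As a rough illustration of network-based feature selection, the sketch below builds a graph whose nodes are binary website features and whose edges link features that tend to occur together. It is not the submitted model. It uses Jaccard similarity with an assumed threshold of 0.5, and it uses connected components as a simple stand-in for community detection. The feature names and data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two equal-length binary columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def build_feature_graph(columns, threshold=0.5):
    """Link two features when their similarity reaches the assumed threshold."""
    graph = {name: set() for name in columns}
    for u, v in combinations(columns, 2):
        if jaccard(columns[u], columns[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group features into components (a crude proxy for communities)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def select_representatives(graph):
    """Keep one feature per component: highest degree, ties broken by name."""
    return sorted(
        max(sorted(component), key=lambda name: len(graph[name]))
        for component in connected_components(graph)
    )


# Hypothetical binary features observed on six websites (1 = present).
FEATURES = {
    "has_ip_in_url": [1, 1, 0, 0, 1, 0],
    "has_at_symbol": [1, 1, 0, 0, 1, 0],
    "long_url":      [1, 1, 0, 0, 0, 0],
    "no_https":      [0, 0, 1, 1, 0, 0],
    "young_domain":  [0, 0, 1, 1, 0, 1],
}

graph = build_feature_graph(FEATURES)
print(select_representatives(graph))
```

Keeping one representative per group reduces redundant features before a classifier is trained. A fuller treatment would apply a modularity-based community detection method and examine the degree distribution for scale-free behaviour, neither of which is shown here.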
Forecasting cyber risks has many social and economic consequences. By providing a cyber risk quantification model and two cyber risk forecasting models, our research contributes to the following positive impacts:
(1) Facilitating the entry of emerging technology in self-driving cars by reducing cybersecurity threats;
(2) Enabling car insurance by quantifying cyber risk;
(3) Facilitating transportation for everyone;
(4) Safer transportation by facilitating the entry of self-driving cars;
(5) Securing cyberspace by anticipating cyber risks before the cyber event and thus improving the e-business environment;
(6) Expanding job opportunities for more people by securing the e-business environment through cyber risk prediction;
(7) Facilitating the advent of emerging connected and autonomous vehicle technology, which will lead to shipment network optimisation and changes in logistics.