THE METHODS OF DATA STORING OF A RECOMMENDATION SYSTEM BASED ON LINKED LISTS
DOI:
https://doi.org/10.26906/SUNZ.2022.2.079
Keywords:
cybersecurity, cyber-attack, clustering, data analysis, web resources, network traffic
Abstract
The goal of this work is to develop a system that detects cyber threats by analyzing the network traffic of web resources, implemented in Python with machine learning methods. The object of research is the process of analyzing web resource data in cybersecurity systems. The subject of research is the machine learning methods and algorithms used for that analysis. The open CSE-CIC-IDS2017 data set was chosen to train the cyberattack detection model. It contains common, up-to-date attacks that resemble real-world traffic; the main implemented attacks are FTP brute force, SSH brute force, DoS, Heartbleed, web attacks, infiltration, botnet, and DDoS. The developed software for detecting cyberattacks on websites consists of three modules: a data preprocessing module, a module for exploring the feature space of network traffic, and a module that applies machine learning algorithms to detect cyberattacks. For feature selection, a model-based strategy was chosen that relies on random forest, an ensemble machine learning method. Model-based feature selection uses a supervised machine learning algorithm to calculate the importance of each feature and keeps only the most important ones. The following machine learning algorithms were chosen to train the model: naive Bayes classifier, k-nearest neighbors, decision trees, support vector machine (SVM) with a Gaussian (radial basis function) kernel, and gradient-boosted decision trees. During training, seven-fold cross-validation was performed to obtain a more accurate estimate of the model's generalization ability.
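The model-based feature selection step described above can be sketched as follows. This is only a minimal illustration, assuming scikit-learn is available. The synthetic data produced by `make_classification`, the number of trees, and the `"median"` importance threshold are placeholders chosen for the example and are not settings reported in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for preprocessed network-flow features (assumption:
# real input would be the labelled flow records of the chosen data set).
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_redundant=2,
    random_state=0,
)

# A random forest assigns each feature an impurity-based importance score.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Keep only the features whose importance is at least the median importance
# (the threshold is an illustrative choice, not the one used in the work).
selector = SelectFromModel(forest, threshold="median")
X_selected = selector.fit_transform(X, y)

importances = selector.estimator_.feature_importances_
kept = selector.get_support(indices=True)

print("kept feature indices:", kept.tolist())
print("shape before selection:", X.shape)
print("shape after selection:", X_selected.shape)
```

The reduced matrix `X_selected` would then be passed to the classifiers in place of the full feature set.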
The result of this work is a software implementation of machine learning methods that detect cyberattacks on websites by identifying their features in network traffic, together with a comparison of the methods' effectiveness.
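A comparison of the five algorithm families under seven-fold cross-validation might look like the sketch below. It assumes scikit-learn and uses synthetic data. The Gaussian variant of naive Bayes, the stratified and shuffled folds, the feature scaling, and every hyperparameter are assumptions made for illustration, since the abstract does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labelled traffic records (assumption).
X, y = make_classification(
    n_samples=700, n_features=10, n_informative=5, random_state=0
)

# The five algorithm families named in the abstract. Hyperparameters are
# library defaults or illustrative guesses, not values from the work.
models = {
    "naive Bayes": GaussianNB(),
    "k-nearest neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=50, random_state=0
    ),
}

# Seven folds: each fold is held out once while the other six train the model.
cv = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name:22s} mean accuracy = {scores.mean():.3f} "
          f"(std = {scores.std():.3f})")
```

Accuracy is used here only for brevity. On an imbalanced intrusion data set, per-class precision and recall would likely be more informative.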