METHOD OF INCREASING THE EFFICIENCY OF DATA CLASSIFICATION BY REDUCING FEATURE CORRELATION
DOI:
https://doi.org/10.26906/SUNZ.2023.4.070
Keywords:
machine learning, data classification, data preprocessing, data correlation, computer networks, neural networks, ensemble classifiers, intrusion detection systems
Abstract
The object of the study is the process of identifying the state of a computer network. The subject of the research is methods of identifying the state of computer networks. The purpose of the article is to increase the efficiency of detecting intrusions into computer networks by reducing the correlation of features. Methods used: artificial intelligence, machine learning, and methods of reducing feature correlation. The following results were obtained. The effectiveness of approaches that reduce data correlation was investigated: principal component analysis (PCA), independent component analysis (ICA), and L1 and L2 regularization; the choice of method for further research was justified. Based on the research results, a special procedure for reducing the correlation of the initial data is proposed. To evaluate the quality and efficiency of the proposed procedure, software models were developed based on Gradient Boosting, Random Forest, a fully connected neural network (FCNN), and a convolutional neural network (CNN). The UNSW-NB15 dataset, which contains records of network operation both under normal conditions and during intrusions, was used as the source data. A comparative analysis of the quality and efficiency of the developed models was performed. Conclusions. The scientific novelty of the obtained results lies in the development of a method for detecting intrusions into computer networks that differs from known methods by including a special procedure for reducing the correlation of the initial data, which made it possible to increase the efficiency of the identification process.
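As an illustration of what reducing feature correlation means in practice, the following minimal sketch applies plain PCA to a small synthetic feature matrix and compares the largest pairwise correlation before and after the projection. It is not the special procedure proposed in the article, which the abstract does not describe; the function names `pca_decorrelate` and `max_abs_offdiag_corr` and the synthetic data are assumptions made purely for illustration, and the UNSW-NB15 data is not used here.

```python
import numpy as np


def pca_decorrelate(X, n_components=None):
    """Project X onto its principal components (illustrative helper, not the paper's procedure)."""
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by decreasing explained variance.
    order = np.argsort(eigvals)[::-1]
    eigvecs = eigvecs[:, order]
    if n_components is not None:
        eigvecs = eigvecs[:, :n_components]
    return X_centered @ eigvecs


def max_abs_offdiag_corr(X):
    """Largest absolute correlation between two different columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    off_diagonal = corr - np.diag(np.diag(corr))
    return float(np.abs(off_diagonal).max())


# Synthetic stand-in for network features: the third column is almost a copy
# of the first one, so the raw features are strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.1 * rng.normal(size=500),
])

Z = pca_decorrelate(X)
before = max_abs_offdiag_corr(X)
after = max_abs_offdiag_corr(Z)
print(f"max |corr| before PCA: {before:.3f}")
print(f"max |corr| after PCA:  {after:.2e}")
```

The projected features `Z` could then be passed to any of the classifiers named in the abstract in place of the raw features; whether this improves detection quality on real traffic data is exactly the question the article investigates.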