METHOD OF INCREASING THE EFFICIENCY OF DATA CLASSIFICATION BY REDUCING FEATURE CORRELATION

Authors

  • Svitlana Gavrylenko
  • Vadym Poltoratskyi

DOI:

https://doi.org/10.26906/SUNZ.2023.4.070

Keywords:

machine learning, data classification, data preprocessing, data correlation, computer networks, neural networks, ensemble classifiers, intrusion detection systems

Abstract

The object of the study is the process of identifying the state of a computer network. The subject of the study is methods for identifying the state of computer networks. The purpose of the article is to increase the efficiency of detecting intrusions into computer networks by reducing the correlation of features. Methods used: artificial intelligence, machine learning, and methods for reducing feature correlation. The following results were obtained. The effectiveness of approaches that reduce data correlation was investigated: principal component analysis (PCA), independent component analysis (ICA), and L1 and L2 regularization; a method was selected and justified for further research. Based on the results, a special procedure for reducing the correlation of the source data is proposed. To evaluate the quality and efficiency of the proposed procedure, software models were developed based on Gradient Boosting, Random Forest, a fully connected neural network (FCNN), and a convolutional neural network (CNN). The UNSW-NB15 dataset, which contains records of normal network operation and of intrusions, was used as the source data. A comparative analysis of the quality and efficiency of the developed models was performed. Conclusions. The scientific novelty of the results lies in the development of a method for detecting intrusions into computer networks that differs from known methods by including a special procedure for reducing the correlation of the source data, which made it possible to increase the efficiency of the identification process.
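
The abstract does not describe the proposed correlation-reduction procedure, so the following is only a generic sketch of two standard ways to reduce feature correlation before classification: dropping highly correlated columns and projecting onto principal components. It is not the authors' method. The function names, the 0.9 threshold, and the synthetic data (used here in place of UNSW-NB15) are all assumptions made for illustration.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Greedy filter (hypothetical helper): keep a column only if its absolute
    Pearson correlation with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def pca_decorrelate(X, n_components):
    """Project centered data onto its top principal directions via SVD.
    The resulting score columns are mutually uncorrelated."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


# Synthetic stand-in data: three independent features plus two
# near-duplicates (columns 3 and 4 closely follow columns 0 and 1).
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X = np.column_stack([
    base,
    base[:, 0] + 0.01 * rng.normal(size=500),
    2.0 * base[:, 1] + 0.01 * rng.normal(size=500),
])

kept = drop_correlated(X)      # indices of the retained columns
Z = pca_decorrelate(X, 3)      # decorrelated representation

# Largest absolute off-diagonal correlation among the PCA scores.
max_offdiag = np.abs(np.corrcoef(Z, rowvar=False) - np.eye(3)).max()
```

In this sketch the filter keeps the original feature columns and discards the redundant ones, while PCA replaces them with uncorrelated combinations. Either output could then be passed to a classifier such as Gradient Boosting or Random Forest. Which option suits intrusion-detection data better is an empirical question that the sketch does not address.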

Published

2023-12-12
