COMPARISON OF MACHINE LEARNING METHODS FOR A DIABETES PREDICTION INFORMATION SYSTEM
DOI:
https://doi.org/10.26906/SUNZ.2021.4.073
Keywords:
machine learning, data mining, neural network, diabetes prediction information system, logistic regression, decision tree
Abstract
Diabetes is a disease for which there is no permanent cure; therefore, methods and information systems are required for its early detection. This paper proposes an information system for predicting diabetes based on data mining methods and machine learning algorithms. The paper discusses a number of machine learning methods: random forest, the AdaBoost algorithm, the multilayer perceptron, the neural-like structure of the Consecutive Geometric Transformations Model (CGTM), linear regression based on stochastic gradient descent, the generalized regression neural network, and regression based on the support vector machine. The Pima Indian Diabetes dataset from the UCI machine learning repository was used in the research. The dataset contains records for 768 patients, each described by nine attributes: the number of pregnancies; the plasma glucose concentration at two hours in an oral glucose tolerance test; diastolic blood pressure; triceps skinfold thickness; two-hour serum insulin concentration; body mass index (BMI); the diabetes pedigree function, which reflects heredity; age; and the class variable (0 – no diabetes, 1 – diabetes). The Recursive Feature Elimination method was applied to improve prediction performance. It was found that the logistic regression model performed well in predicting diabetes. It has been shown that, to predict the likelihood of diabetes mellitus with an accuracy of 78% using the created model, it is necessary and sufficient to use the following indicators of the patient's health status: the number of pregnancies, the plasma glucose concentration in the oral glucose tolerance test, the BMI, and the value of the Diabetes Pedigree Function.
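The feature selection step described in the abstract can be illustrated with a minimal sketch that combines Recursive Feature Elimination with logistic regression in scikit-learn. This is not the authors' code. The column names, the 75/25 split, the standardization step, and the synthetic data generator are all assumptions made for illustration; the real Pima Indian Diabetes data would have to be loaded in place of `make_synthetic_data`, and the accuracy printed here says nothing about the 78% reported above.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative labels (assumed names) for the eight predictors in the abstract.
FEATURES = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
            "insulin", "bmi", "pedigree", "age"]


def make_synthetic_data(n_samples=768, seed=0):
    """Generate placeholder data. This is NOT the real Pima dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, len(FEATURES)))
    # Demo-only assumption: the outcome depends on four of the columns.
    logits = 0.6 * X[:, 0] + 1.2 * X[:, 1] + 0.8 * X[:, 5] + 0.5 * X[:, 6] - 0.7
    y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y


def select_and_score(X, y, n_features=4, seed=0):
    """Run RFE around logistic regression and report held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)

    # Standardize so coefficient magnitudes are comparable across features.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    # RFE repeatedly drops the feature with the smallest absolute coefficient
    # until n_features remain, then refits the model on the survivors.
    selector = RFE(LogisticRegression(max_iter=1000),
                   n_features_to_select=n_features)
    selector.fit(X_train_s, y_train)

    selected = [name for name, keep in zip(FEATURES, selector.support_) if keep]
    accuracy = selector.score(X_test_s, y_test)
    return selected, accuracy


if __name__ == "__main__":
    X, y = make_synthetic_data()
    selected, accuracy = select_and_score(X, y)
    print("Selected features:", selected)
    print("Held-out accuracy: %.3f" % accuracy)
```

Standardizing before RFE is a design choice rather than a requirement: the ranking relies on coefficient magnitudes, which are only comparable when the features share a scale.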