USING A SEQUENCE OF PREPROCESSING METHODS IN VOICE IDENTIFICATION SYSTEMS

Authors

  • Maksym Bondarenko
  • Heorhii Ivashchenko

DOI:

https://doi.org/10.26906/SUNZ.2025.2.090

Keywords:

preprocessing methods, sequential application of methods, voice signals, voice identification systems, filtering, normalization, feature extraction, noise, spectral analysis

Abstract

Topicality. Voice identification is currently one of the most common biometric identification methods, but technical limitations caused by environmental influences and voice variations require further research. As technologies for creating fake audio recordings develop, existing voice biometrics methods need additional improvements to ensure reliable identification, mainly through the introduction of voice data pre-processing methods. The goal of this work is to study pre-processing methods in human voice identification systems. The object of research is the voice signal filtering module in a user voice identification system. The subject of research is the methods of pre-processing voice signals. Results. The article studies how effectively a sequence of pre-processing methods in voice identification systems reduces the impact of background noise and other distortions on recognition quality. A comparative analysis of the effectiveness of removing steady (stationary) and dynamic (non-stationary) noise was performed. Conclusion. The proposed approach to organizing voice signal pre-processing involves selecting an optimal sequence of methods that takes into account recording quality, available computing resources, and the required level of accuracy. The results obtained demonstrate that the sequential use of noise reduction and signal normalization methods increases the accuracy of voice identification.
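Illustration. The idea of applying pre-processing methods in sequence can be sketched as a chain of functions, each taking a signal and returning a cleaned signal. The sketch below is only an assumption about what such a chain might look like: the abstract does not specify the concrete methods or their order, so the step names (remove_dc, spectral_subtract, peak_normalize), the frame length, and the synthetic test signal are all hypothetical. Noise reduction is represented here by a basic frame-wise spectral subtraction, which suits steady noise, followed by peak normalization.

```python
import numpy as np

def remove_dc(signal):
    """Subtract the mean so the signal is centred on zero."""
    return signal - np.mean(signal)

def spectral_subtract(signal, noise_sample, frame_len=256):
    """Basic spectral subtraction on non-overlapping frames.

    The average noise magnitude spectrum is estimated from noise_sample and
    subtracted from every frame of signal; the noisy phase is reused.
    Samples that do not fill a whole frame at the end are dropped.
    """
    n_noise = len(noise_sample) // frame_len
    noise_frames = noise_sample[: n_noise * frame_len].reshape(n_noise, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrum = np.fft.rfft(frames, axis=1)
    # Clamp at zero so the subtracted magnitude never becomes negative.
    magnitude = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    cleaned = np.fft.irfft(magnitude * np.exp(1j * np.angle(spectrum)),
                           n=frame_len, axis=1)
    return cleaned.reshape(-1)

def peak_normalize(signal, peak=1.0):
    """Scale the signal so its largest absolute sample equals peak."""
    max_abs = np.max(np.abs(signal))
    return signal if max_abs == 0 else signal * (peak / max_abs)

def preprocess(signal, steps):
    """Apply the pre-processing steps one after another, in the given order."""
    for step in steps:
        signal = step(signal)
    return signal

# Hypothetical input: a 250 Hz tone standing in for voiced speech,
# corrupted by steady white noise, plus a separate noise-only recording.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 250 * t)
noisy = clean + 0.05 * rng.standard_normal(fs)
noise_only = 0.05 * rng.standard_normal(fs)

steps = [
    remove_dc,
    lambda x: spectral_subtract(x, noise_only),
    peak_normalize,
]
out = preprocess(noisy, steps)
```

Because the chain is just a list, the order of steps can be changed or a step dropped to match the recording quality and the available computing resources, which is the kind of selection the conclusion refers to. Dynamic noise would typically need an adaptive noise estimate rather than the fixed one assumed here.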

Published

2025-06-19