A STUDY OF THE ACCURACY OF BEAMFORMING METHODS IN THE CONTEXT OF AN INCLUSIVE INTERNAL NAVIGATION SYSTEM
DOI: https://doi.org/10.26906/SUNZ.2026.2.165
Keywords: inclusive navigation system, visual impairment, speech recognition, spatial filtering, beamforming, Delay-and-Sum, Max-UDR, Max-SINR, dynamic noise
Abstract
Relevance. Voice control in inclusive navigation systems is critical for the independence and safe mobility of people with visual impairments in public spaces, particularly large retail stores. However, the recognition accuracy of existing speech-to-text (STT) systems drops significantly in the highly dynamic, non-stationary acoustic noise of supermarkets. The object of this study is audio stream preprocessing and spatial filtering (beamforming) in a voice control system operating under dynamic, non-stationary noise. The problem is that standard audio signal processing algorithms are insufficiently selective against store background noise, which leads to a critical increase in the word error rate (WER) and makes the smart cart control system vulnerable.

The objective of the article is to evaluate, through computer simulation, how external factors (the number, spatial topology, and power level of acoustic noise sources) affect the accuracy of spatial filtering (beamforming) methods applied before voice command recognition.

In the study, the acoustic environment and microphone array were simulated using the Pyroomacoustics library, and three methods were compared: Delay-and-Sum (DAS), Max-UDR, and Max-SINR. The Max-SINR algorithm provided the highest signal-to-noise ratio gain (ΔSNR of 7.9 to 9.1 dB) and is mathematically robust to changes in the distance to interference sources and in their power. The DAS method was the least effective (5.35–5.95 dB) and was sensitive to changes in distance. The key factor in signal degradation was found to be the configuration of noise sources, with the cross topology being the most difficult to filter.
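To make the ΔSNR comparison concrete, the following is a minimal sketch, not the simulation used in the study. It assumes a free-field, narrowband model with a four-microphone uniform linear array, one interferer, and spatially white sensor noise, whereas the study simulated a room with Pyroomacoustics. All function names, the array geometry, the angles, and the power values are hypothetical and chosen only for illustration; the printed gains are not the paper's results.

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s (assumed)


def steering_vector(n_mics, spacing_m, freq_hz, angle_deg):
    """Far-field narrowband steering vector of a uniform linear array."""
    theta = math.radians(angle_deg)
    return [
        cmath.exp(-2j * math.pi * freq_hz * m * spacing_m * math.cos(theta) / C)
        for m in range(n_mics)
    ]


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def das_weights(d_s):
    """Delay-and-sum: align the target phases and average the channels."""
    return [x / len(d_s) for x in d_s]


def max_sinr_weights(d_s, d_i, p_i, p_n):
    """Max-SINR weights, proportional to R_n^{-1} d_s.

    R_n = p_i * d_i d_i^H + p_n * I (one interferer plus white noise) is
    inverted in closed form with the Sherman-Morrison identity.
    """
    k = p_i * inner(d_i, d_s) / (p_n + p_i * len(d_s))
    return [(s - k * i) / p_n for s, i in zip(d_s, d_i)]


def output_sinr(w, d_s, d_i, p_s, p_i, p_n):
    """SINR of the beamformer output y = w^H x."""
    signal = p_s * abs(inner(w, d_s)) ** 2
    noise = p_i * abs(inner(w, d_i)) ** 2 + p_n * inner(w, w).real
    return signal / noise


def delta_snr_db(w, d_s, d_i, p_s, p_i, p_n):
    """Gain in dB relative to the SINR at a single reference microphone."""
    sinr_in = p_s / (p_i + p_n)
    return 10 * math.log10(output_sinr(w, d_s, d_i, p_s, p_i, p_n) / sinr_in)


# Illustrative scenario: talker at broadside, interferer at 30 degrees.
d_s = steering_vector(4, 0.05, 1000.0, 90.0)
d_i = steering_vector(4, 0.05, 1000.0, 30.0)
p_s, p_i, p_n = 1.0, 1.0, 0.1  # target, interferer, sensor-noise powers

gain_das = delta_snr_db(das_weights(d_s), d_s, d_i, p_s, p_i, p_n)
gain_sinr = delta_snr_db(max_sinr_weights(d_s, d_i, p_i, p_n), d_s, d_i, p_s, p_i, p_n)
print(f"DAS gain:      {gain_das:.2f} dB")
print(f"Max-SINR gain: {gain_sinr:.2f} dB")
```

In this simplified model the Max-SINR weights can never yield a lower gain than DAS, because they maximize the output SINR by construction. DAS ignores the interferer's direction, so its gain depends on where the interferer happens to fall in the fixed beam pattern.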
License
Copyright (c) 2026 Daniil Raptanov, Olesia Barkovska, Mykhailo Shylenko, Oleksandr Holovchenko, Diana Ivakhnenko

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.