A STUDY OF THE ACCURACY OF BEAMFORMING METHODS IN THE CONTEXT OF AN INCLUSIVE INTERNAL NAVIGATION SYSTEM

Daniil Raptanov; Olesia Barkovska; Mykhailo Shylenko; Oleksandr Holovchenko; Diana Ivakhnenko

doi:10.26906/SUNZ.2026.2.165

Keywords:

inclusive navigation system, visual impairment, speech recognition, spatial filtering, beamforming, Delay-and-Sum, Max-UDR, Max-SINR, dynamic noise

Abstract

Relevance. Voice control in inclusive navigation systems is critical to the independence and safe mobility of people with visual impairments in public spaces, particularly in large retail stores. However, the recognition accuracy of existing speech-to-text (STT) systems drops significantly in the highly dynamic, non-stationary acoustic noise of a supermarket. The object of this study is audio stream preprocessing and spatial filtering (beamforming) in a voice control system operating under dynamic, non-stationary noise. The problem is that standard audio signal processing algorithms are not selective enough against the background noise of a store, which leads to a critical increase in the word error rate (WER) and leaves the smart cart control system unreliable. The objective of the article is to evaluate, through computer simulation, how external factors (the number of acoustic noise sources, their spatial topology, and their power level) affect the accuracy of spatial filtering (beamforming) methods used ahead of voice command recognition. Results. The acoustic environment and the microphone array were simulated using the Pyroomacoustics library, and three methods were compared: Delay-and-Sum (DAS), Max-UDR, and Max-SINR. The Max-SINR algorithm provides the highest signal-to-noise ratio gain (ΔSNR from 7.9 to 9.1 dB) and is robust to changes in the distance to interference sources and in their power. The DAS method proved the least effective (ΔSNR from 5.35 to 5.95 dB) and was sensitive to changes in distance. The key factor in signal degradation is the configuration of the noise sources, among which the cross topology is the most difficult to filter.
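The ΔSNR figures quoted in the abstract can be illustrated with a minimal Delay-and-Sum sketch. It is not the authors' Pyroomacoustics simulation: it assumes an idealized far-field, reverberation-free model with whole-sample delays, a sine wave standing in for speech, and a single white-noise interferer. The function names, the delay values, and the definition ΔSNR = SNR_out − SNR_in (in dB, with SNR_in measured at the first microphone) are assumptions made for illustration only.

```python
import math
import random


def delay(signal, d):
    """Delay a signal by d >= 0 whole samples, zero-padding the start."""
    return [0.0] * d + signal[:len(signal) - d]


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def delay_and_sum(channels, steering_delays):
    """Advance each channel by its steering delay, then average the channels."""
    n = len(channels[0]) - max(steering_delays)
    return [
        sum(ch[i + d] for ch, d in zip(channels, steering_delays)) / len(channels)
        for i in range(n)
    ]


def delta_snr_db(target_delays, noise_delays, n_samples=4000, seed=0):
    """SNR gain in dB of a DAS beamformer steered at the target (toy model)."""
    rng = random.Random(seed)
    # Hypothetical stand-ins: a sine for the voice command, white noise for the store.
    speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Each microphone receives a delayed copy of both sources.
    speech_ch = [delay(speech, d) for d in target_delays]
    noise_ch = [delay(noise, d) for d in noise_delays]

    # Input SNR at the first microphone.
    snr_in = power(speech_ch[0]) / power(noise_ch[0])

    # The beamformer is linear, so the two components can be filtered separately.
    out_speech = delay_and_sum(speech_ch, target_delays)
    out_noise = delay_and_sum(noise_ch, target_delays)
    snr_out = power(out_speech) / power(out_noise)

    return 10 * math.log10(snr_out / snr_in)


if __name__ == "__main__":
    # Four microphones; target and interferer arrive from different directions.
    print(f"Delta SNR: {delta_snr_db([0, 1, 2, 3], [6, 4, 2, 0]):.2f} dB")
```

Because the speech copies add coherently after alignment while the misaligned noise copies add incoherently, the gain in this toy model depends mainly on the number of microphones. Max-UDR and Max-SINR additionally use the interference statistics when computing the weights, which this sketch does not attempt to reproduce.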
