THE IMPORTANCE OF MASSIVELY PARALLEL COMPUTING SYSTEMS IN SCANNED DOCUMENTS PROCESSING

Authors

  • O. Barkovska
  • V. Kholiev
  • D. Polikanov

DOI:

https://doi.org/10.26906/SUNZ.2022.1.043

Keywords:

system, processing, text, processor, GPU, multithreading, image, preprocessing, speedup, information resource, storage

Abstract

The paper proposes a generalized classification model for scanned documents: an organizational-functional, technological, and software-hardware complex that classifies or categorizes documents by keywords defined in a frequency dictionary. The research is relevant because it reduces the time needed to streamline new scanned information resources by speeding up the methods that improve the quality of the original image immediately before the text in the image is processed and analyzed. Analysis of the results showed that massively parallel computers are effective and expedient for tasks such as noise reduction and changing the color channel values of the original full-color image, achieving an acceleration of up to 53.51% compared with using the computing resources of the central processor.
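
As an illustration of categorization by keywords from a frequency dictionary, the following minimal Python sketch scores already-recognized text against per-category keyword weights. It is not the authors' implementation: the dictionary contents, the weights, the function name `classify`, and the sample text are all hypothetical, and the text is assumed to have already been extracted from the scanned image.

```python
from collections import Counter
import re

# Hypothetical frequency dictionary: category -> {keyword: weight}.
# Categories, keywords, and weights are illustrative, not taken from the paper.
FREQUENCY_DICTIONARY = {
    "hardware": {"gpu": 3, "processor": 2, "multithreading": 2},
    "imaging": {"image": 3, "noise": 2, "preprocessing": 2},
}


def classify(text, dictionary=FREQUENCY_DICTIONARY):
    """Return the best-scoring category, or None if no keyword occurs."""
    # Count lowercase alphabetic tokens in the recognized text.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category as the weighted sum of its keyword occurrences.
    scores = {
        category: sum(weight * counts[word] for word, weight in keywords.items())
        for category, keywords in dictionary.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Assumed sample of text recognized from a scanned page.
recognized = "Noise reduction is applied to the image before preprocessing ends."
print(classify(recognized))
```

A weighted sum is only one possible scoring rule; it is used here because it keeps the link between keyword frequency and the chosen category easy to follow.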

Published

2022-04-01