METHODS OF PREVENTING AUTOMATED COLLECTION OF INFORMATION FROM WEB PAGES
DOI:
https://doi.org/10.26906/SUNZ.2024.2.163
Keywords:
parsing software, anti-bot methods, parsers, crawlers
Abstract
This work analyzes methods of counteracting the automated scanning of web sites. Its purpose is to analyze the algorithms and features of parsing systems and, based on the data obtained, to build a toolset that specializes in detecting and countering attacks that use parsing systems. The research method is the analysis of methods of countering parsing systems. The paper reviews the history of automated systems, their classification, how they operate, and the countermeasures against them. The proposed methods specialize in protection against parsing systems and place only a minimal additional load on server equipment, so they do not interfere with the work of ordinary users. These methods will be useful to the owners of large resources whose main asset is information.