THE EXPLAINABILITY OF SHALLOW AI-GENERATED TEXT CLASSIFICATION MODELS VIA PARTS REMOVING
DOI:
https://doi.org/10.26906/SUNZ.2026.2.153
Keywords:
explainability, black-box, shallow ANN, perturbation, AI-generated content, human-written content, text chunk, text classification, explainability index
Abstract
In this paper, we address the explainability problem for ANN classification of AI-generated and human-written text chunks in Ukrainian texts in the IT domain. The objective is to investigate whether perturbation-based modifications of text chunks, namely the removal of sentences, words, and word combinations, can help in searching for explanations. We used five shallow ANN models (with an average accuracy of about 0.88) and tested them on a sample document containing human-written text and fragments generated with GPT-5, Gemini 2.5 Flash, and Claude Sonnet 4.5. The experiments showed that it is not easy to find a single sentence or word whose removal flips the classification result. We propose an explainability index that measures the total influence of all perturbed samples on the classification result, accounting for the fact that short perturbations are more valuable.
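The removal-based perturbation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the index formula, so `length_weighted_index` uses a hypothetical weighting (shorter removed spans count more), and `toy_score` is a stand-in for a trained shallow ANN that returns the probability of the "AI-generated" class. The naive sentence splitter is also an assumption.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(chunk: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]


def sentence_removals(chunk: str) -> List[Tuple[str, str]]:
    """Return (removed_sentence, perturbed_chunk) pairs, one per sentence."""
    sentences = split_sentences(chunk)
    return [
        (removed, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, removed in enumerate(sentences)
    ]


def flipping_removals(chunk: str, score: Callable[[str], float],
                      threshold: float = 0.5) -> List[str]:
    """Sentences whose removal moves the score across the decision threshold."""
    base_label = score(chunk) >= threshold
    return [
        removed
        for removed, perturbed in sentence_removals(chunk)
        if (score(perturbed) >= threshold) != base_label
    ]


def length_weighted_index(chunk: str, score: Callable[[str], float]) -> float:
    """Hypothetical index (an assumption, not the paper's formula):
    sum of score changes, each weighted so that shorter removals count more."""
    base = score(chunk)
    total = 0.0
    for removed, perturbed in sentence_removals(chunk):
        weight = 1.0 - len(removed) / len(chunk)  # short removal -> larger weight
        total += weight * abs(base - score(perturbed))
    return total


def toy_score(text: str) -> float:
    """Stand-in classifier (hypothetical): share of words equal to 'delve'."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w == "delve" for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    chunk = "We delve into logs. The server restarted twice. Nothing else changed."
    print(flipping_removals(chunk, toy_score, threshold=0.05))
    print(length_weighted_index(chunk, toy_score))
```

The same loop extends to word or word-combination removal by swapping `sentence_removals` for a generator over other spans; the weighting only needs the length of the removed span.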
License
Copyright (c) 2026 Olena Peredrii, Oleksii Gorokhovatskyi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.