THE EXPLAINABILITY OF SHALLOW AI-GENERATED TEXT CLASSIFICATION MODELS VIA PARTS REMOVING

Olena Peredrii; Oleksii Gorokhovatskyi

doi:10.26906/SUNZ.2026.2.153

Keywords:

explainability, black-box, shallow ANN, perturbation, AI-generated content, human-written content, text chunk, text classification, explainability index

Abstract

In this paper, we address the problem of explaining how ANNs classify text chunks as AI-generated or human-written, for Ukrainian texts in the IT domain. The objective is to investigate whether perturbation-based modifications of text chunks, namely the removal of sentences, words, and word combinations, help in finding explanations. We used five shallow ANN models (with an average accuracy of about 0.88) and tested them on a sample document containing human-written text and AI-generated fragments produced with GPT-5, Gemini 2.5 Flash, and Claude Sonnet 4.5. The experiments showed that a single sentence or word whose removal flips the classification result is hard to find. We propose an explainability index that measures the total influence of all perturbed samples on the classification result, accounting for the fact that short perturbations are more valuable.
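
The perturbation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the index formula, so the weighting used here (a removal counts more when the removed part is short relative to the chunk) is only an assumed stand-in. The names `split_sentences`, `sentence_removals`, `toy_index`, and `toy_predict` are hypothetical, and `toy_predict` merely imitates a classifier that returns the probability of the "AI-generated" class.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(chunk: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]


def sentence_removals(chunk: str) -> List[Tuple[str, str]]:
    """Return (removed_sentence, perturbed_chunk) pairs, one per sentence."""
    sentences = split_sentences(chunk)
    return [
        (removed, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, removed in enumerate(sentences)
    ]


def toy_index(
    chunk: str,
    predict: Callable[[str], float],
    threshold: float = 0.5,
) -> Tuple[float, List[str]]:
    """Hypothetical index: mean length-weighted change in the predicted score.

    The weight 1 - len(removed) / len(chunk) is an assumed way of making
    short removals count more; it is not the formula from the paper.
    Also returns the sentences whose removal flips the predicted class.
    """
    p0 = predict(chunk)
    pairs = sentence_removals(chunk)
    total = 0.0
    flips: List[str] = []
    for removed, perturbed in pairs:
        p = predict(perturbed)
        weight = 1.0 - len(removed) / len(chunk)
        total += weight * abs(p0 - p)
        if (p >= threshold) != (p0 >= threshold):
            flips.append(removed)
    index = total / len(pairs) if pairs else 0.0
    return index, flips


def toy_predict(text: str) -> float:
    """Stand-in for a trained model: scores 1.0 if a marker word is present."""
    return 1.0 if "delve" in text.lower() else 0.0


chunk = "We delve into caching. The server restarts nightly. Logs rotate weekly."
index, flips = toy_index(chunk, toy_predict)
print(f"index = {index:.3f}, flipping sentences = {flips}")
```

The same loop could be run over word or word-combination removals by swapping `sentence_removals` for another perturbation generator; a real experiment would replace `toy_predict` with one of the trained models.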
