SELF-HEALING COMPUTER SYSTEMS

Authors

  • Maksym Volk
  • Maksym Hora
  • Vladyslav Labazov
  • Andriy Mishchenko
  • Anton Barsukov
  • Vladyslav Holetz

DOI:

https://doi.org/10.26906/SUNZ.2023.2.080

Keywords:

computer system, self-healing, fault, backup, recovery point

Abstract

The article deals with a self-healing system for parallel software based on journaling of recovery points. Self-healing is a necessary feature of modern software: it enables automatic detection and diagnosis of faults and restoration of system operability. The main stages of recovery are storing the state of programs in recovery points (journaling), monitoring the state of programs to detect errors, creating patches, and restoring the state of programs to the corresponding recovery point. The work proposes the structure of the system and describes the algorithm of its functioning; it discusses the assignment and virtualization of recovery points, and describes the experimental software system and its application to the recovery of common software systems. The results can be widely used in the development of most software and information systems to automate the recovery, debugging, and operation of modern computer and cloud systems.
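
The four recovery stages listed above (journaling, monitoring, patching, restoration) can be illustrated with a minimal single-process Python sketch. It is not the authors' system, and everything in it is an assumption for illustration: the names RecoveryJournal, run_self_healing, is_healthy and the patches mapping are hypothetical, recovery points are taken as in-memory deep copies before every step, a fault is any exception or a failed invariant check, and a "patch" is modeled simply as a replacement function for the faulty step.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], None]


@dataclass
class RecoveryJournal:
    """Hypothetical journal of recovery points: (label, snapshot) pairs."""
    points: List[Tuple[str, State]] = field(default_factory=list)

    def save(self, label: str, state: State) -> None:
        # Journaling: store an independent deep copy so later mutations cannot leak in.
        self.points.append((label, copy.deepcopy(state)))

    def restore_latest(self) -> State:
        # Restoration: hand back a fresh copy so the journaled snapshot stays intact.
        return copy.deepcopy(self.points[-1][1])


def run_self_healing(
    state: State,
    steps: List[Tuple[str, Step]],
    is_healthy: Callable[[State], bool],
    patches: Dict[str, Step],
    max_retries: int = 1,
) -> Tuple[State, RecoveryJournal]:
    steps = list(steps)  # work on a copy so patching does not alter the caller's list
    journal = RecoveryJournal()
    i, retries = 0, 0
    while i < len(steps):
        name, step = steps[i]
        journal.save(f"before:{name}", state)  # recovery point before each step
        try:
            step(state)
            ok = is_healthy(state)  # monitoring: check an invariant after the step
        except Exception:
            ok = False  # a crash is treated as a detected fault
        if ok:
            i, retries = i + 1, 0
            continue
        state = journal.restore_latest()  # roll back to the last recovery point
        if retries >= max_retries or name not in patches:
            raise RuntimeError(f"unrecoverable fault in step {name!r}")
        steps[i] = (name, patches[name])  # hypothetical "patch": swap in a corrected step
        retries += 1
    return state, journal


# Usage: the second step corrupts the state; its patch recomputes the field.
def add_item(s: State) -> None:
    s["items"].append(1)


def bad_count(s: State) -> None:
    s["count"] = -1  # simulated fault


def fixed_count(s: State) -> None:
    s["count"] = len(s["items"])


final, journal = run_self_healing(
    {"items": [], "count": 0},
    steps=[("add", add_item), ("count", bad_count)],
    is_healthy=lambda s: s["count"] >= 0,
    patches={"count": fixed_count},
)
print(final)  # {'items': [1], 'count': 1}
```

Taking a recovery point before every step keeps rollback trivial but costs a full copy of the state each time; a practical system would more likely journal incremental changes or place recovery points less often, which is the kind of trade-off the assignment of recovery points has to settle.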

Published

2023-06-09