Modern Approaches to Data Storage on a Distributed Infrastructure

Keywords: big data storage, shared-nothing architecture, architecture, node, partitioning, partitions, distributed systems

Abstract

The purpose of this article is to study the problems of optimal big data storage and to consider the latest approaches to solving them so that computing systems achieve the required level of performance. The research used methods of analysis, generalization, synthesis and the system approach to study modern methods of data storage on a distributed infrastructure that are used in information processing. The paper considers the key issues that arise as technology develops and the amount of data in use grows significantly. It is established that there are currently two fundamental approaches to ensuring the required level of system performance: vertical scaling (a shared-memory architecture) and horizontal scaling (a shared-nothing architecture), each of which has advantages and disadvantages. According to the latest developments and research in the field of big data, the most suitable is the shared-nothing architecture, which the paper analyzes in detail: it describes the main conceptual and architectural solutions of distributed systems and states the requirements for this architecture. In this architecture, each node (the machine or virtual machine running the database software) uses its own CPU, memory and disks independently, and all coordination between nodes is carried out at the software level over a conventional network. To store data on several nodes, two methods are most commonly used, depending on the data volume: replication (when the data set is small enough to fit on a single machine) or partitioning (sharding), which divides the data into partitions based on specific ranges of key values in the data set.
The paper points out the need to strike an appropriate balance between consistency, availability and partition tolerance when designing distributed systems. The claim that any distributed data system can provide at most two of these properties is substantiated.
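
As an illustration of the key-range partitioning mentioned in the abstract, the following minimal Python sketch routes each key to a node according to sorted range boundaries. It is not taken from the article: the class name `RangePartitionedStore`, the node names and the boundary values are hypothetical, and in-memory dictionaries stand in for nodes that would be separate machines in a real shared-nothing system.

```python
from bisect import bisect_right


class RangePartitionedStore:
    """Toy key-value store that assigns each key to a node by key range."""

    def __init__(self, boundaries, node_names):
        # N sorted split points define N + 1 contiguous key ranges.
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        if len(node_names) != len(boundaries) + 1:
            raise ValueError("need exactly one node per key range")
        self.boundaries = list(boundaries)
        self.node_names = list(node_names)
        # Each node keeps its own private storage (shared-nothing);
        # plain dicts stand in for separate machines in this sketch.
        self.storage = {name: {} for name in node_names}

    def node_for(self, key):
        # bisect_right counts the boundaries <= key, which is the
        # index of the range (and therefore the node) owning the key.
        return self.node_names[bisect_right(self.boundaries, key)]

    def put(self, key, value):
        self.storage[self.node_for(key)][key] = value

    def get(self, key):
        return self.storage[self.node_for(key)].get(key)


# Hypothetical layout: three nodes, split at "h" and "q".
store = RangePartitionedStore(["h", "q"], ["node-a", "node-b", "node-c"])
store.put("alice", 1)
store.put("mallory", 2)
store.put("zed", 3)
```

With these assumed boundaries, keys below "h" are stored on `node-a`, keys from "h" up to but not including "q" on `node-b`, and all remaining keys on `node-c`. A lookup therefore contacts only the one node that owns the key's range.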

Author Biographies

Vasyl Pryimak, Ivan Franko National University of Lviv

Doctor of Economic Sciences, Head of the Information Systems in Management Department

Yurii Rybak, Ivan Franko National University of Lviv

Master’s Student

Olha Holubnyk, Ivan Franko National University of Lviv

Ph.D. in Economics, Associate Professor of Information Systems in Management Department

References

1. Klymash M., Hordiichuk-Bublivska O., Chaikovskyi I. and Danylchenko T. (2021) Research of algorithms for parallel processing of information in databases, Infocommunication technologies and electronic engineering, vol. 1 (1), pp. 51–62.
2. Kleppmann M. (2017) Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. 590 p.
3. Petrov A. (2019) Database Internals: A Deep Dive into How Distributed Data Systems Work. 1st Edition. 590 p.
4. Gorelik A. (2019) The Enterprise Big Data Lake. 197 p.
5. Brewer E. (July 16-19, 2000) Towards Robust Distributed Systems. Symposium on Principles of Distributed Computing, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing. Portland.
Published
2023-12-12
How to Cite
Pryimak Vasyl Modern approaches to data storage on a distributed infrastructure / Vasyl Pryimak, Yurii Rybak, Olha Holubnyk // Science Journal «Economics and Region». – Poltava: PNTU, 2023. – VOL. 4(91). – PP. 236-242. – doi:https://doi.org/10.26906/EiR.2023.4(91).3218.
Section
Mathematical methods, models and information technologies in economics