MODERN DATA ORCHESTRATION TOOLS FOR BUILDING BIG DATA PROCESSING PIPELINES
DOI: https://doi.org/10.26906/SUNZ.2024.2.095
Keywords: data orchestration, data processing pipeline, ETL, DAG
Abstract
The data universe in modern companies is constantly expanding. As the volume of data grows, so do the challenges of managing it, synchronizing schedules, and processing it. Companies need to break down the barriers between data sources and storage in order to make full use of the information they collect. Data orchestration allows organizations to automate and optimize their data workflows, turning data into operational assets so that valuable insights can support real-time business decision making. By some estimates, 80% of the work involved in data analysis is spent on data collection and preparation, so data orchestration can significantly reduce the time spent on processing and scheduling. The goal of this work is to review modern data orchestration tools. The object of research is data engineering. The subject of research is data orchestration. Conclusions. In the age of big data, effective orchestration is an integral part of successful data work. Choosing the right orchestration tool is a critical task for any data engineering team. Each tool has its own strengths and is designed for different needs. The optimal choice depends on the specific requirements, the infrastructure, and the team's familiarity with the tool.
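As a rough illustration of what orchestrating an ETL pipeline as a DAG can mean, the sketch below runs three tasks in dependency order using only the Python standard library. It is not taken from the paper and does not use the API of any specific orchestration tool; the task names, the sample rows, and the `run_pipeline` helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Hypothetical source rows; a real pipeline would read from a database or API.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.5"}]


def transform(rows):
    # Convert the string amounts to floats.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def load(rows):
    # Stand-in for writing to storage: return the total amount.
    return sum(r["amount"] for r in rows)


# The DAG maps each task name to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}


def run_pipeline(dag, tasks):
    """Run every task after its upstream tasks, passing their results along."""
    results = {}
    order = []
    for name in TopologicalSorter(dag).static_order():
        upstream = [results[dep] for dep in sorted(dag[name])]
        results[name] = tasks[name](*upstream)
        order.append(name)
    return order, results


order, results = run_pipeline(dag, tasks)
print(order)
print(results["load"])
```

Orchestration tools typically add scheduling, retries, monitoring, and distributed execution on top of this basic idea of running tasks in dependency order.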