MODERN DATA ORCHESTRATION TOOLS FOR BUILDING BIG DATA PROCESSING PIPELINES

Authors

  • V. Zaleskyi
  • P. Ivanovskii
  • V. Fedorchenko

DOI:

https://doi.org/10.26906/SUNZ.2024.2.095

Keywords:

data orchestration, data processing pipeline, ETL, DAG

Abstract

The data universe in modern companies is constantly expanding. As the amount of data grows, so do the challenges of managing it, synchronizing schedules, and processing it. Companies need to break down the barriers between data sources and storage in order to make full use of the information they collect. Data orchestration allows organizations to automate and optimize their data workflows, turning data into operational assets so that valuable insights can inform real-time business decisions. By some estimates, 80% of the work involved in data analysis is spent on data collection and preparation, which means that data orchestration can significantly reduce the time spent on processing and scheduling. The goal of this work is to review modern data orchestration tools. The object of research is data engineering. The subject of research is data orchestration. Conclusions. In the age of big data, effective orchestration is an integral part of successful data work. Choosing the right orchestration tool is a critical task for any data engineering team. Each tool has its own strengths and is designed for different needs. The optimal choice depends on the specific requirements, the infrastructure, and the team's familiarity with the tool.

Published

2024-04-30