EVOLUTION AND DISTRIBUTION ANALYSIS OF MULTIMODAL ARTIFICIAL INTELLIGENCE SYSTEMS

Authors

  • A. Kapiton
  • D. Tyshchenko
  • A. Desiatko
  • V. Lazorenko

DOI:

https://doi.org/10.26906/SUNZ.2024.4.075

Keywords:

artificial intelligence, bioengineering, generative models, multimodality

Abstract

The article examines the evolution of multimodal artificial intelligence systems (AIS) and analyzes the stages of their formation. Artificial intelligence has undergone a transformational shift toward multimodality in large language models (LLMs). Current approaches to improving large language models through the processing and generation of large amounts of data are analyzed, and the stages of this shift toward multimodality are studied. The verification of information systems and their interaction with the surrounding world is considered; these interactions are found to be inherently multimodal and multicomponent. Ways of improving large language models by enabling them to process and generate different data modalities are analyzed. Modern multimodal artificial intelligence systems are shown to be used effectively in science, education, and economics, while still requiring further development and improvement. Owing to the rapid development of information technologies and systems across many areas of life, AI is changing rapidly, and increasingly capable generative models deserve special attention. The architecture of the AnyGPT model is reviewed: each modality is tokenized into discrete tokens, on which the LLM performs multimodal perception and generation autoregressively. The methodology underlying AnyGPT was found to be multi-component, with the model demonstrating capabilities on par with specialized models in all modalities assessed. Tools for detecting AI-generated objects are found to be still under development and subject to constant modification.
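
The discrete-token idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the AnyGPT implementation: the codebook sizes, the function names, and the `toy_next_token` rule are all hypothetical stand-ins chosen for illustration. The sketch assumes only that each modality has already been converted to integer codes by some tokenizer, and shows how such codes could share one vocabulary and be extended one token at a time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-modality codebook sizes (illustrative values only).
CODEBOOK_SIZES: Dict[str, int] = {"text": 100, "image": 50, "speech": 30}


def build_offsets(sizes: Dict[str, int]) -> Dict[str, int]:
    """Assign each modality a disjoint ID range in one shared vocabulary."""
    offsets: Dict[str, int] = {}
    start = 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets


OFFSETS = build_offsets(CODEBOOK_SIZES)
VOCAB_SIZE = sum(CODEBOOK_SIZES.values())


def to_shared_ids(modality: str, local_ids: List[int]) -> List[int]:
    """Map modality-local discrete codes into the shared vocabulary."""
    size = CODEBOOK_SIZES[modality]
    if any(not 0 <= i < size for i in local_ids):
        raise ValueError(f"code out of range for modality {modality!r}")
    return [OFFSETS[modality] + i for i in local_ids]


def modality_of(shared_id: int) -> Tuple[str, int]:
    """Recover (modality, local code) from a shared-vocabulary ID."""
    for name, size in CODEBOOK_SIZES.items():
        start = OFFSETS[name]
        if start <= shared_id < start + size:
            return name, shared_id - start
    raise ValueError(f"ID {shared_id} is outside the shared vocabulary")


def generate(prompt: List[int],
             next_token: Callable[[List[int]], int],
             steps: int) -> List[int]:
    """Autoregressive loop: each new token depends on the sequence so far."""
    seq = list(prompt)
    for _ in range(steps):
        seq.append(next_token(seq))
    return seq


def toy_next_token(seq: List[int]) -> int:
    """Stand-in for a trained LLM: a deterministic rule, not a real model."""
    return (seq[-1] + 1) % VOCAB_SIZE


# Interleave text and image codes in one sequence, then extend it.
prompt = to_shared_ids("text", [5, 7]) + to_shared_ids("image", [0, 49])
output = generate(prompt, toy_next_token, steps=3)
print([modality_of(t) for t in output])
```

Because every modality occupies its own ID range, a single sequence model can consume and emit mixed-modality sequences, and each generated ID can be routed back to the decoder for its modality. In a real system the toy rule would be replaced by a trained language model and the integer codes by the output of learned tokenizers.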


References

Chengyi Wang, Sanyuan Chen, Yu Wu, Zi-Hua Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, and Furu Wei. Neural codec language models are zero-shot text to speech synthesizers. ArXiv preprint, abs/2301.02111, 2023. URL: https://arxiv.org/abs/2301.02111.

Zineng Tang, Ziyi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, and Mohit Bansal. Codi-2: In-context, interleaved, and interactive any-to-any generation. ArXiv preprint, abs/2311.18775, 2023a. URL: https://arxiv.org/abs/2311.18775.

Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi. Self-instruct: Aligning language model with self generated instructions. ArXiv preprint, abs/2212.10560, 2022. URL: https://arxiv.org/abs/2212.10560.

Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, and Tat-Seng Chua. Next-gpt: Any-to-any multimodal llm. ArXiv preprint, abs/2309.05519, 2023. URL: https://arxiv.org/abs/2309.05519.

Yusong Wu, K. Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5, 2022. URL: https://api.semanticscholar.org/CorpusID:253510826.

Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, and Marco Tagliasacchi. Soundstream: An end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30:495–507, 2021. URL: https://api.semanticscholar.org/CorpusID:236149944.

Tyshchenko D., Franchuk T., Zakharov R., Moskalenko V. Supporting dynamic security needs by means of VPN. Control, Navigation and Communication Systems. Poltava: PNTU, 2024. Vol. 3 (77). pp. 149-152. (In Ukrainian)

Kurylekh A., Kapiton A. The use of artificial intelligence for the development of CRM systems. State, Achievements and Prospects of Information Systems and Technologies. Odesa: ONTU, 2024. pp. 357-359. (In Ukrainian)

Kapiton A., Sukhorebryi O., Nenych D. The use of multimodal artificial intelligence in economics, education, science, and transport. Information Technologies and the Digital Economy. Kyiv: DUIT, 2024. pp. 83-85. (In Ukrainian)

Kapiton A., Hladkyi S., Prorok M. Practical applications of integrating artificial intelligence into the educational process. State, Achievements and Prospects of Information Systems and Technologies. Odesa: ONTU, 2024. pp. 348-349. (In Ukrainian)

PwC’s 2023 Emerging Technology Survey. URL: https://www.pwc.com/us/en/tech-effect/ai-analytics/ai-predictions.html

Gemini. URL: https://blog.google/technology/ai/google-gemini-ai/#sundar-note

Bing. URL: https://www.microsoft.com/en-us/edge/features/the-new-bing?form=MA13FJ

Introducing LLaMA. URL: https://ai.meta.com/blog/large-language-model-llama-meta-ai/

Chat With RTX. URL: https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/

Verner S. IBM adds AI-enhanced data resilience capabilities to help combat ransomware and other threats with enhanced storage solutions, 2024. URL: newsroom.ibm.com/

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling. URL: https://arxiv.org/pdf/2402.12226

Laion-aesthetics. URL: https://laion.ai/blog/laion-aesthetics/, 2022a.

Laion coco: 600m synthetic captions from laion2b-en. URL: https://laion.ai/blog/laion-coco/, 2022b.

AI identification tools. URL: https://thetransmitted.com/ai/instrumenty-identyfikacziyi-shi-zhovten-2024/

Published

2024-11-28
