ON POSSIBILITIES OF MULTILINGUAL BERT MODEL FOR DETERMINING SEMANTIC SIMILARITIES OF THE NEWS CONTENT

Authors

  • S. Olizarenko
  • V. Argunov

DOI:

https://doi.org/10.26906/SUNZ.2020.3.094

Keywords:

Natural Language Processing, BERT, semantic similarities, news content, Deep Learning

Abstract

The results of applying modern achievements in the field of Natural Language Processing, based on the methods and models of Deep Learning, within the HIPSTO content management system (HIPSTO Publishing, AI Technology, Digital Media, Mobile Apps) are discussed and analyzed. In particular, the possibilities and ways of applying the multilingual BERT model to the problem of semantic similarity of news content are investigated. An efficient method is proposed for determining the semantic similarities of multilingual news content in HIPSTO aggregated news feeds, based on sentence embeddings obtained via the first pre-training task of the pretrained multilingual BERT model within the HIPSTO content management system. The results of the research highlight the effectiveness and promise of this technology within the HIPSTO project; the data from its first implementation in HIPSTO are substantiated scientifically and experimentally below.
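The core idea of the method described in the abstract — representing news items as sentence embeddings from a pretrained multilingual BERT model and comparing them by vector similarity — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the embeddings have already been produced by an external encoder (e.g. a service such as bert-as-service, cited in the references), and the function names and the similarity threshold are illustrative.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two sentence-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_similar_news(query_vec: np.ndarray,
                      feed_vecs: list,
                      threshold: float = 0.85) -> list:
    """Return indices of aggregated-feed items whose embeddings are close
    to the query item. The 0.85 threshold is an illustrative value, not one
    taken from the paper; in practice it would be tuned on labeled pairs."""
    return [i for i, v in enumerate(feed_vecs)
            if cosine_similarity(query_vec, v) >= threshold]
```

Because the BERT model is multilingual, embeddings of news items in different languages live in the same vector space, so the same cosine comparison works across languages without translation.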


References

Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, and Ray Kurzweil. Multilingual Universal Sentence Encoder for Semantic Retrieval. arXiv:1907.04307v1 [cs.CL], 9 Jul 2019.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, and Ray Kurzweil. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of NIPS, pages 6000–6010.

Ceshine Lee. Multilingual Similarity Search Using Pretrained Bidirectional LSTM Encoder: Evaluating LASER (Language-Agnostic SEntence Representations), available at: https://medium.com/the-artificial-impostor/multilingual-similarity-search-using-pretrained-bidirectional-lstm-encoder-e34fac5958b0.

Zero-shot transfer across 93 languages: Open-sourcing enhanced LASER library. Posted on Jan 22, 2019 to AI Research, available at: https://engineering.fb.com/ai-research/laser-multilingual-sentence-embeddings/.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805v2 [cs.CL] 24 May 2019.

Google Research BERT repository, available at: https://github.com/google-research/bert.

Ceshine Lee. News Topic Similarity Measure using Pretrained BERT Model: Utilizing Next Sentence Predictions, available at: https://medium.com/the-artificial-impostor/news-topic-similarity-measure-using-pretrained-bert-model-1dbfe6a66f1d.

Jay Alammar. The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), available at: http://jalammar.github.io/illustrated-bert.

keras-bert repository, available at: https://github.com/CyberZHG/keras-bert.

bert-as-service documentation, available at: https://bert-as-service.readthedocs.io.

Using NLP to Automate Customer Support, Part Two, available at: https://blog.floydhub.com/automate-customer-support-part-two.

SentEval repository, available at: https://github.com/facebookresearch/SentEval.

Sam Sucik. Compressing BERT for faster prediction, available at: https://blog.rasa.com/compressing-bert-for-faster-prediction-2.

Published

2020-09-11