Proceedings of the Southwest State University (Izvestiya Yugo-Zapadnogo gosudarstvennogo universiteta)

ISSN 2223-1560, 2686-6757

Southwest State University (SWSU)

10.21869/2223-1560-2022-26-2-159-171

Research Article

Computer science, computer engineering and IT management

Increased Performance of Transformers Language Models in Information Question and Response Systems

https://orcid.org/0000-0002-7677-1800

Denis T. Galeev, Post-Graduate Student

50 Let Oktyabrya str. 94, Kursk 305040

ra3wvw@mail.ru

https://orcid.org/0000-0003-1772-7663

Vladimir S. Panishchev, Cand. of Sci. (Engineering)

50 Let Oktyabrya str. 94, Kursk 305040

gskunk@yandex.ru

Dmitry V. Titov, Dr. of Sci. (Engineering), Associate Professor

50 Let Oktyabrya str. 94, Kursk 305040

titov.swsu@gmail.com

Southwest State University

2022

13.02.2023

Vol. 26, No. 2, pp. 159-171

2023

Galeev D.T., Panishchev V.S., Titov D.V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://izvestswsu.elpub.ru/jour/article/view/1031

Purpose of research

The purpose of this work is to increase the performance of Russian-language question answering information systems. The scientific novelty lies in increasing the performance of the RuBERT model trained to find the answer to a question in a text. Because a faster language model can process more requests in the same amount of time, the results can be applied in question answering systems where response time matters.

Methods

This work uses methods from natural language processing, machine learning, and neural network size reduction. The language model was configured and trained using the Torch and Onnxruntime machine learning libraries. The original model and the training dataset were taken from the Huggingface library.
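Knowledge distillation, one of the size-reduction methods named in this abstract, trains a small student network to reproduce the output distribution of a larger teacher. The following is a minimal sketch in plain Python, assuming the common formulation with a temperature-softened softmax and a KL-divergence loss scaled by the squared temperature. The abstract does not state the exact loss or temperature used, so the function names and the value `temperature=2.0` are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The temperature value is an assumption made for this sketch.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]
loss_same = distillation_loss([4.0, 1.0, 0.5], teacher)   # student agrees with teacher
loss_diff = distillation_loss([0.5, 1.0, 4.0], teacher)   # student disagrees
print(loss_same, loss_diff)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice this term is usually combined with the ordinary task loss.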

Results

The performance of the RuBERT language model was increased by applying neural network size reduction methods, namely knowledge distillation and quantization, and by exporting the model to the ONNX format and running it in the ONNX runtime.
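Quantization reduces model size by storing each weight in fewer bits. The sketch below assumes per-tensor affine quantization to 8-bit integers, so each weight takes one byte instead of the four bytes of a 32-bit float. The abstract does not say which quantization scheme was applied, and the helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one scale and zero point per tensor."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize_int8(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(c - zero_point) * scale for c in codes]

weights = [-0.82, -0.10, 0.0, 0.37, 1.25]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

The round trip is lossy: each restored weight differs from the original by a rounding error on the order of the quantization step `scale`, which is one source of the accuracy loss that quantized models can show.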

Conclusion

The model to which knowledge distillation, quantization, and ONNX optimization were applied together achieved a ~4.6-fold performance increase (from 66.57 to 404.46 requests per minute), while the model size decreased ~13-fold (from 676.29 MB to 51.66 MB). The cost of this speedup was a drop in EM (from 61.3 to 56.87) and in F-measure (from 81.66 to 76.97).
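EM (exact match) and F-measure are the usual quality metrics for extractive question answering. The sketch below shows how they are typically computed for a single predicted answer, assuming token-level overlap as in SQuAD-style evaluation. The normalization here (lowercasing and whitespace splitting) is a simplification, and the example strings are invented for illustration; the exact evaluation script used in the work is not given in the abstract.

```python
from collections import Counter

def normalize(text):
    """Lowercase and split on whitespace (a simplification for this sketch)."""
    return text.lower().split()

def exact_match(prediction, reference):
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "Yuri Gagarin"  # hypothetical gold answer
em_full, f1_full = exact_match("yuri gagarin", reference), f1_score("yuri gagarin", reference)
em_part, f1_part = exact_match("Gagarin", reference), f1_score("Gagarin", reference)
print(em_full, f1_full, em_part, f1_part)
```

A partially correct answer scores zero on EM but receives partial credit on the F-measure, which is why the F-measure is normally the higher of the two. Dataset-level scores are averages of these per-question values.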

Keywords: machine learning; deep learning; neural networks; natural language processing; transformer

The authors declare that there are no conflicts of interest present.