<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">izvestswsu</journal-id><journal-title-group><journal-title xml:lang="ru">Известия Юго-Западного государственного университета</journal-title><trans-title-group xml:lang="en"><trans-title>Proceedings of the Southwest State University</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">2223-1560</issn><issn pub-type="epub">2686-6757</issn><publisher><publisher-name>ЮЗГУ</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.21869/2223-1560-2019-23-3-86-99</article-id><article-id custom-type="elpub" pub-id-type="custom">izvestswsu-531</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>Информатика, вычислительная техника и управление</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>Computer science, computer engineering and IT management</subject></subj-group></article-categories><title-group><article-title>Алгоритмы автоматизированного обучения диалоговых систем</article-title><trans-title-group xml:lang="en"><trans-title>Automated Training Algorithms of Dialog Systems</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Спирин</surname><given-names>Д. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Spirin</surname><given-names>D. V.</given-names></name></name-alternatives><bio xml:lang="ru"/><bio xml:lang="en"/><email xlink:type="simple">spirin.dmitrij@list.ru</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Брежнев</surname><given-names>О. С.</given-names></name><name name-style="western" xml:lang="en"><surname>Brezhnev</surname><given-names>O. S.</given-names></name></name-alternatives><bio xml:lang="ru"/><bio xml:lang="en"/><email xlink:type="simple">oleg-423@yandex.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Пензенский государственный университет</institution></aff><aff xml:lang="en"><institution>Penza State University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2019</year></pub-date><pub-date pub-type="epub"><day>06</day><month>09</month><year>2019</year></pub-date><volume>23</volume><issue>3</issue><fpage>86</fpage><lpage>99</lpage><permissions><copyright-statement>Copyright &#x00A9; Спирин Д.В., Брежнев О.С., 2019</copyright-statement><copyright-year>2019</copyright-year><copyright-holder xml:lang="ru">Спирин Д.В., Брежнев О.С.</copyright-holder><copyright-holder xml:lang="en">Spirin D.V., Brezhnev O.S.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri
xlink:href="https://izvestswsu.elpub.ru/jour/article/view/531">https://izvestswsu.elpub.ru/jour/article/view/531</self-uri><abstract><sec><title>Цель исследования</title><p>Представленное в данной статье исследование проведено в рамках проекта Salebot.pro (https://salebot.pro) и было нацелено на разработку простой и эффективной реализации диалоговой системы.</p></sec><sec><title>Методы</title><p>План исследования предусматривал анализ различных методов обработки естественных языков и машинного обучения. Реализации методов были взяты из популярных библиотек с открытым исходным кодом. Построена модель диалоговой системы в двух вариантах: на основе фреймворка spaCy и на основе метрического алгоритма оценки, использующего расстояние Левенштейна. Сравнивались простота реализации и затраты на обучение системы и персонала.</p></sec><sec><title>Результаты</title><p>Описанные в статье алгоритмы сопоставляют наиболее похожие слова из двух текстов и подсчитывают средний процент совпадений. Такой подход обеспечивает возможность приемлемой работы на языках со свободным порядком слов, к которым относится и русский язык. Выполненное исследование позволило разработать алгоритм автоматизированного обучения диалоговых систем в режиме реального времени без потери контекста. На той же основе разработан алгоритм обучения диалоговой системы по истории диалога. Предлагается использовать данные алгоритмы совместно. При создании диалоговой системы первоначально необходимо ее обучить на истории диалогов, а затем перманентно обучать в режиме реального времени.</p></sec><sec><title>Заключение</title><p>Достоинством разработанного алгоритма является легкость в реализации и дешевизна построения инфраструктуры, необходимой для обучения модели, и ее обслуживания, а также простота в эксплуатации. Применяется подход, который отличается от обучения с учителем, что позволяет ускорить процесс обучения и ввода в систему новых данных. 
Особенностью разработанных алгоритмов является игнорирование семантики текста, что делает обучение автоматизированным, а не автоматическим.</p></sec></abstract><trans-abstract xml:lang="en"><sec><title>Purpose of research</title><p>The research described in this article was conducted within the Salebot.pro project (https://salebot.pro) and aimed to develop a simple and effective implementation of a dialog system.</p></sec><sec><title>Methods</title><p>The research plan included an analysis of various natural language processing and machine learning methods. Implementations of these methods were taken from popular open-source libraries. A dialog system model was built in two variants: one based on the spaCy framework, and one based on a metric scoring algorithm that uses the Levenshtein distance. The variants were compared in terms of implementation simplicity and the cost of training both the system and the staff.</p></sec><sec><title>Results</title><p>The algorithms described in the article match the most similar words of two texts and compute the average percentage of matches. This approach works acceptably for languages with free word order, such as Russian. The study made it possible to develop an algorithm for automated real-time training of dialog systems without loss of context. On the same basis, an algorithm for training a dialog system on dialog history was developed. The two algorithms are intended to be used together: a newly created dialog system is first trained on the dialog history and is then trained continuously in real time.</p></sec><sec><title>Conclusion</title><p>The advantages of the developed algorithm are its ease of implementation, the low cost of the infrastructure required to train and maintain the model, and its simplicity of operation. 
Because the approach differs from supervised learning, it speeds up training and the entry of new data into the system. A distinctive feature of the developed algorithms is that they ignore text semantics, which makes training automated rather than automatic.</p></sec></trans-abstract><kwd-group xml:lang="ru"><kwd>диалоговая система</kwd><kwd>конечный автомат</kwd><kwd>фрейм</kwd><kwd>автоматизированное обучение</kwd><kwd>алгоритм</kwd></kwd-group><kwd-group xml:lang="en"><kwd>dialog system</kwd><kwd>finite-state machine</kwd><kwd>frame</kwd><kwd>automated training</kwd><kwd>algorithm</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Провотар А. И., Клочко К. А. Особенности и проблемы виртуального общения с помощью чат-ботов // Информационные технологии и компьютерная техника. Научные работы ВНТУ. 2013. № 3. С. 1-6.</mixed-citation><mixed-citation xml:lang="en">Provotar A. I., Klochko K. A. Osobennosti i problemy virtual'nogo obshcheniya s pomoshch'yu chat-botov [Features and problems of virtual communication using chat bots]. Informatsionnye tekhnologii i komp'yuternaya tekhnika. Nauchnye raboty VNTU = Information technologies and computer equipment. Scientific works of VNTU, 2013, no. 3, pp. 1-6 (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Training spaCy’s Statistical Models. URL: https://spacy.io/usage/training (дата обращения: 07.05.2019).</mixed-citation><mixed-citation xml:lang="en">Training spaCy’s Statistical Models. Available at: https://spacy.io/usage/training (accessed 07.05.2019).</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Apache OpenNLP Developer Documentation. 
URL: https://opennlp.apache.org/docs/1.9.0/manual/opennlp.html (дата обращения: 07.05.2019).</mixed-citation><mixed-citation xml:lang="en">Apache OpenNLP Developer Documentation. Available at: https://opennlp.apache.org/docs/1.9.0/manual/opennlp.html (accessed 07.05.2019).</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Задача о редакционном расстоянии, алгоритм Вагнера-Фишера. URL: https://neerc.ifmo.ru/wiki/index.php?title=Задача_о_редакционном_расстоянии,_алгоритм_Вагнера-Фишера (дата обращения: 07.05.2019).</mixed-citation><mixed-citation xml:lang="en">Zadacha o redaktsionnom rasstoyanii, algoritm Vagnera-Fishera [The edit distance problem, the Wagner-Fischer algorithm]. Available at: https://neerc.ifmo.ru/wiki/index.php?title=Задача_о_редакционном_расстоянии,_алгоритм_Вагнера-Фишера (accessed 07.05.2019) (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Ramsay A. Discourse. In Mitkov, R. (Ed.). The Oxford Handbook of Computational Linguistics. Oxford University Press, USA, 2003. 717 p.</mixed-citation><mixed-citation xml:lang="en">Ramsay A. Discourse. In Mitkov, R. (Ed.). The Oxford Handbook of Computational Linguistics. Oxford University Press, USA, 2003, 717 p.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Traum D., Larsson S. The information state approach to dialogue management // In J. van Kuppevelt &amp; R. Smith (Eds.), Current and new directions in discourse and dialogue. Springer, 2003. P. 325–354.</mixed-citation><mixed-citation xml:lang="en">Traum D., Larsson S. The information state approach to dialogue management. In J. van Kuppevelt &amp; R. Smith (Eds.), Current and new directions in discourse and dialogue. Springer, 2003, pp. 
325–354.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Computing Power Throughout History. URL: https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm (дата обращения: 07.05.2019).</mixed-citation><mixed-citation xml:lang="en">Computing Power Throughout History. Available at: https://www.alternatewars.com/BBOW/Computing/Computing_Power.htm (accessed 07.05.2019).</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Автоматизированное обучение. URL: https://salebot.pro/articles/9 (дата обращения: 07.05.2019).</mixed-citation><mixed-citation xml:lang="en">Avtomatizirovannoe obuchenie [Automated training]. Available at: https://salebot.pro/articles/9 (accessed 07.05.2019) (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Спирин Д.В., Брежнев О.С., Баринов А.Д. Алгоритм автоматизированного обучения // Сборник статей II Международной научно-практической конференции. Пенза: МЦНС «Наука и Просвещение», 2018. С. 49-53.</mixed-citation><mixed-citation xml:lang="en">Spirin D. V., Brezhnev O. S., Barinov A. D. Algoritm avtomatizirovannogo obucheniya [Algorithm of automated learning]. Sbornik statei II Mezhdunarodnoi nauchno-prakticheskoi konferentsii [Collection of articles of the II International Scientific and Practical Conference]. Penza, 2018, pp. 49-53 (In Russ.).</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">A multi-task approach for named entity recognition in social media data / G. Aguilar, S. Maharjan, A. Pastor Lopez-Monroy, T. Solorio // In Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017. P. 
148–153.</mixed-citation><mixed-citation xml:lang="en">Aguilar G., Maharjan S., Pastor Lopez-Monroy A., Solorio T. A multi-task approach for named entity recognition in social media data. In Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017, pp. 148–153.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Daniken P., Cieliebak M. Transfer learning and sentence level features for named entity recognition on tweets // In Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017. P. 166–171.</mixed-citation><mixed-citation xml:lang="en">Daniken P., Cieliebak M. Transfer learning and sentence level features for named entity recognition on tweets. In Proceedings of the 3rd Workshop on Noisy User-generated Text, 2017, pp. 166–171.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Neural Architectures for Named Entity Recognition / G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer // In Proceedings of NAACL-HLT 2016, San Diego, California, June 12-17, 2016. P. 260–270.</mixed-citation><mixed-citation xml:lang="en">Lample G., Ballesteros M., Subramanian S., Kawakami K., Dyer C. Neural Architectures for Named Entity Recognition. In Proceedings of NAACL-HLT 2016, San Diego, California, June 12-17, 2016, pp. 260–270.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Strakova J. Neural Network Based Named Entity Recognition. – Institute of Formal and Applied Linguistics, Prague. 2017. 120 p.</mixed-citation><mixed-citation xml:lang="en">Strakova J. Neural Network Based Named Entity Recognition. 
Institute of Formal and Applied Linguistics, Prague, 2017, 120 p.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Akkaya E.K. Deep neural networks for named entity recognition on social media. Computer Engineering Dept., Hacettepe University. Beytepe-Ankara, Turkey, 2018. 126 p.</mixed-citation><mixed-citation xml:lang="en">Akkaya E.K. Deep neural networks for named entity recognition on social media. Computer Engineering Dept., Hacettepe University, Beytepe-Ankara, Turkey, 2018, 126 p.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
