

Proceedings of the Southwest State University

ISSN 2223-1560, 2686-6757

Southwest State University (SWSU)

10.21869/2223-1560-2019-23-6-225-240


Research Article

Computer science, computer engineering and IT management

Models and a Technique for Determining the Speech Activity of a User of a Socio-Cyberphysical System

Usina E. E.

Elizaveta E. Usina, Junior Researcher, Laboratory of Big Data Technologies of Sociocyberphysical Systems

St. Petersburg

Shabanova A. R.

Alexandra R. Shabanova, Junior Researcher, Laboratory of Big Data Technologies of Sociocyberphysical Systems

St. Petersburg

Lebedev I. V.

Igor V. Lebedev, Junior Researcher, Laboratory of Big Data Technologies of Sociocyberphysical Systems

St. Petersburg

St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences

2019

23.02.2020

Vol. 23, No. 6, pp. 225-240

2020

Usina E.E., Shabanova A.R., Lebedev I.V.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://izvestswsu.elpub.ru/jour/article/view/670


Purpose of research

Purpose of research. The article develops models and algorithms for determining the speech activity of a user of a socio-cyberphysical system. A topological model of a distributed audio recording subsystem deployed in confined physical spaces (rooms) is proposed; the model makes it possible to assess the quality of the perceived audio signals for a given distribution of microphones in such a room. Based on this model, a technique for determining the speech activity of a user of a socio-cyberphysical system has been developed; by determining the installation coordinates of the microphones, it maximizes the quality of the perceived audio signals as the user moves around the room.
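The abstract does not spell out the topological model. As a minimal sketch under hypothetical assumptions, the room can be discretized into a set U of user positions and a set M of candidate microphone positions, with an edge (u, m) of a bipartite graph whenever the distance between them does not exceed an assumed radius R within which recording quality is acceptable. The coverage of a microphone subset is then the union of its members' neighbourhoods, and the smallest covering subset can be found by exhaustive search. The room size, grid, candidate points, and R below are illustrative, not values from the article.

```python
import math
from itertools import combinations

# Hypothetical 4 m x 3 m room discretized into a 1 m grid of user positions (metres).
USER_POSITIONS = [(x + 0.5, y + 0.5) for x in range(4) for y in range(3)]
# Hypothetical candidate microphone points: middles of the south, north, west and east walls.
CANDIDATE_MICS = [(2.0, 0.0), (2.0, 3.0), (0.0, 1.5), (4.0, 1.5)]
RADIUS = 2.0  # assumed maximum user-microphone distance for acceptable recording quality


def build_edges(users, mics, radius):
    """Bipartite graph as adjacency sets: microphone index -> indices of user positions it covers."""
    return {
        j: {i for i, u in enumerate(users) if math.dist(u, m) <= radius}
        for j, m in enumerate(mics)
    }


def covers_room(selection, edges, n_users):
    """True if the union of the selected microphones' neighbourhoods contains every user position."""
    covered = set().union(*(edges[j] for j in selection))
    return len(covered) == n_users


def smallest_cover(users, mics, radius):
    """Exhaustive search for the smallest subset of microphones covering all user positions."""
    edges = build_edges(users, mics, radius)
    for k in range(1, len(mics) + 1):
        for selection in combinations(range(len(mics)), k):
            if covers_room(selection, edges, len(users)):
                return selection
    return None  # no subset of the candidates covers the room


print(smallest_cover(USER_POSITIONS, CANDIDATE_MICS, RADIUS))  # → (2, 3): west and east walls
```

Exhaustive search grows exponentially with the number of candidate points; for larger rooms a greedy set-cover heuristic is the usual substitute.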

Methods

Methods. The mathematical tools of graph theory and set theory were used for a comprehensive analysis and formal description of the distributed audio recording subsystem. To determine the coordinates for placing microphones in a single room, a dedicated technique was developed; it involves emitting a speech signal in the room through acoustic equipment and measuring the signal levels with a sound level meter at the places intended for microphone installation.
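The measurement step can be sketched as follows, assuming the sound-level-meter readings are collected into a table indexed by test speaker position and candidate point. The selection rule here, a greedy choice that maximizes the worst-case (over speaker positions) level of the loudest selected microphone, is a hypothetical stand-in: the abstract does not state how the authors pick points from the measurements. The readings and the names `LEVELS`, `choose_points`, and `points_for_target` are invented for illustration.

```python
# Hypothetical sound-level-meter readings, dB: LEVELS[speaker_position][candidate_point]
# measured while the test speech signal is played from that speaker position.
# Illustrative values only, not measurements from the article.
LEVELS = {
    "door":   {"c1": 62.0, "c2": 55.0, "c3": 48.0, "c4": 51.0},
    "desk":   {"c1": 50.0, "c2": 63.0, "c3": 57.0, "c4": 49.0},
    "window": {"c1": 47.0, "c2": 52.0, "c3": 54.0, "c4": 64.0},
}


def worst_case_level(selection, levels):
    """Loudest selected point for each speaker position, then the minimum over positions."""
    return min(max(row[c] for c in selection) for row in levels.values())


def choose_points(levels, k):
    """Greedy: add, one at a time, the candidate that most raises the worst-case level."""
    candidates = sorted(next(iter(levels.values())))
    chosen = []
    for _ in range(min(k, len(candidates))):
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: worst_case_level(chosen + [c], levels),
        )
        chosen.append(best)
    return chosen


def points_for_target(levels, target_db):
    """Fewest greedily chosen points whose worst-case level reaches target_db, or None."""
    n_candidates = len(next(iter(levels.values())))
    for k in range(1, n_candidates + 1):
        chosen = choose_points(levels, k)
        if worst_case_level(chosen, levels) >= target_db:
            return chosen
    return None


print(points_for_target(LEVELS, 55.0))  # → ['c2', 'c4']
```

Greedy max-min selection is not guaranteed to be optimal; with only a handful of candidate points, an exhaustive search over subsets is also affordable.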

Results

Results. The dependence of the correlation coefficient between the combined signal and the initial test signal on the distance to the signal source was calculated for different numbers of microphones. The obtained dependences make it possible to determine the minimum number of spaced microphones required for high-quality recording of the user's speech. Testing the developed technique in a particular room indicates that it is feasible and highly effective for determining the speech activity of a user of a socio-cyberphysical system.
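A correlation-versus-distance dependence of this kind can be reproduced qualitatively with a simple simulation. The assumptions are ours, not the article's: a unit-variance Gaussian sequence stands in for the speech test signal, each of N time-aligned microphones at distance r receives the signal attenuated as 1/r plus independent Gaussian noise of standard deviation σ, and the combined signal is the plain average of the channels. Under this model the correlation has the closed form ρ = 1/√(1 + r²σ²/N), which gives the minimum microphone count for a target ρ directly.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_corr(distance, n_mics, noise_std=0.5, n_samples=10000, seed=1):
    """Correlate the averaged microphone signal with the source test signal.

    Assumed model: all microphones at the same distance, source attenuated as 1/distance,
    independent Gaussian noise per microphone, channels already time-aligned.
    """
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    combined = [
        sum(s / distance + rng.gauss(0.0, noise_std) for _ in range(n_mics)) / n_mics
        for s in source
    ]
    return pearson(source, combined)


def theoretical_corr(distance, n_mics, noise_std=0.5):
    """Closed form under the same model: 1 / sqrt(1 + (r * sigma)^2 / N)."""
    return 1.0 / math.sqrt(1.0 + (distance * noise_std) ** 2 / n_mics)


def min_mics(distance, target, noise_std=0.5, max_mics=64):
    """Smallest N whose theoretical correlation reaches the target, or None."""
    for n in range(1, max_mics + 1):
        if theoretical_corr(distance, n, noise_std) >= target:
            return n
    return None


if __name__ == "__main__":
    for r in (1.0, 2.0, 3.0):
        row = [round(simulated_corr(r, n), 3) for n in (1, 2, 4, 8)]
        print(f"r = {r} m, N = 1, 2, 4, 8: {row}")
    print("N needed for rho >= 0.9 at 2 m:", min_mics(2.0, 0.9))
```

Averaging reduces the noise variance by a factor of N while leaving the signal term unchanged, which is why the correlation grows with N and falls with r. Real rooms add reverberation and inter-channel delays that this model ignores.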

Conclusion

Conclusion. Applying the proposed technique for determining the speech activity of a user of a socio-cyberphysical system will improve the recording quality of the audio signal and, consequently, of its subsequent processing, taking into account possible movement of the user.

socio-cyberphysical system; speech; microphones; distributed audio recording


The authors declare no conflicts of interest.