

Известия Юго-Западного государственного университета

Proceedings of the Southwest State University

ISSN 2223-1560; 2686-6757

ЮЗГУ

10.21869/2223-1560-2025-29-3-86-98


Research Article

ИНФОРМАТИКА, ВЫЧИСЛИТЕЛЬНАЯ ТЕХНИКА И УПРАВЛЕНИЕ

COMPUTER SCIENCE, COMPUTER ENGINEERING AND CONTROL

Применение глубокого обучения сверточной нейронной сети для классификации жестов из набора данных Sign Language MNIST

Applying deep learning convolutional neural network to classify gestures from MNIST Sign Language dataset

https://orcid.org/0000-0002-5400-6817

Бобырь М. В.

Bobyr M. V.

Бобырь Максим Владимирович - доктор технических наук, профессор кафедры программной инженерии.

ул. 50 лет Октября, д. 94, Курск 305040

Researcher ID G-2604-2013

Maxim V. Bobyr - Dr. of Sci. (Engineering), Professor of the Software Engineering Department, Southwest State University.

50 Let Oktyabrya str. 94, Kursk 305040

Researcher ID G-2604-2013

fregat_mn@rambler.ru

https://orcid.org/0009-0007-8271-7660

Асеев А. А.

Aseev A. A.

Асеев Артем Андреевич - аспирант кафедры программной инженерии.

ул. 50 лет Октября, д. 94, Курск 305040

Artem A. Aseev - Post-Graduate Student of the Software Engineering Department, Southwest State University.

50 Let Oktyabrya str. 94, Kursk 305040

aseeff.artem@yandex.ru

Юго-Западный государственный университет

Southwest State University

2025

29.11.2025

Vol. 29, No. 3, P. 86-98

2025

Бобырь М.В., Асеев А.А.

Bobyr M.V., Aseev A.A.

Данная работа распространяется под лицензией Creative Commons Attribution 4.0.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://izvestswsu.elpub.ru/jour/article/view/1499

Цель исследования

Цель исследования. Задача распознавания жестов в системах компьютерного зрения имеет важное значение для разработки доступных интерфейсов взаимодействия человека с компьютером, в том числе и для людей с ограниченными возможностями. Традиционные методы, например использование ручного выделения признаков (HOG, SIFT) в сочетании с классификаторами типа SVM, обладают ограниченной точностью и чувствительны к изменениям освещения, фона и позы руки. Целью данной работы является построение и обучение сверточной нейронной сети (CNN) для эффективной классификации жестов на основе набора данных Sign Language MNIST. В рамках исследования решались задачи предобработки данных, проектирования архитектуры модели, её обучения и оценки качества распознавания на тестовом наборе.

Методы

Методы. Использовались библиотеки TensorFlow и Keras для реализации CNN. Модель включает сверточные слои для извлечения локальных признаков, слой Flatten для векторизации, полносвязные слои с функцией активации ReLU и выходной слой с Softmax. Обучение проводилось с использованием оптимизатора Adam и функции потерь sparse_categorical_crossentropy на 27 455 изображениях, тестирование проводилось на 7 172 примерах.

Результаты

Результаты. Предложенная модель достигла точности 89,14 % на тестовом наборе данных после 18 эпох обучения, что превосходит результаты традиционных методов (HOG + SVM – 70,1 %) и простых нейронных сетей (78,4 %).

Заключение

Заключение. Применение сверточных нейронных сетей для классификации жестов является эффективным подходом, обеспечивающим высокую точность и устойчивость к вариациям входных данных, что делает его перспективным для задач компьютерного зрения и разработки систем жестового взаимодействия.

Relevance

Relevance. Gesture recognition in computer vision systems is important for the development of accessible human-computer interaction interfaces, including for people with disabilities. Traditional methods, such as manual feature extraction (HOG, SIFT) in combination with SVM classifiers, have limited accuracy and are sensitive to changes in lighting, background, and hand pose.

Purpose of research

Purpose of research. The aim of this work is to build and train a convolutional neural network (CNN) for effective gesture classification on the Sign Language MNIST dataset. The study covered data preprocessing, model architecture design, model training, and evaluation of recognition quality on the test set.

Methods

Methods. The CNN was implemented with the TensorFlow and Keras libraries. The model includes convolutional layers for local feature extraction, a Flatten layer for vectorization, fully connected layers with the ReLU activation function, and an output layer with Softmax. Training used the Adam optimizer and the sparse_categorical_crossentropy loss function on 27,455 images; testing used 7,172 examples.
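The layer sequence described in Methods can be sketched in Keras as follows. This is a minimal illustration, not the authors' exact architecture: the number of convolutional layers, the filter counts, kernel sizes, the max-pooling layers, the width of the dense layer, the 28x28 grayscale input shape, and the 25 output units (integer labels assumed to lie in 0..24) are all assumptions that the abstract does not specify.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumptions (not stated in the abstract): 28x28 grayscale inputs and
# integer class labels in the range 0..24, hence 25 output units.
INPUT_SHAPE = (28, 28, 1)
NUM_CLASSES = 25


def build_model():
    """Build a small CNN: conv layers -> Flatten -> Dense(ReLU) -> Softmax."""
    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        # Convolutional layers extract local features.
        # Filter counts and pooling are illustrative choices.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Flatten turns the feature maps into a vector.
        layers.Flatten(),
        # Fully connected layer with ReLU activation.
        layers.Dense(128, activation="relu"),
        # Output layer with Softmax over the gesture classes.
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Optimizer and loss as named in the abstract.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Forward pass on a dummy batch to check the output shape.
dummy_batch = np.zeros((4,) + INPUT_SHAPE, dtype="float32")
probs = model.predict(dummy_batch, verbose=0)
```

With real data, training for the 18 epochs reported in the abstract would look like `model.fit(x_train, y_train, epochs=18, validation_data=(x_test, y_test))`, where `x_train`, `y_train`, `x_test`, and `y_test` are hypothetical arrays holding the scaled images and integer labels.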

Results

Results. The proposed model reached 89.14% accuracy on the test set after 18 training epochs, outperforming traditional methods (HOG + SVM: 70.1%) and simple neural networks (78.4%).
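As a quick sanity check on the reported figures, the short sketch below recovers the number of correctly classified test images implied by the stated accuracy and the margins over the two baselines in percentage points. It assumes that accuracy is the plain fraction of correct predictions over the 7,172 test examples and that the baselines were scored on the same test set; neither assumption is confirmed by the abstract.

```python
# Figures taken from the abstract.
TEST_SIZE = 7172
REPORTED_ACCURACY = 89.14  # percent
BASELINES = {"HOG + SVM": 70.1, "simple neural network": 78.4}  # percent


def implied_correct(accuracy_pct, n_examples):
    """Number of correct predictions implied by an accuracy percentage."""
    return round(accuracy_pct / 100 * n_examples)


correct = implied_correct(REPORTED_ACCURACY, TEST_SIZE)

# Recompute the accuracy from the implied count to confirm it rounds back.
recomputed = round(100 * correct / TEST_SIZE, 2)

# Margin over each baseline, in percentage points.
margins = {
    name: round(REPORTED_ACCURACY - acc, 2) for name, acc in BASELINES.items()
}

print(correct, recomputed, margins)
```

Under these assumptions the reported accuracy corresponds to 6,393 correct predictions, and the margins are 19.04 and 10.74 percentage points respectively.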

Conclusion

Conclusion. The use of convolutional neural networks for gesture classification is an effective approach that provides high accuracy and is robust to variations in input data, making it promising for computer vision and gesture interaction systems.

нейронная сеть; сверточная нейронная сеть; полносвязный слой; функция активации; функция потерь; Sign Language MNIST; CPU; GPU

neural network; convolutional neural network; fully connected layer; activation function; loss function; Sign Language MNIST; CPU; GPU


The authors declare that there are no conflicts of interest present.