Applying deep learning convolutional neural network to classify gestures from MNIST Sign Language dataset
https://doi.org/10.21869/2223-1560-2025-29-3-86-98
Abstract
Relevance. Gesture recognition in computer vision systems is important for the development of accessible human-computer interaction interfaces, including for people with disabilities. Traditional methods, such as manual feature extraction (HOG, SIFT) in combination with SVM classifiers, have limited accuracy and are sensitive to changes in lighting, background, and hand pose.
Purpose of research. The aim of this work is to build and train a convolutional neural network (CNN) for efficient gesture classification on the Sign Language MNIST dataset. The study covers data preprocessing, model architecture design, training, and evaluation of recognition quality on the test set.
Methods. The CNN was implemented with the TensorFlow and Keras libraries. The model comprises convolutional layers for local feature extraction, a Flatten layer for vectorization, fully connected layers with the ReLU activation function, and a Softmax output layer. Training used the Adam optimizer and the sparse_categorical_crossentropy loss function on 27,455 images; testing used 7,172 examples.
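To make the loss named in the Methods concrete, the following is a minimal pure-Python sketch of a Softmax output followed by sparse categorical cross-entropy, that is, the negative log-probability of the integer class label averaged over a batch. It does not use TensorFlow and does not reproduce the authors' model; the helper names and the toy logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(batch_logits, labels):
    """Mean negative log-probability of the true class.

    "Sparse" means the labels are integer class indices,
    not one-hot vectors.
    """
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Toy batch: 2 samples, 3 classes (hypothetical values, not from the paper).
toy_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
toy_labels = [0, 2]
loss = sparse_categorical_crossentropy(toy_logits, toy_labels)
```

For uniform scores over K classes the loss equals ln K, which is a convenient sanity check for an untrained classifier.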
Results. After 18 training epochs, the proposed model achieved 89.14% accuracy on the test set, outperforming traditional methods (HOG + SVM, 70.1%) and simple neural networks (78.4%).
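Test accuracy, the metric reported in the Results, is the share of test examples whose predicted class (the index of the largest Softmax output) matches the true label. A minimal sketch follows; the function name and the toy data are hypothetical and serve only as an illustration, not as the authors' evaluation code.

```python
def accuracy(batch_probs, labels):
    """Fraction of samples whose arg-max class equals the true label."""
    if len(batch_probs) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = 0
    for probs, label in zip(batch_probs, labels):
        # Index of the highest predicted probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy data: 4 samples, 3 classes (hypothetical values, not from the paper).
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.2, 0.6],
]
toy_labels = [0, 2, 0, 2]
toy_acc = accuracy(toy_probs, toy_labels)  # 3 of 4 predictions match
```

By the same arithmetic, 89.14% on a 7,172-example test set corresponds to roughly 6,393 correct predictions.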
Conclusion. The use of convolutional neural networks for gesture classification is an effective approach that provides high accuracy and is robust to variations in input data, making it promising for computer vision and gesture interaction systems.
About the Authors
M. V. Bobyr
Russian Federation
Maxim V. Bobyr - Dr. of Sci. (Engineering), Professor of the Software Engineering Department, Southwest State University.
50 Let Oktyabrya str. 94, Kursk 305040
Researcher ID G-2604-2013
Competing Interests:
None
A. A. Aseev
Russian Federation
Artem A. Aseev - Post-Graduate Student of the Software Engineering Department, Southwest State University.
50 Let Oktyabrya str. 94, Kursk 305040
Competing Interests:
None
For citations:
Bobyr M.V., Aseev A.A. Applying deep learning convolutional neural network to classify gestures from MNIST Sign Language dataset. Proceedings of the Southwest State University. 2025;29(3):86-98. (In Russ.) https://doi.org/10.21869/2223-1560-2025-29-3-86-98