Applying deep learning convolutional neural network to classify gestures from MNIST Sign Language dataset

M. V. Bobyr; A. A. Aseev

doi:10.21869/2223-1560-2025-29-3-86-98

Abstract

Relevance. Gesture recognition in computer vision systems is important for building accessible human-computer interfaces, including those for people with disabilities. Traditional methods that combine hand-crafted feature extraction (HOG, SIFT) with SVM classifiers have limited accuracy and are sensitive to changes in lighting, background, and hand pose.

Purpose of research. The aim of this work is to build and train a convolutional neural network (CNN) that classifies gestures from the Sign Language MNIST dataset efficiently. The study covers data preprocessing, model architecture design, training, and assessment of recognition quality on the test set.
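The preprocessing step named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the layout of the publicly distributed Sign Language MNIST CSV files (one `label` column followed by 784 pixel columns for a 28x28 grayscale image), and the function name `preprocess` and the demo rows are hypothetical.

```python
import numpy as np

# Assumption: 28x28 grayscale images, as in the public Sign Language MNIST CSV files.
IMAGE_SIDE = 28


def preprocess(rows: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split label and pixel columns, scale pixels to [0, 1], reshape to NHWC."""
    rows = np.asarray(rows)
    labels = rows[:, 0].astype(np.int64)              # first column: class label
    pixels = rows[:, 1:].astype(np.float32) / 255.0   # remaining columns: 0..255 intensities
    images = pixels.reshape(-1, IMAGE_SIDE, IMAGE_SIDE, 1)
    return images, labels


# Synthetic stand-in for two CSV rows (hypothetical data, not taken from the dataset).
demo_rows = np.array([[3] + [255] * 784, [24] + [0] * 784])
demo_x, demo_y = preprocess(demo_rows)
print(demo_x.shape, demo_y)
```

With the real files, the rows would typically come from something like `pandas.read_csv(...).to_numpy()`; the file path is left out here because the abstract does not give one.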

Methods. The CNN was implemented with the TensorFlow and Keras libraries. The model comprises convolutional layers for local feature extraction, a Flatten layer for vectorization, fully connected layers with the ReLU activation function, and an output layer with Softmax. Training used the Adam optimizer and the sparse_categorical_crossentropy loss function on 27,455 images; testing used 7,172 examples.
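A model of the kind described in the Methods paragraph might look like the sketch below. The abstract does not state layer counts, filter sizes, or unit counts, so every number here is an assumption chosen for illustration, as are the max-pooling layers and the helper name `build_model`. The output width of 25 is also an assumption: it lets the raw integer labels (whose maximum in the public CSV files is 24) be used directly with `sparse_categorical_crossentropy`.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_OUTPUTS = 25  # assumption: wide enough for raw labels 0..24


def build_model(num_outputs: int = NUM_OUTPUTS) -> tf.keras.Model:
    """Small CNN: conv layers -> Flatten -> Dense(ReLU) -> Dense(Softmax)."""
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),                 # 28x28 grayscale input
        layers.Conv2D(32, (3, 3), activation="relu"),    # local feature extraction
        layers.MaxPooling2D((2, 2)),                     # assumption: not named in the abstract
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                                # vectorization
        layers.Dense(128, activation="relu"),            # fully connected layer
        layers.Dense(num_outputs, activation="softmax"), # class probabilities
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

Training would then be a call such as `model.fit(x_train, y_train, epochs=18, validation_data=(x_test, y_test))`, where the epoch count comes from the Results paragraph and the array names are placeholders.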

Results. After 18 training epochs, the proposed model achieved 89.14% accuracy on the test set, outperforming traditional methods (HOG + SVM: 70.1%) and simple neural networks (78.4%).
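The reported figures can be put in perspective with a little arithmetic: how many of the 7,172 test images the stated accuracy corresponds to, and how much of each baseline's error the CNN removes. The sketch below assumes that the baseline accuracies were measured on the same test set, which the abstract does not say explicitly; the function names are illustrative.

```python
TEST_SIZE = 7172  # test examples, from the Methods paragraph

# Accuracies in percent, as stated in the Results paragraph.
ACCURACY_PCT = {"CNN": 89.14, "HOG + SVM": 70.1, "simple NN": 78.4}


def implied_correct(accuracy_pct: float, n: int) -> int:
    """Number of correctly classified examples implied by an accuracy figure."""
    return round(accuracy_pct / 100 * n)


def relative_error_reduction(baseline_pct: float, model_pct: float) -> float:
    """Fraction of the baseline's error rate that the model eliminates."""
    baseline_err = 100 - baseline_pct
    model_err = 100 - model_pct
    return (baseline_err - model_err) / baseline_err


cnn_correct = implied_correct(ACCURACY_PCT["CNN"], TEST_SIZE)
vs_hog_svm = relative_error_reduction(ACCURACY_PCT["HOG + SVM"], ACCURACY_PCT["CNN"])
vs_simple_nn = relative_error_reduction(ACCURACY_PCT["simple NN"], ACCURACY_PCT["CNN"])

print(f"CNN: about {cnn_correct} of {TEST_SIZE} test images correct")
print(f"Error reduction vs HOG + SVM: {vs_hog_svm:.1%}")
print(f"Error reduction vs simple NN: {vs_simple_nn:.1%}")
```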

Conclusion. Convolutional neural networks are an effective approach to gesture classification: they provide high accuracy and robustness to variations in the input data, which makes them promising for computer vision and gesture interaction systems.

Keywords

neural network, convolutional neural network, fully connected layer, activation function, loss function, Sign Language MNIST, CPU, GPU

About the Authors

M. V. Bobyr

Southwest State University
Russian Federation

Maxim V. Bobyr - Dr. of Sci. (Engineering), Professor of the Software Engineering Department, Southwest State University.

50 Let Oktyabrya str. 94, Kursk 305040

Researcher ID G-2604-2013

Competing Interests:

None

A. A. Aseev

Southwest State University
Russian Federation

Artem A. Aseev - Post-Graduate Student of the Software Engineering Department, Southwest State University.

50 Let Oktyabrya str. 94, Kursk 305040

Competing Interests:

None

For citations:

Bobyr M.V., Aseev A.A. Applying deep learning convolutional neural network to classify gestures from MNIST Sign Language dataset. Proceedings of the Southwest State University. 2025;29(3):86-98. (In Russ.) https://doi.org/10.21869/2223-1560-2025-29-3-86-98

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2223-1560 (Print)
ISSN 2686-6757 (Online)
