An Evaluation of Pose Estimation in Video of Traditional Martial Arts Presentation

  • Nguyễn Tường Thành, School of Electronics and Telecommunications, Hanoi University of Science and Technology
  • Lê Văn Hùng, Tan Trao University
  • Phạm Thành Công, School of Electronics and Telecommunications, Hanoi University of Science and Technology
Keywords: Estimation of keypoints, pose estimation, deep learning, skeleton, conserving and teaching traditional martial arts

Abstract

Preserving, maintaining, and teaching traditional martial arts are very important activities in social life: they help individuals preserve national culture, exercise, and practice self-defense. However, traditional martial arts involve many different postures as well as varied movements of the body and body parts. The problem of estimating the actions of the human body still faces many challenges, such as accuracy and occlusion. This paper begins with a review of several methods of 2-D human pose estimation on RGB images, among which methods using Convolutional Neural Network (CNN) models have outstanding advantages in terms of processing time and accuracy. In this work, we built a small dataset and used a CNN for estimating the keypoints and joints of actions in traditional martial arts videos. We then applied three measurements (length of joints, deviation angle of joints, and deviation of keypoints) for evaluating pose estimation in 2-D and 3-D space. The estimator was trained on the classic MSCOCO Keypoints Challenge dataset, and the results were evaluated on the well-known Martial Arts, Dancing and Sports (MADS) dataset. The results are quantitatively evaluated and reported in this paper.
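The three measurements named above (length of a joint, deviation angle of a joint, and deviation of keypoints) can be sketched in code as follows. This is a minimal illustration only, assuming poses are given as arrays of 2-D or 3-D keypoint coordinates compared against ground truth; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def joint_length(p1, p2):
    """Euclidean length of a joint (limb) between two keypoints, in 2-D or 3-D."""
    return float(np.linalg.norm(np.asarray(p2, float) - np.asarray(p1, float)))

def joint_deviation_angle(est_p1, est_p2, gt_p1, gt_p2):
    """Angle in degrees between an estimated joint vector and its ground-truth vector."""
    v_est = np.asarray(est_p2, float) - np.asarray(est_p1, float)
    v_gt = np.asarray(gt_p2, float) - np.asarray(gt_p1, float)
    cos = np.dot(v_est, v_gt) / (np.linalg.norm(v_est) * np.linalg.norm(v_gt))
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def keypoint_deviation(est, gt):
    """Mean Euclidean distance between estimated and ground-truth keypoints."""
    est, gt = np.asarray(est, float), np.asarray(gt, float)
    return float(np.mean(np.linalg.norm(est - gt, axis=1)))
```

The same three functions apply unchanged in 2-D and 3-D, since the distance and angle formulas are dimension-agnostic.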

Author Biographies

Nguyễn Tường Thành, School of Electronics and Telecommunications, Hanoi University of Science and Technology

Nguyen Tuong Thanh received a B.E. degree in Electronics and Telecommunications from Hanoi University of Science and Technology in 2002 and an M.E. degree in Electronic Engineering from the University of Transport and Communications. He is now a PhD student in Electronic Engineering at Hanoi University of Science and Technology. Currently, he is working at the Faculty of Engineering and Technology, Quy Nhon University. His research interests include computer vision, 2-D and 3-D image processing, machine learning, and deep learning.

Lê Văn Hùng, Tan Trao University

Le Van Hung received an M.Sc. degree from the Faculty of Information Technology, Hanoi National University of Education, in 2013. He received a PhD degree from the International Research Institute MICA, HUST-CNRS/UMI 2954, INP Grenoble, in 2018. Currently, he is a lecturer at Tan Trao University. His research interests include computer vision, RANSAC and its variants, 3-D object detection and recognition, machine learning, and deep learning.

Phạm Thành Công, School of Electronics and Telecommunications, Hanoi University Science and Technology

Pham Thanh Cong received an M.Sc. degree in Electronics and Telecommunications from Hanoi University of Technology in 1998. He received a PhD degree in Electronics and Telecommunications from the Polytechnic University of Turin, Italy, in 2010. Currently, he is a lecturer at the School of Electronics and Telecommunications, Hanoi University of Science and Technology. His research interests include super-high-frequency technology, antennas, and telecommunication systems.

References

W. Zhang, Z. Liu, L. Zhou, H. Leung, and A. B. Chan, “Martial Arts, Dancing and Sports dataset: a Challenging Stereo and Multi-View Dataset for 3D Human Pose Estimation,” Image and Vision Computing, vol. 61, 2017.

W. Gong, X. Zhang, J. Gonzalez, A. Sobral, T. Bouwmans, C. Tu, and E. H. Zahzah, “Human Pose Estimation from Monocular Images: A Comprehensive Survey,” Sensors (Basel, Switzerland), vol. 16, no. 12, pp. 1–39, 2016.

M. Rantz, T. Banerjee, E. Cattoor, S. Scott, M. Skubic, and M. Popescu, “Automated fall detection with quality improvement ”rewind” to reduce falls in hospital rooms,” J Gerontol Nurs, vol. 40, no. 1, pp. 13–17, 2014.

R. Igual, C. Medrano, and I. Plaza, “Challenges, Issues and Trends in Fall Detection Systems,” BioMedical Engineering OnLine, vol. 12, no. 1, pp. 147–158, 2013.

J. Kramer, M. Parker, D. Castro, N. Burrus, and F. Echtler, “Hacking the Kinect,” Apress, 2012.

P. Felzenszwalb, D. Mcallester, and D. Ramanan, “A Discriminatively Trained, Multiscale, Deformable Part Model,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.

M. Sun and S. Savarese, “Articulated part-based model for joint object detection and pose estimation,” in Proceedings of the IEEE International Conference on Computer Vision, 2011, pp. 723–730.

E. M. Berti, A. J. S. Salmeron, and C. R. Viala, “4-Dimensional deformation part model for pose estimation using Kalman filter constraints,” International Journal of Advanced Robotic Systems, vol. 14, no. 3, pp. 1–13, 2017.

U. Rafi, J. Gall, and B. Leibe, “A semantic occlusion model for human pose estimation from a single depth image,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015.

I. Sarandi, T. Linder, K. O. Arras, and B. Leibe, “How Robust is 3D Human Pose Estimation to Occlusion?,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’18) - Workshop on Robotic Co-workers 4.0: Human Safety and Comfort in Human-Robot Interactive Social Environments, 2018.

R. Girshick, “Fast R-CNN,” in International Conference on Computer Vision, 2015.

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems 28, 2015, pp. 91–99.

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Computer Vision and Pattern Recognition, 2016.

J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” in Computer Vision and Pattern Recognition, 2017.

D. Osokin, “Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose,” arXiv preprint, 2018.

K. Brown, “Stereo Human Keypoint Estimation,” Stanford University.

R. Girdhar, G. Gkioxari, L. Torresani, M. Paluri, and D. Tran, “Detect-and-Track: Efficient Pose Estimation in Videos,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 350–359.

openpose, “openpose,” https://github.com/CMU-Perceptual-Computing-Lab/openpose, 2019, [Accessed 23 April 2019].

Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using Part Affinity Fields,” in CVPR, 2017.

N. Burrus, “Calibrating the depth and color camera,” http://nicolas.burrus.name/index.php/Research/KinectCalibration, 2018, [Accessed 10 January 2018].

COCO, “Observations on the calculations of COCO metrics,” https://github.com/cocodataset/cocoapi/issues/56, 2019, [Accessed 24 April 2019].

T. B. Dinh, “Bao ton va phat huy vo co truyen Binh dinh: Tiep tuc ho tro cac vo duong tieu bieu [Preserving and promoting Binh Dinh traditional martial arts: Continuing to support notable martial arts schools],” http://www.baobinhdinh.com.vn/viewer.aspx?macm=12&macmp=12&mabb=88043, 2017, [Accessed 4 April 2019].

——, “Ai ve Binh Dinh ma coi, Con gai Binh Dinh bo roi di quyen [Come to Binh Dinh and see: Binh Dinh girls wield the staff and practice quyen],” http://www.seagullhotel.com.vn/du-lich-binh-dinh/vo-co-truyen-binh-dinh-5, 2019, [Accessed 4 April 2019].

Chinese, “Chinese Kung Fu (Martial Arts),” https://www.travelchinaguide.com/intro/martial_arts/, 2019, [Accessed 4 April 2019].

L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele, “Strong Appearance and Expressive Spatial Models for Human Pose Estimation,” in Proceedings of the 2013 IEEE International Conference on Computer Vision, 2013.

M. Andriluka, S. Roth, and B. Schiele, “Pictorial Structures Revisited: People Detection and Articulated Pose Estimation,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1014–1021.

X. Tao and Z. Yun, “Fall prediction based on biomechanics equilibrium using Kinect,” International Journal of Distributed Sensor Networks, vol. 13, no. 4, 2017.

L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele, “DeepCut: Joint subset partition and labeling for multi person pose estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

ECCV2018, “ECCV 2018 Joint COCO and Mapillary Recognition Challenge,” http://cocodataset.org/#home, 2018, [Accessed 18 April 2019].

MSCOCO, “MSCOCO Keypoints Challenge 2017,” https://places-coco2017.github.io/, 2017, [Accessed 18 April 2019].

A. Toshev and C. Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks,” in CVPR, 2014.

J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, “Efficient Object Localization Using Convolutional Networks,” in CVPR, 2015.

S.-e. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, “Convolutional Pose Machines,” in CVPR, 2016.

M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, “2D Human Pose Estimation: New Benchmark and State of the Art Analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

M. Andriluka, U. Iqbal, E. Insafutdinov, L. Pishchulin, et al., “PoseTrack: A Benchmark for Human Pose Estimation and Tracking,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.

K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in ICCV, 2017.

J. Carreira and A. Zisserman, “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset,” in CVPR, 2017.

D. Tome, C. Russell, and L. Agapito, “Lifting from the deep: Convolutional 3D pose estimation from a single image,” in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5689–5698.

H.-s. Fang, Y. Xu, W. Wang, X. Liu, and S.-c. Zhu, “Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

I. Sarandi, T. Linder, K. O. Arras, and B. Leibe, “How robust is 3D human pose estimation to occlusion?” CoRR, vol. abs/1808.09316, 2018. [Online]. Available: http://arxiv.org/abs/1808.09316

N. Sarafianos, B. Boteanu, B. Ionescu, and I. A. Kakadiaris, “3D Human Pose Estimation: A Review of the Literature and Analysis of Covariates,” Computer Vision and Image Understanding, pp. 1–20, 2016.

T. B. Dinh, “Preserving traditional martial arts,” http://www.baobinhdinh.com.vn/culture-sport/2011/8/114489/, 2011, [Accessed 18 April 2019].

Chinese, “Fighting modernity: traditional Chinese martial arts and the transmission of intangible cultural heritage,” https://www.academia.edu/18641528/Fighting_modernity_traditional_Chinese_martial_arts_and_the_transmission_of_intangible_cultural_heritage, 2012, [Accessed 18 April 2019].

Microsoft, “Kinect for Windows SDK v1.8,” https://www.microsoft.com/en-us/download/details.aspx?id=40278, 2012, [Accessed 18 April 2019].

MICA, “International Research Institute MICA,” http://mica.edu.vn/, 2019, [Accessed 19 April 2019].

Opencv, “Opencv library,” https://opencv.org/, 2018, [Accessed 19 April 2019].

Z. X., “A Study of Microsoft Kinect Calibration,” Technical report, Dept. of Computer Science, George Mason University, 2012.

J.-Y. Bouguet, “Camera calibration toolbox for Matlab,” http://www.vision.caltech.edu/bouguetj/calib_doc/, 2019, [Accessed 19 April 2019].

L. Sigal, A. O. Balan, and M. J. Black, “HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion,” International Journal of Computer Vision, vol. 87, no. 1, 2010.

Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime Multi-Person Pose Estimation,” https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation, [Accessed 23 April 2019].

——, “OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation,” https://github.com/CMU-Perceptual-Computing-Lab/openpose, [Accessed 23 April 2019].

Tensorflow, “tf-pose-estimation,” https://github.com/ildoonet/tf-pose-estimation, [Accessed 23 April 2019].

H. Wang, W. P. An, X. Wang, L. Fang, and J. Yuan, “Magnify-Net for multi-person 2D pose estimation,” in 2018 IEEE International Conference on Multimedia and Expo (ICME), July 2018, pp. 1–6.

caffe2, “caffe2-pose-estimation,” https://github.com/eddieyi/caffe2-pose-estimation, [Accessed 23 April 2019].

Chainer, “Chainer Realtime Multi-Person Pose Estimation,” https://github.com/DeNA/Chainer_Realtime_Multi-Person_Pose_Estimation, [Accessed 23 April 2019].

mxnet, “Reimplementation of human keypoint detection in mxnet,” https://github.com/dragonfly90/mxnet_Realtime_Multi-Person_Pose_Estimation, [Accessed 23 April 2019].

MatConvNet, “MatConvNet Realtime Multi-Person Pose Estimation,” https://github.com/coocoky/matconvnet_Realtime_Multi-Person_Pose_Estimation, [Accessed 23 April 2019].

CNTK, “CNTK Realtime Multi-Person Pose Estimation,” https://github.com/Hzzone/CNTK_Realtime_Multi-Person_Pose_Estimation, [Accessed 23 April 2019].

L. Bo and C. Sminchisescu, “Twin Gaussian Processes for Structured Prediction,” International Journal of Computer Vision, vol. 87, no. 1-2, 2010.

PCL, “How to use random sample consensus model,” http://pointclouds.org/documentation/tutorials/random_sample_consensus.php, 2014.

Published
2019-12-31
Section
Regular Articles