Hand Detection and Segmentation in First Person Image Using Mask R-CNN

  • Huy Sinh Nguyen
  • Hai Vu
Keywords: First person image, Hand Segmentation, Transfer learning, Mask R-CNN

Abstract

In this work, we propose a technique to automatically detect and segment hands in first-person images of patients
performing upper limb rehabilitation exercises. The aim is to automate the assessment of a patient's recovery
through these exercises. The proposed technique consists of two steps. 1) We set up a wearable camera system
and collect upper extremity rehabilitation exercise data. The data are filtered, selected, and annotated with
left-hand and right-hand labels as well as segmentation masks of the patient's hands. The resulting dataset,
named RehabHand, consists of 3700 images and is used to train hand detection and segmentation models on
first-person images. 2) We survey automatic hand detection and segmentation models based on the Mask R-CNN
network architecture with different backbones. Among the architectures tested, Mask R-CNN with the Res2Net
backbone achieved the highest performance in our tests and was selected for all three tasks: hand detection,
left-right hand identification, and hand segmentation. To overcome the limited amount of training data, we use
transfer learning together with data augmentation to improve the accuracy of the model. On the test dataset,
detection reaches AP = 92.3% for the left hand and AP = 91.1% for the right hand; segmentation reaches
AP = 88.8% for the left hand and AP = 87% for the right hand. These results suggest that it is possible to
automatically quantify the patient's ability to use their hands during upper extremity rehabilitation.
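
To make the reported AP figures concrete, the following is a minimal sketch of how average precision can be computed for one class (for example, "left hand") from predicted segmentation masks. The abstract does not state the exact evaluation protocol, so the fixed IoU threshold of 0.5, the greedy matching, and the function names `mask_iou` and `average_precision` are assumptions made for illustration, not the authors' implementation. For simplicity the sketch treats all ground-truth masks as one pool, as if they came from a single image.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


def average_precision(
    predictions: List[Tuple[float, Mask]],
    ground_truths: List[Mask],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """AP for one class: area under the interpolated precision-recall curve."""
    if not ground_truths:
        return 0.0
    matched = [False] * len(ground_truths)
    hits = []  # 1 = true positive, 0 = false positive, in descending score order
    for _score, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        # Greedily match the prediction to the best unmatched ground truth.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truths):
            if matched[idx]:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            hits.append(1)
        else:
            hits.append(0)

    # Precision and recall after each ranked prediction.
    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truths))

    # Make precision non-increasing from right to left, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

In a full evaluation, matching would be done per image and the ranked hits pooled over the whole test set before integrating. Benchmark-style protocols often also average AP over several IoU thresholds, which this sketch does not do.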

Published
2022-03-08