Concept Towards Segmenting Arm Areas for Robot-Based Dermatological In Vivo Measurements
Keywords: Object Detection, Convolutional Neural Networks, RGB-D Images
Dermatological in vivo measurements are used for various purposes, e.g. health care, the development and testing of skin care products, or claim support in marketing. Especially for the last two purposes, in vivo measurements are laborious because of the number of measurement series and how often they must be repeated. Furthermore, they are performed manually and therefore represent a non-negligible time and cost factor. One solution is to use collaborative robots to carry out the measurements. However, because body shapes and surface conditions vary, common static control procedures are not applicable. To solve this problem, spatial information obtained from a stereoscopic camera can be integrated into the robot control process. This requires that the designated measurement area is detected and that the spatial information is processed. Therefore, the authors propose a concept for segmenting arm areas with a CNN-based object detector and for further processing these areas to perform robot-based in vivo measurements. The paper gives an overview of the use of RGB-D images in 2D object detectors and describes the selection of a suitable model for the application. Furthermore, the creation, annotation, and augmentation of a custom dataset are presented.
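The abstract names three technical steps: feeding RGB-D data into a 2D detector, augmenting an annotated dataset, and turning a detected arm region into spatial information for the robot. The following is a minimal, stdlib-only Python sketch of one possible version of each step. It is not the authors' implementation, and it rests on several assumptions. Depth is fused as a fourth input channel ("early fusion", one of several fusion strategies and not necessarily the one selected in the paper). The depth map is registered to the RGB image and given in metres. The detector outputs an axis-aligned bounding box. The camera follows a pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`. All names (`Intrinsics`, `early_fusion`, `hflip_sample`, `box_to_points`, `centroid`) are hypothetical, and the detector itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for this sketch: images are H x W nested lists,
# depth values are in metres (0.0 = no measurement), and boxes are
# (x_min, y_min, x_max, y_max) in pixels with exclusive max coordinates.
RGBImage = List[List[Tuple[int, int, int]]]
DepthMap = List[List[float]]
Box = Tuple[int, int, int, int]
Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels, assumed known from camera calibration."""
    fx: float
    fy: float
    cx: float
    cy: float


def early_fusion(rgb: RGBImage, depth: DepthMap, max_depth: float):
    """Append depth, clipped to max_depth and scaled to [0, 1], as a 4th channel."""
    fused = []
    for rgb_row, depth_row in zip(rgb, depth):
        fused.append([
            (r / 255.0, g / 255.0, b / 255.0, min(z, max_depth) / max_depth)
            for (r, g, b), z in zip(rgb_row, depth_row)
        ])
    return fused


def hflip_sample(rgb: RGBImage, depth: DepthMap, box: Box):
    """Horizontal-flip augmentation applied consistently to RGB, depth and box."""
    width = len(depth[0])
    x_min, y_min, x_max, y_max = box
    # Pixel column u moves to width - 1 - u, so [x_min, x_max) maps to
    # [width - x_max, width - x_min).
    flipped_box = (width - x_max, y_min, width - x_min, y_max)
    return [row[::-1] for row in rgb], [row[::-1] for row in depth], flipped_box


def box_to_points(depth: DepthMap, box: Box, k: Intrinsics) -> List[Point3D]:
    """Back-project every valid depth pixel inside the box to camera coordinates."""
    x_min, y_min, x_max, y_max = box
    points = []
    for v in range(max(0, y_min), min(len(depth), y_max)):
        row = depth[v]
        for u in range(max(0, x_min), min(len(row), x_max)):
            z = row[u]
            if z <= 0.0:  # skip pixels without a depth measurement
                continue
            # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
            points.append(((u - k.cx) * z / k.fx, (v - k.cy) * z / k.fy, z))
    return points


def centroid(points: List[Point3D]) -> Point3D:
    """Mean of the points, e.g. as a rough target position for the robot."""
    if not points:
        raise ValueError("no valid depth pixels inside the box")
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


if __name__ == "__main__":
    k = Intrinsics(fx=2.0, fy=2.0, cx=1.0, cy=1.0)
    depth = [[0.5, 0.5, 0.0],
             [0.5, 1.0, 0.5],
             [0.5, 0.5, 0.5]]
    points = box_to_points(depth, (0, 0, 2, 2), k)
    print(centroid(points))  # (-0.125, -0.125, 0.625)
```

Early fusion only changes the input. The detector's first convolution must then accept four channels instead of three. Architectures with separate RGB and depth branches that are fused at a later stage are a common alternative. Geometric augmentations such as the flip must be applied identically to the color image, the depth map, and the annotation; otherwise the labels no longer match the data.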
Copyright (c) 2021 Mateusz Szymanski, Ron van de Sand, Esther Tauscher, Olaf Rieckmann, Alexander Stolpmann
This work is licensed under a Creative Commons Attribution 4.0 International License.