Computer-Aided Depth Video Stream Masking Framework for Human Body Segmentation in Depth Sensor Images
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. Problem Statement
3.2. Suggested Framework
- Full manual—the user draws the mask on the image by hand. This mode is useful when all other modes fail to produce the desired result. In this case, the transformation is simply the user input:
- Segmentation based on an initial point—the user selects a segmentation sensitivity b and a point that belongs to the desired object. The selected object is added to the mask, and the process is repeated until the whole desired object is marked. In this case, the transformation is a union of auto-segmented blobs of the image whose initial points are provided by the user:
- Segmentation based on the previous frame—segmentation based on an initial point is performed, but the initial points are taken from the previous frame. This mode is only applicable if at least one frame has already been segmented. In this case, the transformation is the same as in segmentation based on an initial point, but the points are recalculated in case the object has moved:
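As a rough illustration of the three modes, the mask transformation can be sketched in Python. The flood-fill blob growth, the seed-subsampling step, and all function names below are simplified stand-ins of mine, not the framework's actual clustering:

```python
import numpy as np

def segment_from_seed(depth, seed, b):
    """Grow a blob from `seed`: a simplified stand-in for the clustering
    step. Pixels join the blob if their depth is within the sensitivity
    b of the seed's depth and they are 4-connected to it."""
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    stack = [seed]
    while stack:
        y, x = stack.pop()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(float(depth[y, x]) - float(depth[seed])) > b:
            continue
        mask[y, x] = True
        stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask

def build_mask(mode, depth, manual_mask=None, seeds=(), b=0.1, prev_mask=None):
    if mode == "manual":    # full manual: the mask is the user input itself
        return manual_mask
    if mode == "seed":      # union of blobs grown from user-selected points
        blobs = [segment_from_seed(depth, s, b) for s in seeds]
        return np.logical_or.reduce(blobs) if blobs else np.zeros(depth.shape, bool)
    if mode == "previous":  # reuse (subsampled) points of the previous mask as seeds
        seeds = list(zip(*np.nonzero(prev_mask)))[::50]
        return build_mask("seed", depth, seeds=seeds, b=b)
    raise ValueError(f"unknown mode: {mode}")
```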
3.2.1. Noise Reduction
3.2.2. Internal Data Representation
3.2.3. Novel Clustering Algorithm
- Bounding box, which represents a rectangular box in 3D space, with properties minX, maxX, minY, maxY, minZ, maxZ defining the extents of the box;
- Point, which represents a point in 3D space;
- Search tree, which has depth (0—split by x coordinate, 1—split by y, 2—split by z), location (the point that the node holds), removed (an internal flag that prevents returning already-returned points), and left and right (child nodes).
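A minimal transcription of these three structures; the field names follow the list above, while the Python rendering itself is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Point:
    x: float
    y: float
    z: float

@dataclass
class BoundingBox:
    minX: float
    maxX: float
    minY: float
    maxY: float
    minZ: float
    maxZ: float

    def contains(self, p: Point) -> bool:
        # True if the point lies inside the box (inclusive bounds)
        return (self.minX <= p.x <= self.maxX
                and self.minY <= p.y <= self.maxY
                and self.minZ <= p.z <= self.maxZ)

@dataclass
class Node:
    location: Point              # the point that the node holds
    depth: int = 0               # 0: split by x, 1: by y, 2: by z
    removed: bool = False        # prevents returning an already-returned point
    left: Optional["Node"] = None
    right: Optional["Node"] = None
```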
Algorithm 1 Algorithm to find a cluster
Require: pointCloud organized as a 3-d search tree with root root; segmentation sensitivity b
Require: firstPoint ∈ pointCloud
cluster = createEmptyCluster
currentPoints = [firstPoint]
boundingBox = firstPoint.coordinates ± b
newPointsAdded = true
while newPointsAdded do
  closePoints = fcp(boundingBox, b, root)
  if closePoints.size = 0 then
    newPointsAdded = false
  end if
  currentPoints ∪= closePoints
end while
return cluster(currentPoints)

procedure FCP(boundingBox, b, node, result = [])
  if node = null then
    return result
  end if
  if node.depth = 0 then
    currentLocation = node.location.x
  else if node.depth = 1 then
    currentLocation = node.location.y
  else
    currentLocation = node.location.z
  end if
  if !node.removed and boundingBox.contains(node) then
    result ∪= node
    node.removed = true
    boundingBox = expand(boundingBox, node, b)
  end if
  result ∪= fcp(boundingBox, b, node.left, result)
  result ∪= fcp(boundingBox, b, node.right, result)
  return result
end procedure

procedure EXPAND(boundingBox, node, b)
  boundingBox.minX = min(boundingBox.minX, node.x − b)
  boundingBox.maxX = max(boundingBox.maxX, node.x + b)
  boundingBox.minY = min(boundingBox.minY, node.y − b)
  boundingBox.maxY = max(boundingBox.maxY, node.y + b)
  boundingBox.minZ = min(boundingBox.minZ, node.z − b)
  boundingBox.maxZ = max(boundingBox.maxZ, node.z + b)
  return boundingBox
end procedure
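The listing above can be exercised with a small, self-contained Python sketch. The tree-building helper, the tuple-based point representation, and the example values are assumptions of mine, and this version visits every node during the search (any subtree pruning that would speed up the real search is omitted for brevity):

```python
class Node:
    def __init__(self, location, depth, left=None, right=None):
        self.location = location   # (x, y, z) tuple
        self.depth = depth         # split axis: 0 = x, 1 = y, 2 = z
        self.removed = False       # already assigned to a cluster
        self.left, self.right = left, right

def build_tree(points, depth=0):
    """Build a simple median-split 3-d tree."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_tree(points[:mid], depth + 1),
                build_tree(points[mid + 1:], depth + 1))

def contains(bb, p):
    # bb = [minX, maxX, minY, maxY, minZ, maxZ]
    return (bb[0] <= p[0] <= bb[1] and bb[2] <= p[1] <= bb[3]
            and bb[4] <= p[2] <= bb[5])

def expand(bb, p, b):
    """Grow the box in place so the ±b neighbourhood of p is covered."""
    for a in range(3):
        bb[2 * a] = min(bb[2 * a], p[a] - b)
        bb[2 * a + 1] = max(bb[2 * a + 1], p[a] + b)

def fcp(bb, b, node, result):
    """Find close points: collect unclustered points inside bb,
    expanding bb around every point found."""
    if node is None:
        return result
    if not node.removed and contains(bb, node.location):
        result.append(node.location)
        node.removed = True
        expand(bb, node.location, b)
    fcp(bb, b, node.left, result)
    fcp(bb, b, node.right, result)
    return result

def find_cluster(root, first_point, b):
    """Repeat the search until an iteration adds no new points."""
    x, y, z = first_point
    bb = [x - b, x + b, y - b, y + b, z - b, z + b]
    cluster = []
    while True:
        close = fcp(bb, b, root, [])
        if not close:
            return cluster
        cluster.extend(close)
```

For example, with b = 0.6 the points (0, 0, 0), (0.5, 0, 0), and (1.0, 0, 0) end up in one cluster even though (1.0, 0, 0) starts outside the initial box: the box expands as closer points are found, while (5, 5, 5) stays out.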
3.2.4. Working with the Point Cloud Manually
3.3. Adapting the Solution to Video Streams
Working with Large Video Sequences
- Current point, consisting of 3 doubles (x, y, and z coordinates in 3D space), 2 integers (x and y coordinates on the depth image), and a reference to it, a total of 40 bytes;
- References to the left and right child nodes of the tree. Because the application is compiled for 64-bit platforms, the references use at least 8 bytes each (16 bytes in total);
- A Boolean that indicates whether the point has already been assigned to a cluster (1 byte);
- A Boolean that indicates whether the node has any child nodes that have not yet been assigned to a cluster (1 byte);
- Current dimension of the tree (an integer, 4 bytes).
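Under common 64-bit assumptions (8-byte doubles and references, 4-byte integers, with object headers and alignment ignored), the per-node cost in the list above adds up as follows:

```python
point_bytes     = 3 * 8 + 2 * 4 + 8  # three doubles, two ints, one reference = 40
child_refs      = 2 * 8              # left and right references = 16
cluster_flags   = 1 + 1              # the two booleans
dimension_bytes = 4                  # split-axis integer
per_node = point_bytes + child_refs + cluster_flags + dimension_bytes
print(per_node)  # 62 bytes per tree node before alignment and headers
```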
3.4. Software Solution
4. Framework Evaluation
4.1. Dataset
4.2. Test Hardware and Software
4.3. Performance Results
- PCL Euclidean clustering—the original PCL algorithm that uses a radius search;
- Bounding box—an algorithm that uses a bounding box search;
- Expanding bounding box—an algorithm that uses a bounding box search and expands the bounding box during the tree search (implemented in the final software solution).
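The essential difference between the radius search and the bounding-box searches can be illustrated with the two point-membership tests (a sketch of mine: the ball test needs a squared distance, while the box test is three interval comparisons and accepts a superset of the ball):

```python
def in_radius(p, q, b):
    # Euclidean (radius-search) test: squared distance against b^2
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) <= b * b

def in_box(p, q, b):
    # Axis-aligned box test: three cheap per-coordinate comparisons
    return all(abs(pi - qi) <= b for pi, qi in zip(p, q))
```

For instance, the corner point (1, 1, 1) relative to the origin with b = 1 is inside the box but outside the ball, which is why the box-based variants may merge slightly more points while avoiding the distance computation entirely.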
4.4. Fully Automatic Segmentation Accuracy Results
4.5. Manual Segmentation Time Cost Analysis
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, X. Deep learning in object recognition, detection, and segmentation. Found. Trends Signal Process. 2016, 8, 217–382.
- Guzsvinecz, T.; Szucs, V.; Sik-Lanyi, C. Suitability of the Kinect sensor and Leap Motion controller—A literature review. Sensors 2019, 19, 1072.
- Shires, L.; Battersby, S.; Lewis, J.; Brown, D.; Sherkat, N.; Standen, P. Enhancing the tracking capabilities of the Microsoft Kinect for stroke rehabilitation. In Proceedings of the 2013 IEEE 2nd International Conference on Serious Games and Applications for Health (SeGAH), Vilamoura, Portugal, 2–3 May 2013.
- Ibrahim, M.M.; Liu, Q.; Khan, R.; Yang, J.; Adeli, E.; Yang, Y. Depth map artefacts reduction: A review. IET Image Process. 2020, 14, 2630–2644.
- Song, L.; Yu, G.; Yuan, J.; Liu, Z. Human pose estimation and its application to action recognition: A survey. J. Vis. Commun. Image Represent. 2021, 76, 103055.
- Ingale, A.K.; Divya Udayan, J. Real-time 3D reconstruction techniques applied in dynamic scenes: A systematic literature review. Comput. Sci. Rev. 2021, 39, 100338.
- Oved, D.; Zhu, T. BodyPix: Real-Time Person Segmentation in the Browser with TensorFlow.js. 2019. Available online: https://blog.tensorflow.org/2019/11/updated-bodypix-2.html (accessed on 20 January 2022).
- Yao, R.; Lin, G.; Xia, S.; Zhao, J.; Zhou, Y. Video object segmentation and tracking: A survey. ACM Trans. Intell. Syst. Technol. 2020, 11, 36.
- Camalan, S.; Sengul, G.; Misra, S.; Maskeliunas, R.; Damaševičius, R. Gender detection using 3D anthropometric measurements by Kinect. Metrol. Meas. Syst. 2018, 25, 253–267.
- Zhao, Z.Q.; Zheng, P.; Xu, S.T.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
- Qiao, M.; Cheng, J.; Bian, W.; Tao, D. Biview learning for human posture segmentation from 3D points cloud. PLoS ONE 2014, 9, e85811.
- Shum, H.P.H.; Ho, E.S.L.; Jiang, Y.; Takagi, S. Real-time posture reconstruction for Microsoft Kinect. IEEE Trans. Cybern. 2013, 43, 1357–1369.
- Ryselis, K.; Petkus, T.; Blažauskas, T.; Maskeliūnas, R.; Damaševičius, R. Multiple Kinect based system to monitor and analyze key performance indicators of physical training. Hum.-Centric Comput. Inf. Sci. 2020, 10, 51.
- Ho, E.S.L.; Chan, J.C.P.; Chan, D.C.K.; Shum, H.P.H.; Cheung, Y.; Yuen, P.C. Improving posture classification accuracy for depth sensor-based human activity monitoring in smart environments. Comput. Vis. Image Underst. 2016, 148, 97–110.
- Huang, Y.; Shum, H.P.H.; Ho, E.S.L.; Aslam, N. High-speed multi-person pose estimation with deep feature transfer. Comput. Vis. Image Underst. 2020, 197–198, 103010.
- Lehment, N.; Kaiser, M.; Rigoll, G. Using segmented 3D point clouds for accurate likelihood approximation in human pose tracking. Int. J. Comput. Vis. 2013, 101, 482–497.
- Kulikajevas, A.; Maskeliunas, R.; Damaševičius, R. Detection of sitting posture using hierarchical image composition and deep learning. PeerJ Comput. Sci. 2021, 7, e447.
- Qin, H.; Zhang, S.; Liu, Q.; Chen, L.; Chen, B. PointSkelCNN: Deep learning-based 3D human skeleton extraction from point clouds. Comput. Graph. Forum 2020, 39, 363–374.
- Kulikajevas, A.; Maskeliunas, R.; Damasevicius, R. Adversarial 3D human pointcloud completion from limited angle depth data. IEEE Sens. J. 2021, 21, 27757–27765.
- Kulikajevas, A.; Maskeliūnas, R.; Damaševičius, R.; Wlodarczyk-Sielicka, M. Auto-refining reconstruction algorithm for recreation of limited angle humanoid depth data. Sensors 2021, 21, 3702.
- Kulikajevas, A.; Maskeliunas, R.; Damasevicius, R.; Scherer, R. HumanNet—A two-tiered deep neural network architecture for self-occluding humanoid pose reconstruction. Sensors 2021, 21, 3945.
- Hu, P.; Ho, E.S.; Munteanu, A. 3DBodyNet: Fast reconstruction of 3D animatable human body shape from a single commodity depth camera. IEEE Trans. Multimed. 2021, 24, 2139–2149.
- Google Developers. Protocol Buffer Basics: Java. 2019. Available online: https://developers.google.com/protocol-buffers/docs/javatutorial (accessed on 10 January 2022).
- Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), Bombay, India, 7 January 1998; pp. 839–846.
- Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517.
- Serkan, T. Euclidean Cluster Extraction—Point Cloud Library 0.0 Documentation. 2020. Available online: https://pcl.readthedocs.io/en/latest/cluster_extraction.html (accessed on 15 January 2022).
- Lee, D.T.; Wong, C.K. Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Inform. 1977, 9, 23–29.
- Palmero, C.; Clapés, A.; Bahnsen, C.; Møgelmose, A.; Moeslund, T.B.; Escalera, S. Multi-modal RGB–depth–thermal human body segmentation. Int. J. Comput. Vis. 2016, 118, 217–239.
- Huang, L.; Tang, S.; Zhang, Y.; Lian, S.; Lin, S. Robust human body segmentation based on part appearance and spatial constraint. Neurocomputing 2013, 118, 191–202.
- Li, S.; Lu, H.; Zhang, L. Arbitrary body segmentation in static images. Pattern Recognit. 2012, 45, 3402–3413.
- Couprie, C.; Farabet, C.; Najman, L.; LeCun, Y. Indoor semantic segmentation using depth information. arXiv 2013, arXiv:1301.3572.
- Wang, W.; Neumann, U. Depth-aware CNN for RGB-D segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 135–150.
| Algorithm | Kinect, ms | RealSense, ms | Worst Case, ms |
|---|---|---|---|
| PCL Euclidean clustering | 980 | 344,944 | 236 |
| Bounding box | 184 | 247 | 228 |
| Expanding bounding box | 16.3 | 45.7 | 145 |
| Algorithm | Kinect, M | RealSense, M | Worst Case, M |
|---|---|---|---|
| PCL Euclidean clustering | 193.7 | 38,099.9 | 39.8 |
| Bounding box | 39.6 | 37.1 | 21.1 |
| Expanding bounding box | 2.5 | 4.2 | 21.1 |
| Solution | Accuracy | Segments | Based on | Data Type |
|---|---|---|---|---|
| RGB–Depth–Thermal [28] | 79% | Human | Random forest | RGBD + IR |
| Body part models + GC [29] | 65% | Human | Geometrical + prior knowledge | RGB |
| Pictorial structures + GC [30] | 58% | Human | Geometrical + prior knowledge | Depth |
| Semantic CNN [31] | 65% | Any object | CNN | RGBD |
| Depth aware CNN [32] | 49–61% | Any object | CNN | RGBD |
| Suggested | 24–76% | Any object | Geometrical | Depth |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ryselis, K.; Blažauskas, T.; Damaševičius, R.; Maskeliūnas, R. Computer-Aided Depth Video Stream Masking Framework for Human Body Segmentation in Depth Sensor Images. Sensors 2022, 22, 3531. https://doi.org/10.3390/s22093531