Evaluation of Various State of the Art Head Pose Estimation Algorithms for Clinical Scenarios
Abstract
1. Introduction
2. Materials and Methods
2.1. Pose Estimation Algorithms
2.1.1. OpenFace 2.0
2.1.2. 3DDFA_V2
2.1.3. MediaPipe
2.2. Experimental Setup
2.3. Data Reduction and Analysis
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Abbreviations
3DDFA_V2 | 3D Dense Face Alignment Version 2
MTCNN | Multi-Task Cascaded Convolutional Neural Network
References
- Murphy-Chutorian, E.; Trivedi, M.M. Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 607–626.
- Morency, L.P.; Whitehill, J.; Movellan, J. Generalized adaptive view-based appearance model: Integrated framework for monocular head pose estimation. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, The Netherlands, 17–19 September 2008; pp. 1–8.
- Asthana, A.; Zafeiriou, S.; Cheng, S.; Pantic, M. Incremental face alignment in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1859–1866.
- Albiero, V.; Chen, X.; Yin, X.; Pang, G.; Hassner, T. img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 7617–7627.
- Wang, H.; Hu, J.; Deng, W. Face feature extraction: A complete review. IEEE Access 2018, 6, 6001–6039.
- Wu, Y.; Ji, Q. Facial landmark detection: A literature survey. Int. J. Comput. Vis. 2019, 127, 115–142.
- Sanchez-Moreno, A.S.; Olivares-Mercado, J.; Hernandez-Suarez, A.; Toscano-Medina, K.; Sanchez-Perez, G.; Benitez-Garcia, G. Efficient face recognition system for operating in unconstrained environments. J. Imaging 2021, 7, 161.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; Volume 1.
- Farfade, S.S.; Saberian, M.J.; Li, L.J. Multi-view face detection using deep convolutional neural networks. In Proceedings of the International Conference on Multimedia Retrieval, Shanghai, China, 23–26 June 2015; pp. 643–650.
- Zhang, S.; Wang, X.; Li, S. Faceboxes: A CPU real-time and accurate unconstrained face detector. Neurocomputing 2019, 364, 297–309.
- Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503.
- Zhu, X.; Ramanan, D. Face detection, pose estimation, and landmark localization in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2879–2886.
- Yan, J.; Zhang, X.; Lei, Z.; Li, S.Z. Face detection by structural models. Image Vis. Comput. 2014, 32, 790–799.
- Jain, V.; Learned-Miller, E. Fddb: A Benchmark for Face Detection in Unconstrained Settings; Technical Report UM-CS-2010-009; Dept. of Computer Science, UMass Amherst: Amherst, MA, USA, 2010.
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Wider face: A face detection benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5525–5533.
- King, D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758.
- Baltrusaitis, T.; Zadeh, A.; Lim, Y.C.; Morency, L. OpenFace 2.0: Facial behavior analysis toolkit. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, Xi’an, China, 15–19 May 2018; pp. 59–66.
- Kim, H.; Kim, H.; Hwang, E. Real-time facial feature extraction scheme using cascaded networks. In Proceedings of the IEEE International Conference on Big Data and Smart Computing, Kyoto, Japan, 27 February–2 March 2019; pp. 1–7.
- Kim, H.W.; Kim, H.J.; Rho, H.; Hwang, E. Augmented EMTCNN: A fast and accurate facial landmark detection network. Appl. Sci. 2020, 10, 2253.
- Liu, R.; Lehman, J.; Molino, P.; Petroski Such, F.; Frank, E.; Sergeev, A.; Yosinski, J. An intriguing failing of convolutional neural networks and the coordconv solution. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2018.
- Kartynnik, Y.; Ablavatski, A.; Grishchenko, I.; Grundmann, M. Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs. In Proceedings of the Workshop on Computer Vision for Augmented and Virtual Reality, Long Beach, CA, USA, 17 June 2019; Springer: Berlin/Heidelberg, Germany, 2019.
- Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.; Sheikh, Y. OpenPose: Realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 23, 172–186.
- Baltanas, S.F.; Ruiz-Sarmiento, J.R.; Gonzalez-Jimenez, J. A face recognition system for assistive robots. In Proceedings of the 3rd International Conference on Applications of Intelligent Systems, Las Palmas de Gran Canaria, Spain, 7–9 January 2020; pp. 1–6.
- Baltrušaitis, T.; Robinson, P.; Morency, L.P. Openface: An open source facial behavior analysis toolkit. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–10.
- Zhu, X.; Lei, Z.; Liu, X.; Shi, H.; Li, S.Z. Face alignment across large poses: A 3d solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 146–155.
- Koestinger, M.; Wohlhart, P.; Roth, P.M.; Bischof, H. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV workshops), Barcelona, Spain, 6–13 November 2011; Volume 9.
- Shen, J.; Zafeiriou, S.; Chrysos, G.G.; Kossaifi, J.; Tzimiropoulos, G.; Pantic, M. The first facial landmark tracking in-the-wild challenge: Benchmark and results. IEEE Int. Conf. Comput. Vis. Work. 2015, 7–13, 50–58.
- Guo, J.; Zhu, X.; Yang, Y.; Yang, F.; Lei, Z.; Li, S.Z. Towards fast, accurate and stable 3D dense face alignment. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 469–481.
- Stamm, O.; Heimann-Steinert, A. Accuracy of monocular two-dimensional pose estimation compared with a reference standard for kinematic multiview analysis: Validation study. JMIR Mhealth Uhealth 2021, 8, e19608.
- Zadeh, A.; Chong Lim, Y.; Baltrusaitis, T.; Morency, L.P. Convolutional experts constrained local model for 3D facial landmark detection. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2519–2528.
- Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.; Guang Yong, M.; Lee, J.; et al. Mediapipe: A framework for building perception pipelines. arXiv 2019, arXiv:1906.08172.
- Intel® RealSense™ Depth Camera D415. Available online: https://ark.intel.com/content/www/fr/fr/ark/products/128256/intel-realsense-depth-camera-d415.html (accessed on 25 April 2022).
- Langland, O.E.; Langlais, R.P.; McDavid, W.D.; DelBalso, A.M. Panoramic Radiology, 2nd ed.; Lea & Febiger: Philadelphia, PA, USA, 1989; 440p.
- Zhang, Z.; Cisneros, E.; Lee, H.Y.; Vu, J.P.; Chen, Q.; Benadof, C.N.; Whitehill, J.; Rouzbehani, R.; Sy, D.T.; Huang, J.S.; et al. Hold that pose: Capturing cervical dystonia’s head deviation severity from video. Ann. Clin. Transl. Neurol. 2022, 9, 684–694.
Algorithm | Landmarks: 2D | Landmarks: 3D | Landmarks: Number | Task: Pose | Task: Expression | Task: Gaze | Performance: FPS | Performance: MND | Performance: NME | Source Code Available
---|---|---|---|---|---|---|---|---|---|---
Dlib [16] | ✓ | ✗ | 68 | ✗ | ✗ | ✗ | 15 | - | - | ✓
OpenPose [22] | ✓ | ✓ | 70 | ✓ | ✗ | ✗ | 22 | - | - | ✓
OpenFace [24] | ✓ | ✓ | 68 | ✓ | ✓ | ✓ | 30 | - | - | ✓
OpenFace 2.0 [17] | ✓ | ✓ | 68 | ✓ | ✓ | ✓ | 30 | - | - | ✓
MTCNN [11] | ✓ | ✓ | 5 | ✓ | ✗ | ✗ | 99 | - | - | ✓
EMTCNN [18] | ✓ | ✓ | 68 | ✓ | ✗ | ✗ | 70 | 6.63 | - | ✗
Augmented EMTCNN [19] | ✓ | ✓ | 68 | ✓ | ✗ | ✗ | 68 | 5.59 | - | ✗
3DDFA [25] | ✓ | ✓ | 68 | ✓ | ✗ | ✗ | 20 | - | 5.42 | ✓
3DDFA_V2 [28] | ✓ | ✓ | 68 | ✓ | ✗ | ✗ | 50 | - | 3.51 | ✓
MediaPipe [21] | ✓ | ✓ | 468 | ✓ | ✓ | ✓ | - | - | - | ✓
Metric | OpenFace 2.0: Yaw | OpenFace 2.0: Pitch | OpenFace 2.0: Roll | 3DDFA_V2: Yaw | 3DDFA_V2: Pitch | 3DDFA_V2: Roll | MediaPipe: Yaw | MediaPipe: Pitch | MediaPipe: Roll
---|---|---|---|---|---|---|---|---|---
Error (°) | 12.37 | 14.12 | −0.75 | −5.62 | 0.87 | −0.37 | 11.00 | 7.00 | 1.37
SD (°) * | 12.30 | 13.62 | 2.65 | 3.33 | 3.83 | 3.11 | 10.65 | 10.22 | 2.44
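A minimal sketch of how a per-axis error and SD of the kind tabulated above could be computed from paired angle measurements. It assumes the error is the signed difference between the estimated angle and a reference angle, and that SD is the sample standard deviation of those differences; the function name `error_stats` and the sample values are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def error_stats(estimated_deg, reference_deg):
    """Return (mean signed error, sample SD) in degrees for one rotation axis.

    Assumption: error = estimated angle - reference angle, per sample.
    """
    if len(estimated_deg) != len(reference_deg):
        raise ValueError("estimated and reference sequences must have equal length")
    errors = [est - ref for est, ref in zip(estimated_deg, reference_deg)]
    return mean(errors), stdev(errors)


# Hypothetical yaw samples (degrees); not data from the paper.
reference = [-30.0, -15.0, 0.0, 15.0, 30.0]
estimated = [-27.0, -14.0, 1.0, 17.0, 34.0]

mean_err, sd_err = error_stats(estimated, reference)
print(f"mean error = {mean_err:.2f}, SD = {sd_err:.2f}")
```

A signed mean is used here because the table reports negative values; an absolute-error variant would replace `est - ref` with `abs(est - ref)`.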
Algorithm | Angle | Stalling Angle (°), Tolerable Error 5° | SDV, Tolerable Error 5° | Stalling Angle (°), Tolerable Error 10° | SDV, Tolerable Error 10°
---|---|---|---|---|---
OpenFace 2.0 | +Yaw | 32.32 | 12.14 | 40.13 | 3.33
OpenFace 2.0 | −Yaw | −30.48 | 3.14 | −50.36 | 16.52
OpenFace 2.0 | +Pitch | 33.99 | 4.70 | 40.33 | 3.14
OpenFace 2.0 | −Pitch | −41.70 | 4.28 | −57.84 | 2.27
OpenFace 2.0 | +Roll | - * | - * | - * | - *
OpenFace 2.0 | −Roll | - * | - * | - * | - *
3DDFA_V2 | +Yaw | - * | - * | - * | - *
3DDFA_V2 | −Yaw | −42.6 | 6.43 | −54.24 | 7.45
3DDFA_V2 | +Pitch | 42.7 | 0.00 | 57.84 | 2.27
3DDFA_V2 | −Pitch | −41.09 | 16.66 | - * | - *
3DDFA_V2 | +Roll | - * | - * | - * | - *
3DDFA_V2 | −Roll | - * | - * | - * | - *
MediaPipe | +Yaw | 29.18 | 4.82 | 29.18 | 4.82
MediaPipe | −Yaw | −49.48 | 3.18 | −54.44 | 10.20
MediaPipe | +Pitch | 34.00 | 4.70 | 34.00 | 4.70
MediaPipe | −Pitch | −37.80 | 14.64 | - * | - *
MediaPipe | +Roll | - * | - * | - * | - *
MediaPipe | −Roll | - * | - * | - * | - *
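One way to read the stalling angles in the table above is as the smallest head rotation at which an algorithm's estimate departs from the reference by more than the tolerable error. The sketch below illustrates that reading only; the definition, the function name `stalling_angle`, and the sample sweep are assumptions for illustration and may differ from the procedure the authors used.

```python
from typing import Optional, Sequence


def stalling_angle(
    reference_deg: Sequence[float],
    estimated_deg: Sequence[float],
    tolerance_deg: float,
) -> Optional[float]:
    """Return the reference angle of smallest magnitude whose absolute
    estimation error exceeds tolerance_deg, or None if no sample does.

    Assumption: samples come from a sweep along one direction of one axis.
    """
    if len(reference_deg) != len(estimated_deg):
        raise ValueError("reference and estimated sequences must have equal length")
    # Visit samples from the neutral pose outward.
    pairs = sorted(zip(reference_deg, estimated_deg), key=lambda p: abs(p[0]))
    for ref, est in pairs:
        if abs(est - ref) > tolerance_deg:
            return ref
    return None


# Hypothetical +yaw sweep (degrees); not data from the paper.
reference = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
estimated = [0.5, 11.0, 22.0, 36.0, 52.0, 70.0]

print(stalling_angle(reference, estimated, 5.0))
print(stalling_angle(reference, estimated, 10.0))
print(stalling_angle(reference, estimated, 25.0))
```

Returning `None` rather than a sentinel number keeps "threshold never exceeded within the sweep" distinct from any real angle.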
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hammadi, Y.; Grondin, F.; Ferland, F.; Lebel, K. Evaluation of Various State of the Art Head Pose Estimation Algorithms for Clinical Scenarios. Sensors 2022, 22, 6850. https://doi.org/10.3390/s22186850