1 Introduction

Since it has been shown that hazardous situations cannot be completely removed from industrial environments, employers and employees share a responsibility to jointly invest efforts in safety by increasing self-awareness and reducing risks of workplace injuries (Howard 2010). For these purposes, the Occupational Safety and Health Administration (OSHA) has proposed an official strategy for the preventive elimination of hazards known as the “Hierarchy of controls”. In brief, the five levels of OSHA controls are: (1) elimination (physically removing hazards), (2) substitution (replacing hazards), (3) engineering controls (isolating people from the hazard), (4) administrative controls (changing how people work), and (5) use of personal protective equipment (PPE). As illustrated in Fig. 1a, the use of PPE is recommended when preceding OSHA controls are ineffective or inapplicable in particular workplaces. In these instances, PPE often functions as a crucial front-line barrier between employees and hazards, making PPE compliance an important topic for both academia and industry.

Fig. 1

PPE compliance and workplace safety standards. a The pyramid illustrates the hierarchy of controls proposed by the OSHA; b-d the process of PPE management and its relation to the relevant industrial standards for the three different phases of b industry sector-specific needs, c planning, and d implementation process

According to the corresponding body part and physiological function that they intend to protect, PPE can be grouped into: (1) eye and face protection (safety glasses, face shields, welding masks, etc.); (2) head protection (caps and hard hats); (3) hearing protection (earplugs and earmuffs); (4) hand and arm protection (gloves); (5) foot and legs protection (steel-toed boots); (6) respiratory protection (masks), to name a few (Occupational Safety and Health Administration 2004; Regulation (EU) 2016/425 2016). The high variability among industries and hazards that may occur, along with the corresponding PPE available for their mitigation heightens the complexity of PPE management in practice (Fig. 1b). The key role in ensuring workplace safety is accomplished by on-site safety managers (supervisors), who are responsible for implementing international safety policies, safety training, as well as monitoring their follow-up usage and PPE maintenance in companies (Fig. 1c).

According to the WAC 296-800-160 (Fig. 1d), implementing PPE controls involves the following iterative steps (Fig. 2a): (1) assessing workplaces for hazards, (2) selecting corresponding PPE, (3) monitoring and ensuring that the appropriate PPE is used, (4) maintaining and replacing PPE to keep it in safe and good condition, and (5) retraining employees if necessary (Washington State Legislature 2020). However, a survey conducted by Kimberly-Clark revealed that, according to 69% of respondents, the root of PPE noncompliance was a feeling that PPE was unnecessary or excessive in a specific situation (Kimberly-Clark 2012). It is reported that the three key factors that influence the success of PPE compliance are: (1) Personal factors (like/style, frustration, fatigue, anticipation, habits, motivation, skills); (2) Workplace factors (rules, procedures, policies, quality of provided PPE, frequency of replacement, supervisor support, company culture); and (3) PPE factors (style, brand, color, size, options, fit, weight, thermal properties, ergonomics, ease of care, and replacement frequency) (NIOSH 2017; Wong et al. 2020; Rafindadi et al. 2022).

To illustrate the scale and impact of this challenge, we note that liabilities and non-compliance with PPE recommendations cost the US alone approximately 360 billion dollars annually (Bureau of Labor Statistics (BLS) 2015). Furthermore, reports from 2017 indicated that over 2.8 million nonfatal injuries occurred, of which a large portion could have been prevented through the proper use of PPE (Bureau of Labor Statistics (BLS) 2017). It is important to note that, nowadays, affordability is no longer a barrier to using PPE. Accordingly, the two key on-site measures for preventing PPE misuse are: (1) employees’ education and retraining, and (2) PPE compliance. Since these two are separate workplace safety topics, the present review is focused on the technological advancements towards enabling objective and timely supervision of PPE compliance, which remains a major challenge from the perspective of workplace safety managers (Fig. 2d). As current manual monitoring of workers is time-consuming, ineffective, and expensive, the demand for Information and Communication Technology (ICT) tools to automate PPE compliance has significantly increased. It is widely accepted in the literature that Industry 4.0 technologies like cloud computing, artificial intelligence (AI), cybersecurity, and Internet of Things (IoT) are essential for digitalizing PPE management. This review focuses on computer vision (CV)-based solutions, which have gained the most attention in the literature due to their adaptability for solving PPE compliance across different industries.

Fig. 2

The process and types of PPE compliance: a key steps in the iterative PPE selection process; b-d internal (at the factory); and e internal and/or external PPE compliance

The problem of visual PPE compliance (Fig. 2b-d) has been the subject of various studies for over a decade. However, our insights into the topic suggest that, beyond the application of computer vision algorithms, a series of related challenges remain underestimated in the academic literature. Therefore, the aim of this review article is to provide a systematic and comparative review of: (1) technological progress, (2) industry-specific requirements and workflows, and (3) barriers that future studies should address to enable the broader application of CV-based PPE compliance. This is achieved by answering the following Key Questions:

  • Question 1: What approaches have been used to solve the visual PPE compliance problem?

  • Question 2: What acquisition devices, algorithms, datasets, frameworks, and industry environments have been considered in the literature?

  • Question 3: What are the technological challenges preventing wider adoption of the available solutions?

  • Question 4: What are the ethical and cybersecurity concerns of digitalized PPE compliance?

The rest of this paper is organized as follows. In Sect. 2, we describe the review methodology used in this study and emphasize key trends in the considered literature. Section 3 is split into four subsections to present an extensive overview of state-of-the-art computer vision-based approaches for PPE compliance. In Sect. 4, we present the generic concept and workflow of CV-based PPE compliance 4.0, while highlighting domain-specific complexities and user requirements that are frequently neglected in the related studies reviewed in Sect. 3. Considering this, in Sect. 5, we highlight major challenges that further studies need to address in order to achieve wider industrial applications. The final section summarizes the main findings and offers guidelines for future work on the topic.

2 Methodology

In order to conduct the systematic literature review, the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA, http://www.prisma-statement.org) methodology was used (Page et al. 2021). Three relevant multidisciplinary scientific databases (ScienceDirect, IEEE Xplore, and Web of Science) were queried using the “Personal Protective Equipment” keywords (Fig. 3) on July 23, 2024. All three sources were searched separately. The search results were refined by selecting publications from 2010 to the present moment, resulting in 42,690 articles from ScienceDirect, 1,825 articles from IEEE Xplore, and 11,996 articles from Web of Science Core Collection. After the authors removed duplicates, articles were filtered using the following keywords as the inclusion criteria: “Computer Vision”, “Deep Learning”, and “Artificial Intelligence”. After removing duplicates, filters such as “full-text only” and “English language” were applied. Only papers published in journals or conference proceedings were included. Next, we excluded unavailable papers and those whose titles and abstracts did not align with the research objective. Moreover, papers that answer Key Questions 1 and 2, providing the most relevant analysis and complete information for the review based on the outcome reported, were selected for further analysis. Each record was screened by at least two reviewers (with backgrounds in Data Science and Industrial Engineering), and all disagreements about eligibility were resolved through discussion. It is important to note that, in the conducted literature search, according to the given criteria, keywords could appear anywhere in a paper, including the reference section. This resulted in a significant number of papers being excluded from the research. Rejected records were removed from further consideration. Full texts of the potentially eligible articles were retrieved and examined (N = 59), and after reading the full texts, 32 publications were selected for the analysis in this survey. In addition to the academic databases, we considered gray literature consisting of industry and organizational reports and website articles (as they contain a significant amount of data that can help to understand different aspects of the PPE compliance problem). The PRISMA diagram (Fig. 3) presents the number of publications for each explored database, as well as the total number of papers remaining after applying all criteria.

Fig. 3

PRISMA diagram of data collection for the literature review

2.1 Trends in the collected literature material

Following the PRISMA methodology, we revealed that the majority of literature on PPE compliance comes from ScienceDirect and Web of Science databases, with a noticeable surge in research publications over the last four years (Fig. 4a). By utilizing the Web of Science’s sub-querying capability, we further investigated the domains that extensively studied PPE and found that Covid-19 (pandemics/epidemiology), Safety, Healthcare, Industry, and Energy were the dominant areas of focus (Fig. 4b). Figure 4c illustrates the volume of research studies employing computer vision, artificial intelligence, and deep learning techniques to address the PPE compliance problem (ScienceDirect). Notably, there has been an increasing trend (since 2020) in utilizing deep learning approaches compared to traditional methods. Despite the growing emphasis on digitalizing PPE compliance (Fig. 4d) and the developments driven by the Covid-19 pandemic, the analysis indicates that the commercialization and practical use of available solutions are still not widely present (ScienceDirect). Obviously, there is a gap between research efforts and real-world applications, suggesting that a deeper understanding and clarification of challenges for a wider audience are needed before the effective adoption of AI-driven PPE compliance across industries.

Fig. 4

Total number and distribution of publications that studied problems related to PPE. a During the last ten years; b across different domains; c by applying computer vision, artificial intelligence, and deep learning based studies; d focused on Covid-19

3 Review of literature focused on computer vision-based PPE compliance

In this section, we assume that all steps (Fig. 2a) preceding the visual PPE compliance have been performed successfully (NIOSH 2010, 2016); therefore, the key task is to use computer vision techniques to ensure that employees are wearing the required PPE. Accordingly, we review selected computer vision studies from various perspectives, from the high-level approaches to specific algorithms, datasets, and technologies used. A comparative summary is given in Table 1.

3.1 Review of high-level approaches used for solving the visual PPE compliance problem

Pioneering studies tackled the problem of PPE compliance as a detection problem combining feature engineering with traditional machine learning algorithms (Rubaiyat et al. 2016), while the majority of recent studies have used various deep learning (DL) models and architectures. Most studies have considered PPE compliance as a multiple-step problem, proposing solutions that consist of several modules/steps with object detectors and/or classifiers as the final step (Nath et al. 2020; Nagrath et al. 2021; Vukicevic et al. 2022). In addition, several studies have employed various segmentation-based techniques (Rubaiyat et al. 2016; Li et al. 2018; Lee et al. 2023b), as well as face detection algorithms (Tran et al. 2019). The incorporation of human detection modules to enhance PPE compliance has shown promising results (Chen and Demachi 2020), as implementations have been made using both traditional (Dalal and Triggs 2005; Zhu et al. 2006) and DL techniques (Nikouei et al. 2018).

A major limitation of approaches that used detection and segmentation algorithms for employee recognition is the inability to distinguish body parts. This matters because inspecting whether PPE is properly used (e.g., “Is the face mask covering the nose?”) is inherently more complex than simply recognizing PPE in images (e.g., “Is the hard hat present?”). This problem is important because improper equipment usage can also lead to workplace injuries. To the best of our knowledge, only a few studies have addressed the issue of improper use of PPE (Chen and Demachi 2020; Vukicevic et al. 2022). Other studies were primarily focused on detecting if PPE is present in the region of interest, i.e., a hard hat is considered properly used if it is present in the region near the head (Mneymneh et al. 2019) or sufficiently near the head (Guo et al. 2020; Chen and Demachi 2020; Pei et al. 2024).

In recent studies, promising solutions have been proposed by leveraging pose estimators to define regions of interest (ROIs), which are used as inputs for subsequent PPE detection and classification algorithms (Chen and Demachi 2020; Vukicevic et al. 2022) (Fig. 5). These types of approaches have technological advantages due to their flexibility and extensibility. Specifically, if PPE compliance is treated as the binary classification of a corresponding body ROI, then removing or adding a PPE class requires no additional training of the remaining PPE compliance classifiers and does not change their performance (Vukicevic et al. 2022). This is an important distinction from approaches that proposed using multi-class detectors and classifiers, e.g., for simultaneously inspecting the use of hard hats and masks: if such a model is fine-tuned for another type of mask, the transfer learning may negatively affect the hard hat compliance. Although the ROI classification provides a generic and user-friendly approach, it is reported that head-mounted PPE compliance is a specific case, as there is high variability in PPEs in terms of appearance and design (Isailovic et al. 2021). Additionally, it is a frequent requirement that an employee needs to wear multiple PPEs simultaneously (e.g., hard hats, safety masks, safety glasses, and earmuffs), which means that there is a need to run a series of binary classifiers. As an alternative, depending on the use case, object detectors may be a better-suited approach for the compliance of head-mounted PPE.
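
The extensibility argument above can be illustrated with a minimal sketch, written here in Python. It is not the implementation of any cited study: the names `PPECheck`, `crop`, and `check_compliance` are hypothetical, the image is a plain nested list, and the classifiers are stand-in callables. The point is only that each PPE type has its own binary classifier bound to a body ROI, so a check can be added or removed without touching the others.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A body ROI is an axis-aligned box (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]
Image = List[List[int]]
# Hypothetical classifier signature: takes an image crop, returns True if the PPE is worn.
BinaryClassifier = Callable[[Image], bool]


@dataclass
class PPECheck:
    roi_name: str                 # body ROI inspected by this check, e.g. "head"
    classifier: BinaryClassifier  # independent binary model for one PPE type


def crop(image: Image, box: Box) -> Image:
    """Cut the ROI out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def check_compliance(image: Image,
                     rois: Dict[str, Box],
                     checks: Dict[str, PPECheck]) -> Dict[str, bool]:
    """Run every registered binary classifier on its own ROI."""
    results: Dict[str, bool] = {}
    for ppe_name, check in checks.items():
        box = rois.get(check.roi_name)
        if box is None:
            # Assumption: an ROI that was not localized counts as non-compliant.
            results[ppe_name] = False
            continue
        results[ppe_name] = check.classifier(crop(image, box))
    return results
```

In this arrangement, supporting a new PPE type amounts to registering one more `PPECheck`, while the trade-off noted in the text remains: several PPE types on the same ROI mean several classifier calls per person.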

Fig. 5

Workflow of the generic four-step procedure for PPE compliance proposed in (Vukicevic et al. 2022)

3.2 Review of PPE types considered in previous studies

Compliance with hard hats/helmets is the problem most frequently addressed in the literature (Rubaiyat et al. 2016; Wu et al. 2019). In recent studies, the number of attempts to simultaneously inspect a wider range of PPE, such as shirts, belts, gloves, pants, and shoes, has increased (Pradana et al. 2019; Tran et al. 2019; Vukicevic et al. 2022; Pei et al. 2024). Furthermore, several studies have focused only on head and/or chest PPE compliance (Delhi et al. 2020; Nath et al. 2020; Isailovic et al. 2021; Alassaf and Said 2024). The Covid-19 pandemic has recently brought attention to the issue of face masks (Oumina et al. 2020; Chavda et al. 2021; Loey et al. 2021; Nagrath et al. 2021; Goyal et al. 2022; Habib et al. 2022; Kumar et al. 2022a; Ullah et al. 2022; Pei et al. 2024). To summarize, the majority of previous studies were focused only on a few PPE types, as listed in the second column of Table 1. To emphasize this gap, Fig. 5f illustrates the variability of PPEs used in industry practice for protecting various body parts and functions. For example, for the head alone, multiple PPEs could be used simultaneously (e.g., hardhat for head protection, headphones for hearing protection, glasses and visors for eye protection, and masks for respiratory protection). Considering only respiratory protection, Covid-19 masks are the one specific case that gained attention during the global pandemic, while the majority of regular industry needs remain uncovered so far.

In future studies, a significant effort needs to be invested in verifying and testing the usability of CV-based PPE compliance across different industrial scenarios. A major challenge in this direction is the availability of suitable data sets, which are reviewed later in Sect. 3.4. Here, we emphasize that the use of game engines (e.g., Unity, Unreal, Blender, etc.) represents a promising approach that could be used to enable both assessment of existing and the development of new procedures. Specifically, having a digital representation of various PPEs (and human body shapes and industry environments), it is possible to use rendering engines to quickly generate highly realistic synthetic data that are needed for the training/assessment of computer vision algorithms (Man and Chahl 2022).

3.3 Review of input data, acquisition devices, and environments considered in the literature

Closed-Circuit Television (CCTV) cameras (Zhafran et al. 2019; Delhi et al. 2020; Roy et al. 2020) and IP cameras (Tran et al. 2019; Isailovic et al. 2021; Vukicevic et al. 2022) have been primarily used to acquire RGB (red, green, and blue) images, regardless of whether they were real-time or single-shot approaches. Furthermore, the acquisition cameras were primarily static and covered certain regions of interest. Some studies evaluated this procedure by varying the PPE-to-camera distance, showing inconsistent results (Zhafran et al. 2019; Chen and Demachi 2020). Chen et al. showed that an individual’s posture had an impact on their results (Chen and Demachi 2020). Technologies such as 3D cameras have been employed for pose estimation and unsafe act detection. In particular, Microsoft Kinect is used for the real-time identification of construction workers’ unsafe behaviors (Guo et al. 2018), and the JVC 3D Everio Camcorder is used for examining unsafe actions, such as a fall from a ladder (Han and Lee 2013). In areas with limited light, the fusion of standard and infrared cameras may be a suitable solution (Crescitelli et al. 2020). Finally, some studies involved the use of AI-enabled cameras, such as the Azure AI Vision camera, to enhance prediction performance (Balakreshnan et al. 2020).

In general, the nonstandard equipment mentioned above, as used for experiments published in academic studies, is expensive and not widely available compared to conventional IP cameras. In industry environments, it may be assumed that IP cameras are already available in most workplaces (the technological bottleneck of processing IP camera streams is discussed later in Sect. 5.2). Considering the sizes of industry halls, it is not trivial to develop a reliable and cost-effective solution for observing and inspecting the use of PPE for an entire company. Instead, the use of AI to automate PPE compliance at certain checkpoints (e.g., entry points) appears to be more achievable and opens possibilities for using dedicated devices for both image acquisition and verification of employees’ identities using Radio Frequency Identification (RFID) cards instead of face detection techniques.

3.4 Review of algorithms, datasets, and frameworks used in the literature

There are two major approaches proposed for solving the PPE compliance problem: (1) object detection; and (2) image classification (Table 1). Object detection approaches process the whole image in order to determine bounding boxes around the PPEs of interest, as well as to assign them a class label. In the PPE context, object detection models are employed more frequently in the literature, e.g., (Wu et al. 2019; Jin et al. 2021; Loey et al. 2021; Alassaf and Said 2024), although their performances are comparable with those of state-of-the-art classification-based approaches (Table 1). The most commonly used object detectors are Single-Shot Detectors (SSDs) (Wu et al. 2019; Nagrath et al. 2021), Faster Region-based Convolutional Neural Networks (Faster R-CNNs) (Fang et al. 2018; Zhafran et al. 2019; Lee et al. 2023a), and You Only Look Once (YOLO) models (Tran et al. 2019; Chen and Demachi 2020; Delhi et al. 2020; Nath et al. 2020; Roy et al. 2020; Isailovic et al. 2021; Jin et al. 2021; Loey et al. 2021; Protik et al. 2021; Cheng et al. 2022; Ferdous and Ahsan 2022; Kumar et al. 2022a, b; Li et al. 2022; Farooq et al. 2023; Shahin et al. 2023; Zeng et al. 2023; Pei et al. 2024). Compared to Faster R-CNN, which ensures high accuracy in situations where PPEs are far away and appear small on images (at the cost of running at a lower FPS), single-stage approaches (YOLO and SSD) provide competitive accuracy with lower computational demands (offering capabilities for real-time applications and deployment on edge devices).

Classification approaches are typically multiple-step pipelines, which include human detection and pose estimation steps, followed by the cropping that isolates PPE regions of interest (ROI) subjected for the classification – which makes them more computationally demanding compared to the direct use of object detectors (Chen and Demachi 2020; Isailovic et al. 2021; Vukicevic et al. 2022). ROI classification approaches focus on smaller, more relevant regions of an image, which improves reliability in detecting various PPE, and allows for the assessment of whether the equipment is being used correctly (e.g., they ensure that a helmet is placed on the head, rather than simply detecting it anywhere in the image). Compared to pure object detection methods, ROI-based classification approaches are more holistic, which makes their extension or adaptation for different industrial setups straightforward. ROI classification has been performed by using both DL (Wu et al. 2019) and radiomics algorithms (Rubaiyat et al. 2016; Li et al. 2018), while recent studies are dominantly DL-based. The most frequently used classification DL architectures are ResNet (He et al. 2016) and MobileNetV2 (Sandler et al. 2018). Regarding studies that combined detection/classification with pose detection (Mneymneh et al. 2019; Chen and Demachi 2020), the most frequently used pose estimators are OpenPose (Cao et al. 2017) and HigherHRNet (Cheng et al. 2020). Zhafran et al. (2019) concluded that motion blur significantly affects detection results in the case of gloves. Furthermore, these models may produce inconsistent results, even for two nearly identical images, which is a crucial limitation for practical applications. Although this problem can be reduced by accounting for several images and implementing a voting scheme (Tran et al. 2019), such approaches are not suitable for real-time applications (Chen and Demachi 2020; Nath et al. 2020).
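
The voting idea mentioned above can be sketched as a sliding-window majority vote over per-frame predictions. This is a generic illustration in Python rather than the scheme of Tran et al. (2019): the class name `MajorityVoteSmoother`, the window length, and the string labels are assumptions made for the example.

```python
from collections import Counter, deque


class MajorityVoteSmoother:
    """Smooth per-frame labels by majority vote over the last `window` frames."""

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be at least 1")
        # The deque drops the oldest label automatically once it is full.
        self.history = deque(maxlen=window)

    def update(self, label: str) -> str:
        """Add the newest per-frame label and return the current majority label."""
        self.history.append(label)
        counts = Counter(self.history)
        # Ties are possible with an even number of stored labels; an odd
        # window with binary labels avoids them once the window is full.
        return counts.most_common(1)[0][0]
```

The sketch also makes the cost visible: a decision reflects the last `window` frames, so a change in the true state is reported only after a majority of the window has turned over, which is one reason such schemes add latency in real-time settings.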

Regarding the contribution of CV-based PPE compliance studies to the AI progress, many studies demonstrated inventive combinations or refinements of existing deep learning architectures and approaches. Habib et al. (2022) investigated various deep learning architectures such as VGG16, VGG19, InceptionV3, ResNet-101, ResNet-50, EfficientNet, MobileNetV1, and MobileNetV2, and proposed a real-time face mask detection model suitable for edge device deployment. The edge-AI model is based on the MobileNetV2 architecture, which extracts salient features from the input data and passes them to an autoencoder to form more abstract representations before the classification layer. Extensive data augmentation techniques, such as rotation, flipping, Gaussian blur, sharpening, embossing, skewing, and shearing, are employed to increase the number of samples for effective training on three datasets: Face Mask Detection (FMD), Face Mask (Oumina et al. 2020), and Real-World Mask Face Recognition (RMFR) (more than 100,000 images). Chen and Demachi (2020) introduced a vision-based automated monitoring approach designed to enhance occupational safety by ensuring proper PPE usage, combining deep learning-based individual detection and object detection through geometric relationship analysis. Detected PPE is associated with individuals by finding the minimum Euclidean distance between bounding boxes and detected neck key points; this distance is then used to determine whether each individual is using their PPE properly. Lee et al. (2023b) proposed a monitoring framework using a pixel-based object-driven approach (training a machine learning model on a dataset where objects are annotated pixel-wise) to minimize errors. Unlike the bounding box-based approach, this approach does not consider unnecessary pixel information because it predicts objects pixel by pixel, thereby reducing errors through more accurate object information. Cheng et al. (2022) proposed a deep learning-based framework for monitoring safety compliance among workers. Apart from PPE classification, it incorporated worker re-identification (ReID) by designing a new similarity loss function that helps models learn more discriminative human features, enhancing worker tracking. Alassaf and Said (2024) proposed the Deformable Perspective Perception Network (DPPNet), an automated system based on computer vision to address challenges such as missed detection of small objects and occluded helmet detection. DPPNet consists of two modules: Background/Image Spatial Fusion (BISF) and Grayscale Background Subtraction (GBS). The BISF module uses channel attention to blend feature maps from the current frame and the background, while the GBS module integrates background spatial information into the current frame. The results showed that the proposed modules significantly enhance the detection capabilities for small objects.
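
To make the geometric association step more concrete, the following Python sketch assigns each detected PPE box to the nearest neck key point by Euclidean distance. It is only in the spirit of the approach attributed to Chen and Demachi (2020), not their implementation: measuring from the box center, the `max_distance` threshold, and the function names are assumptions introduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_center(box: Box) -> Point:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def assign_ppe_to_persons(ppe_boxes: List[Box],
                          neck_points: Dict[str, Point],
                          max_distance: float) -> Dict[str, Optional[Box]]:
    """Map each person id to the closest PPE box, or None if none is close enough."""
    assignment: Dict[str, Optional[Box]] = {pid: None for pid in neck_points}
    best: Dict[str, float] = {pid: math.inf for pid in neck_points}
    if not neck_points:
        return assignment
    for box in ppe_boxes:
        center = box_center(box)
        # Nearest person to this PPE box, by Euclidean distance to the neck key point.
        pid, dist = min(
            ((p, math.dist(center, neck)) for p, neck in neck_points.items()),
            key=lambda item: item[1],
        )
        # Assumed rule: PPE farther than max_distance from every neck is not "worn".
        if dist <= max_distance and dist < best[pid]:
            assignment[pid] = box
            best[pid] = dist
    return assignment
```

A fixed pixel threshold is a simplification; in practice it would likely need to scale with the apparent size of the person, since the text notes that PPE-to-camera distance and posture affect results.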

In addition to accuracy, frames per second (FPS) metric is regarded as the most important indicator of procedure applicability. Since metrics differ from study to study, and different datasets and hardware were used, direct comparison by using accuracy metrics would not lead to objective conclusions. However, several studies reported very high accuracy for PPE detection, exceeding 95% (Tran et al. 2019; Chen and Demachi 2020; Delhi et al. 2020; Jin et al. 2021; Vukicevic et al. 2022; Ullah et al. 2022; Goyal et al. 2022; Habib et al. 2022; Gupta et al. 2023; Pei et al. 2024). For the studies summarized in Table 1, the FPS ranged from 2 to 65. Object detection approaches based solely on YOLO generally achieve higher FPS (Roy et al. 2020; Cheng et al. 2022; Farooq et al. 2023), whereas ROI classification methods result in lower FPS due to the additional processing steps required (including human detection, pose estimation, and region cropping) (Chen and Demachi 2020; Isailovic et al. 2021; Vukicevic et al. 2022). According to Ren et al. (2015), a low FPS limits the real-time applicability of several of these models. Furthermore, the presented FPS rates should be taken with caution, because the hardware in studies ranged from personal computers to workstations with high-end graphic cards. In addition, the model size may be problematic in some cases, especially if the inference is conducted on client devices. Several studies suggest the use of SSD or YOLO models, which, as the backbone of the algorithm, use CNNs that exceed 100 MB in size, e.g., CSPDarknet53 (Protik et al. 2021) or VGG16 (Wu et al. 2019; Guo et al. 2020).

The success of AI depends heavily on the availability of large amounts of labeled training data. Since PPE compliance is a domain-specific problem, companies are expected to invest in data collection and retraining of CV algorithms to recognize the particular PPE type and design. In the literature, images used for AI training are crawled from various Internet sources (Rubaiyat et al. 2016), collected in laboratory conditions (Balakreshnan et al. 2020), taken from surveillance cameras (Li et al. 2018), or obtained by a combination of these approaches (Chen and Demachi 2020). The reported dataset sizes range from at least 1,000 (Nath et al. 2020; Loey et al. 2021; Protik et al. 2021; Ferdous and Ahsan 2022; Zeng et al. 2023; Lee et al. 2023b; Pei et al. 2024) to 100,000 images (Fang et al. 2018). Studies that used a larger number of images, or a combination of publicly available datasets supplemented with images collected in laboratory or field conditions, are considered more effective since they provide improved robustness and generalization capabilities of the proposed algorithms. We emphasize that several public domain-specific datasets are frequently used. For medical mask detection, the common starting points are the Medical Mask, Face Mask, and Face Mask Detection datasets from Kaggle (Loey et al. 2021; Nagrath et al. 2021; Goyal et al. 2022; Habib et al. 2022; Kumar et al. 2022a; Gupta et al. 2023) and their mixtures, such as MOXA for medical mask detection (Roy et al. 2020). The GDUT-HWD (Wu et al. 2019) and Roboflow (roboflow 2020) hard hat datasets may also be useful for safety helmet detection, whereas Pictor-v3 (Nath et al. 2020; b; Shahin et al. 2023) contains images of workers wearing various PPE. Some studies initially used datasets developed for similar purposes (e.g., DeepFashion2) to train models, and then employed a PPE-specific dataset for fine-tuning (Truong et al. 2020).

Practice has shown that DL models are not easily interchangeable between software libraries and frameworks, although several promising attempts have been made (e.g., ONNX) (Lin et al. 2019). For instance, in (Isailovic et al. 2021; Jin et al. 2021; Ferdous and Ahsan 2022; Vukicevic et al. 2022; Farooq et al. 2023; Zeng et al. 2023; Alassaf and Said 2024), the authors used the PyTorch framework, whereas in (Balakreshnan et al. 2020; Chen and Demachi 2020; Delhi et al. 2020; Loey et al. 2021; Nagrath et al. 2021; Protik et al. 2021; Goyal et al. 2022; Habib et al. 2022; Kumar et al. 2022a) TensorFlow was used. In recent years, significantly fewer models have been built using Caffe (Wu et al. 2019) and Matlab (Loey et al. 2021).

Table 1 A comparative summary of computer vision studies from various perspectives: high-level approaches, algorithms, datasets, environments, and framework

4 The concept and workflow of CV-based PPE compliance 4.0 in industry practice

The concept of CV-based PPE Compliance 4.0, which is based on the foundations of Industry 4.0 technological pillars to digitalize the tasks explained in Figs. 1 and 2, is illustrated in Fig. 6.

In order to timely and objectively detect and prevent PPE misuse, a solution needs to include three key factors: PPE, human, and company. The dynamic nature of today’s manufacturing systems, which makes PPE management highly complex, could be explained through the sample Factory X shown in Fig. 6. As may be noted, many differences in PPE recommendations may exist across both space (company sectors A, B, C, etc.) and time (shifts, emergency periods - e.g., caused by a pandemic). Finally, in addition to these spatiotemporal variabilities, additional complexity arises from the high fluctuations of individuals across sectors, of which we consider three major roles (Fig. 6): (1) visitors, (2) supervisors, and (3) employees (where the use of PPE is commonly different for each role).
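
One way to picture the spatial, temporal, and role-based variability described above is as a lookup from (sector, role) to a set of required PPE, with an extra requirement layered on during emergency periods. The Python sketch below is purely illustrative: the sector names, roles, PPE labels, and the `RULES` table are invented for the example and do not come from any standard or cited study.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical rule table for a sample factory: (sector, role) -> required PPE.
RULES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("A", "employee"): frozenset({"hard_hat", "gloves", "safety_boots"}),
    ("A", "visitor"): frozenset({"hard_hat"}),
    ("B", "employee"): frozenset({"earmuffs", "safety_glasses"}),
}

# Hypothetical additional PPE required during an emergency period (e.g., a pandemic).
EMERGENCY_EXTRA: FrozenSet[str] = frozenset({"mask"})


def required_ppe(sector: str, role: str, emergency: bool = False) -> FrozenSet[str]:
    """Return the PPE required for a role in a sector; unknown pairs raise KeyError."""
    base = RULES[(sector, role)]
    return (base | EMERGENCY_EXTRA) if emergency else base


def missing_ppe(sector: str, role: str, detected: Iterable[str],
                emergency: bool = False) -> FrozenSet[str]:
    """Return the required PPE that was not detected on the person."""
    return required_ppe(sector, role, emergency) - frozenset(detected)
```

Raising an error for an unknown (sector, role) pair, rather than returning an empty requirement, is a deliberate choice in this sketch so that a missing rule is not silently read as "no PPE needed".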

In emergencies, such as the Fukushima 2011 disaster, the inability to manage the listed complexities in a timely manner was determined to be among the causes of serious injuries and harm (Chen and Demachi 2020); whereas a clear example of PPE misuse is the improper use of Covid-19 masks (Machida et al. 2020). Additionally, different types of PPE compliance (Fig. 2b-e) must be periodically ensured. The frequencies for PPE compliance checks can vary, including daily, weekly, per-shift, permanent, and randomly scheduled audits.

For these purposes, a digitalized reporting system is needed to provide practitioners with efficient and iterative PPE reporting and mitigation of PPE misuse. A digitalized PPE compliance report needs to contain information about the location, PPE type misused, and mitigation instructions given by the safety manager, who is trained and responsible for PPE compliance (Vukicevic et al. 2022). By enabling data-driven PPE management, a company would be able to work towards continuous improvement of safety awareness and engagement of employees. Key underpinning technologies for such ICT solutions are cloud computing, mobile, and edge devices, which need to be carefully utilized to avoid potential issues explained later in Sect. 5.
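As an illustration of the report content listed above (location, PPE type misused, and mitigation instructions), the following sketch shows one possible record layout and its serialization. The class name, the field names, and the added timestamp field are assumptions made for this example only.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ComplianceReport:
    location: str                 # sector or checkpoint identifier
    misused_ppe: list             # PPE types that were missing or misused
    mitigation_instructions: str  # instructions given by the safety manager
    created_at: str               # ISO 8601 timestamp (assumed extra field)


def to_json(report):
    """Serialize a report so that it can be sent to a central server."""
    return json.dumps(asdict(report), sort_keys=True)


report = ComplianceReport(
    location="Sector B, checkpoint 2",
    misused_ppe=["safety_glasses"],
    mitigation_instructions="Collect replacement glasses at the storeroom.",
    created_at=datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc).isoformat(),
)
payload = to_json(report)
```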

Fig. 6

The concept of PPE compliance 4.0. Utilization of technology needs to ease PPE management by covering temporal, spatial, and role-based complexities while accounting for factors that affect the management success

The overall concept of CV-based PPE compliance 4.0, and a corresponding generic workflow that illustrates the authors’ interpretation of related challenges, is shown in Fig. 7. The four major tasks include: (1) PPE audit, (2) PPE compliance, (3) issue mitigation, and (4) training and certification. The two key user roles are employees and supervisors. Because some of the tasks may be performed iteratively, a messaging/correspondence system needs to be integrated within all four tasks. When a user confirms their identity (e.g., with an RFID tag), it is assumed that an edge-AI device performs the PPE compliance check, while the PPE compliance results are passed to the central web server. If an employee is not wearing a recommended PPE, the system creates an “open issue”. Notifications about open issues are sent to the manager, whose responsibility is to create a new mitigation task and provide instructions to the employee on how to resolve the task. In addition to PPE compliance, an important feature of the proposed concept is the PPE audit system, as poor conditions of safety equipment are closely related to incidents at workplaces.
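The open-issue part of this workflow can be sketched as a small state model. This is a simplified illustration under stated assumptions, not an implementation from the reviewed literature: the class names, the three status values, and the in-memory notification list are all hypothetical, and the edge-AI detection result is passed in as a plain set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Issue:
    employee_id: str
    missing_ppe: set
    status: str = "open"              # assumed states: open, assigned, resolved
    instructions: Optional[str] = None


@dataclass
class ComplianceServer:
    """Stand-in for the central web server; all names are hypothetical."""
    issues: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def submit_check(self, employee_id, required, detected):
        """Receive an edge-device result and open an issue if PPE is missing."""
        missing = set(required) - set(detected)
        if not missing:
            return None
        issue = Issue(employee_id, missing)
        self.issues.append(issue)
        self.notifications.append(("manager", f"Open issue for {employee_id}"))
        return issue

    def assign_mitigation(self, issue, instructions):
        """The manager creates a mitigation task with instructions."""
        issue.instructions = instructions
        issue.status = "assigned"
        self.notifications.append((issue.employee_id, instructions))

    def resolve(self, issue, required, detected):
        """Close the issue once a repeated check finds all required PPE."""
        if set(required) <= set(detected):
            issue.status = "resolved"
        return issue.status
```

A typical sequence would call `submit_check` after identity confirmation, `assign_mitigation` once the manager responds, and `resolve` after the employee repeats the check, which mirrors the iterative correspondence mentioned above.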

Fig. 7

The generic workflow of computer vision-driven PPE compliance 4.0

As briefly illustrated in this section, the ICT solutions for digitalized PPE management and PPE compliance include multiple factors, tasks, and technologies, making them significantly more complex than sole PPE detection or image classification. Therefore, there is a need to unify and verify solutions proposed in the scientific literature under real-world conditions adhering to the concept of PPE 4.0 described in this section.

5 Discussion of challenges for further advances and wider applications

Considering the ongoing advancements reviewed in Sect. 3, and the domain complexities explained in Sect. 1 and Sect. 4, this section highlights the challenges identified as key barriers for further development and wider adoption of existing procedures for digitalized PPE compliance in industry (which is currently limited), despite very high average accuracy reported by all selected studies.

5.1 Industry (sub)sectors and environmental challenges

The Global Industry Classification Standard (GICS) recognizes 11 sectors, 24 groups, 69 industries, and 158 sub-industries (Fig. 1b) (MSCI). In some industries, workplaces usually consist of large manufacturing halls with high roofs and uniform lighting conditions. For medium enterprises and smaller halls, lighting conditions may vary during day shifts, which results in the presence of reflections and shadows that negatively affect the appearance and visibility of PPE in images (Footnote 4). Large environmental variabilities in combination with the complexities explained in Sects. 1 and 4 clearly illustrate the barriers that occur in real-life applications (Figs. 6 and 7). To the best of our knowledge, only a minority of these challenges have been addressed in the literature. The compliance of hard hats and yellow vests in construction engineering and Covid-19 facemasks in daylight conditions may be assumed to have been extensively studied. Upcoming studies must also invest more effort in addressing employee tracking and the occlusion of body parts. This raises the question of determining the optimal camera positioning, resolution, and distances to objects. Potential solutions may include digitalized checkpoints (Fig. 2c), which have been addressed in the literature (Tran et al. 2019; Zhafran et al. 2019; Vukicevic et al. 2022). However, solutions remain to be proposed to ensure that PPE is worn throughout workplaces and during 8-hour working shifts. Thus, two primary research directions may be distinguished: (a) self-checkpoint-based systems (which operate in restricted spaces and at specific time points), and (b) full-time monitoring systems (which raise the issue of computational costs discussed in the following section).

5.2 Computational costs and complexity challenges

Industrial AI-based solutions may be deployed on-premises (i.e., company-owned local servers) (Chen and Liu 2021), on cloud servers (Amazon AWS, Microsoft Azure) (YU et al. 2021), or on edge-AI devices (Firouzi et al. 2022). The trade-off between these three strategies is the cost of buying/maintaining or renting hardware (Footnote 5), whereas developing/buying dedicated edge-AI hardware has recently emerged as the optimal solution in terms of long-term operational costs (Liu et al. 2022a; Nain et al. 2022). Due to the breakthrough of AI, the global (AI) chips market is constantly growing (Footnote 6), resulting in a global chip shortage crisis during 2021–2022 (Sparkes 2021). Regarding the hardware requirements, note that data processing and data traffic are highly demanding, as reviewed in Table 1. Because the resolution of current cameras is 4+ megapixels, the amount of data that must be constantly streamed/processed represents a barrier for the direct application of most currently available solutions. When deploying real-time computer vision solutions, the optimal position and the number of cameras that cover a workplace must be determined, which is a nontrivial problem in an industrial environment, as the priority is to ensure the mobility of employees and machines. In these terms, studies that proposed the use of AI for PPE compliance at checkpoints (Fig. 2c) appear to be closer to the market. To reduce the costs of electricity and data transfer, the deployment of “AI as a chip” directly on cameras is a growing trend (Messaoud et al. 2022). Energy-efficient AI chips process images “on the edge” so that data streaming is reduced to metadata (e.g., PPE class, which is a single integer number). Most existing edge-AI applications have assumed that the replacement of developed algorithms is not required (e.g., plate detection in automotive applications) (Feng et al. 2019).
For the compliance of PPE that is constantly used (e.g., hard hats on construction sites), using conventional central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and/or application-specific integrated circuit (ASIC) chips is appropriate for developing edge-AI devices. However, embedded devices based on field-programmable gate array (FPGA) chips (Badrignans et al. 2008; Véstias et al. 2020) could enable the unique possibility of developing generic edge-AI hardware that may be adopted across various industries and reconfigured remotely (Sukhanov et al. 2007). Having such robust and reconfigurable edge-AI hardware on cameras appears to be the key feature for handling the spatiotemporal complexities of PPE compliance explained in Sect. 1 and Sect. 4.
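The difference between streaming images and streaming metadata can be made concrete with a back-of-the-envelope calculation. The frame rate (25 fps), the 3 bytes per pixel, and the 4-byte integer per result are assumptions chosen for illustration; the raw figure is for uncompressed video, so real compressed streams would be considerably smaller.

```python
def raw_stream_bytes_per_second(megapixels, bytes_per_pixel, fps):
    """Data rate of an uncompressed video stream."""
    return int(megapixels * 1_000_000 * bytes_per_pixel * fps)


def metadata_bytes_per_second(bytes_per_result, fps):
    """Data rate when only one result (e.g., a PPE class) is sent per frame."""
    return bytes_per_result * fps


# Assumed setup: one 4-megapixel RGB camera at 25 frames per second.
raw = raw_stream_bytes_per_second(4, 3, 25)   # 300,000,000 bytes/s
meta = metadata_bytes_per_second(4, 25)       # 100 bytes/s
ratio = raw // meta                           # 3,000,000
```

Under these assumptions, a single camera produces 300 MB/s of raw pixels but only 100 bytes/s of class labels, which is why on-camera inference is attractive whenever many cameras share one network.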

5.3 Employee identification and identity management issues

Knowing the identity and protecting the privacy of an employee are primary requirements of an ICT system for managing PPE compliance (Wang et al. 2021). Over the years, technologies for user verification have evolved from traditional text-based methods (passwords) to identification cards (e.g., RFID (Kelm et al. 2013), Bluetooth (Pisu et al. 2024), QR code, and barcode) and biometrics (e.g., face recognition, iris recognition, fingerprint, voice, and gait recognition) (Barkadehi et al. 2018). Moreover, the existing identification procedures vary in terms of automation and security levels (Wang et al. 2021). Face recognition is a primary identification technique used in everyday practice as well as in related PPE literature (Tran et al. 2019). However, establishing worker identity while relying on surveillance cameras and face detection can be challenging in some industrial environments because of the high level of face occlusion with different types of PPE, such as masks, hard hats, and glasses (Azeem et al. 2014). In these situations, multiple approaches can be applied to increase reliability by incorporating gait recognition (Khan et al. 2021). Gait identification is less sensitive to occlusion and can provide higher rates of identification in combination with face recognition (Kumar et al. 2021b). Alternatively, visual markers (QR codes or barcodes), which may be of various shapes (Araar et al. 2021) and colors (DeGol et al. 2017), could be attached to worker equipment (e.g., jackets or hard hats) (ISO/IEC 18004 2015). Relying on markers has limitations related to low frame rate, occlusion, sensitivity to marker bending, etc. (Jafri et al. 2014). Additionally, the readability of visual markers was shown to be sensitive to security challenges (Focardi et al. 2019), which is the primary reason they have been used for inventory and less for managing personal data.
Other prominent authentication approaches, based on combining computer vision techniques (Siyu 2012; Araar et al. 2021), DL detectors (Ciaparrone et al. 2020), and wearable devices, are also available (Liu et al. 2022b). It is worth mentioning that PPE with passive or active markers could be a part of an IoT-focused strategy to recognize PPE compliance at the entrance to a specific section of the workplace. Readers installed at the entrance to each section can communicate the compliance, presence, or absence of PPE on the worker.

In general, it is appropriate to reduce the complexity of the authentication task by assuming that it must be performed primarily at check-in points (entry places), where workers can be identified in controlled conditions, either by using biometrics (face recognition, iris, fingerprint) or any kind of physical device (RFID tags, mobile phones). When trading off between complexity, cost, and reliability, RFID technology still sets the gold standard for employee identification, as RFID is assumed to be already in use in most modern companies. After authentication, a requirement may be to track employee movement across the workplace (Germa et al. 2010; Kong et al. 2021). With an emphasis on the computational costs explained in Sect. 5.2, we refer readers to a review of artificial intelligence algorithms for object tracking (Luo et al. 2021). Although object tracking is an extensively studied topic in artificial intelligence, tracking humans comes with the data privacy issues discussed in Sect. 5.4. Regarding identity management, while performing employee authentication and tracking, the use of blockchain technology for addressing continuous authentication is a growing trend (Mohsin et al. 2019; Al-Naji and Zagrouba 2020; Minovic et al. 2022).

5.4 Ethical and cybersecurity aspects of digitalized PPE compliance

This study assumes that further progress on this topic relies on computer vision and surveillance technology. Although IP cameras have become an everyday experience, the breakthrough of ICT and AI facilitates disruptive identity-related actions (Wang and Tucker 2021). Identity data needed for PPE compliance, which are vulnerable to fraud, are considered highly sensitive (Sule et al. 2021). Despite the existence of regulations such as the European Union’s General Data Protection Regulation (GDPR, Footnote 7), which aim to oversee the processing of identity data, existing cybersecurity tools tend to be either application-specific or only partially compliant with these regulatory standards (Rhahla et al. 2021; Ruohonen and Hjerppe 2022). PPE compliance can be considered an IoT and user-centric ICT system (Kounoudes and Kapitsaki 2020), which must operate in accordance with the following privacy-preserving steps. First, workers must be properly informed about the collection and processing of their data. Second, identity data used for PPE compliance must be confidential and stored for a limited time until its purpose is fulfilled (PPE compliance for a particular user at a particular moment), with automatic erasure being the best course of action. Third, to minimize the possibility of abuse, identification should be anonymized, meaning that the system should track identities without knowing personal data. Although a sensitive topic, determining the trade-off between surveillance privacy/security and digitalization of workplace safety is a promising research direction (Patil et al. 2014; Ring 2016; Sullivan 2017). In these terms, PPE compliance may be considered among the first workplace safety problems that could be successfully digitized using the technological pillars in Industry 4.0.
Additionally, regarding the EU region, we emphasize that at the end of 2023, the European Parliament proposed the Artificial Intelligence Act to regulate foundation models and prohibit biometric categorization systems using sensitive characteristics (e.g., politics, religion, or sexual orientation), untargeted scraping of facial images from the internet or CCTV footage to create databases, and systems that manipulate human behavior to circumvent free will (European Parliament 2023).
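Two of the privacy-preserving steps discussed in this section, tracking identities without storing personal data and erasing records after a limited time, can be sketched as follows. This is a minimal illustration, not a compliant implementation: the function names, the record layout, and the one-hour retention period are assumptions, and keyed hashing is a form of pseudonymization rather than full anonymization, since whoever holds the key can re-link the identifiers.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def pseudonymize(employee_id, secret_key):
    """Replace an employee identifier with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, employee_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def purge_expired(records, now, retention):
    """Keep only the records that are still within the retention period."""
    return [r for r in records if now - r["checked_at"] <= retention]


# Hypothetical usage: the server stores pseudonyms, never raw identifiers.
key = b"example-key-kept-outside-the-database"
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"pid": pseudonymize("E1", key), "checked_at": now - timedelta(minutes=5)},
    {"pid": pseudonymize("E2", key), "checked_at": now - timedelta(hours=2)},
]
kept = purge_expired(records, now, retention=timedelta(hours=1))
```

The same employee always maps to the same pseudonym, so open issues can still be followed up, while the automatic purge limits how long any identity-linked record exists.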

5.5 Limitations

A few limitations need to be mentioned. First, the present study includes articles retrieved from only three databases (ScienceDirect, IEEE Xplore, and Web of Science), although these are considered the most comprehensive and widely used in the scientific community globally. Second, all included studies were published in English, so the review could be improved further by adding literature published in other languages. Third, even though every record was screened by at least two reviewers, there is a risk of bias in the conducted data extraction (some errors might have been introduced). Nevertheless, the authors of this review are confident that none of these limitations would change the overall conclusions of this systematic review.

5.6 Future research directions towards reaching the CV-based PPE Compliance 4.0

Potential future developments in CV-based PPE Compliance 4.0 are poised to address existing challenges and enhance the development of generic solutions that will be robust to variabilities of real-world industrial environments. A primary focus will be on advancing existing AI algorithms to address issues related to occlusion, variable lighting conditions, and large-scale monitoring in diverse industry environments. The transition towards edge-AI and edge-computing will also play a crucial role by enabling real-time applications at the point of data capture, thus reducing latency and enhancing the responsiveness of PPE compliance systems. Future studies also need to consider PPE compliance alongside employee identification (as safety managers need to be aware of the identity of a particular employee, not only to detect PPE misuse). Ensuring robust authentication and privacy protection remains essential, which opens possibilities for the integration of technologies like RFID and biometric systems. The development of privacy-preserving techniques, such as data anonymization and secure data handling practices, will be vital to safeguard sensitive personal information while maintaining compliance with regulations. Therefore, adapting to evolving regulatory and ethical frameworks, such as the GDPR and the recently proposed Artificial Intelligence Act (European Parliament 2023), will be crucial for aligning technological innovations with current legal standards and ethical norms.

By addressing these key areas, future research and development will drive significant improvements in workplace safety and safety management, which could result in huge positive impacts on a global scale, as illustrated in the Introduction section. To do so, the current critical bottleneck from the viewpoint of AI scientists is the lack of public and representative data sets, which need to account for the spatial, temporal, and functional variabilities illustrated in Fig. 6. From the authors’ viewpoint, the optimal way to cover these aspects and generate a sufficient amount of annotated data is by using synthetic data that could be generated using 3D tools and game engines (e.g., Blender, Unity, Unreal) or specialized synthetic data generators such as NVIDIA Omniverse. We emphasize that PPE compliance is a very complex and multi-domain problem, which simultaneously encompasses identification, tracking, pose estimation, object recognition, and classification in a dynamic environment and in a temporal manner. As such, upon the development of representative public data sets, PPE compliance could gain more attention from the CV community and serve as a representative challenge for further advances in various AI topics.
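One way to read the synthetic data proposal is that scene parameters are sampled first and then handed to a renderer, so that annotations come for free from the sampled configuration. The sketch below covers only the sampling step and calls no rendering engine; the parameter lists, the 50% wearing probability, and the function names are assumptions made for illustration.

```python
import random

# Hypothetical parameter ranges covering spatial, temporal, and role-based
# variability, plus lighting conditions and PPE types.
SECTORS = ["A", "B", "C"]
SHIFTS = ["morning", "afternoon", "night"]
ROLES = ["visitor", "supervisor", "employee"]
LIGHTING = ["uniform", "shadows", "reflections"]
PPE_TYPES = ["hard_hat", "safety_glasses", "gloves", "face_mask"]


def sample_scene(rng):
    """Draw one scene configuration; 'worn_ppe' doubles as the annotation."""
    worn = [p for p in PPE_TYPES if rng.random() < 0.5]
    return {
        "sector": rng.choice(SECTORS),
        "shift": rng.choice(SHIFTS),
        "role": rng.choice(ROLES),
        "lighting": rng.choice(LIGHTING),
        "worn_ppe": worn,
    }


def sample_dataset(n, seed=0):
    """Generate n scene configurations reproducibly from a seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


scenes = sample_dataset(100, seed=42)
```

Each configuration would then be rendered by the chosen engine, with `worn_ppe` stored next to the image as its label.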

6 Conclusion

The rapid progress of artificial intelligence shows high potential to digitalize the task of visual PPE compliance, which could result in significant improvement of workplace safety that currently relies on manual supervision and reporting. However, despite the increased number of related studies (correlated with the Covid-19 pandemic), it is found that the technology from academic research is not widely applied in industry practice.

This study contributes to this progress by providing a critical review of the topic from four perspectives (formulated as four key questions in Sect. 1), which highlighted both technological advances and domain-specific barriers that were underestimated in academic literature. Starting from the detailed illustration of the workplace safety requirements, this survey contributes to the field by (1) reviewing high-level approaches, frameworks, and data sets used in the literature, and (2) providing an in-depth discussion of various challenges that need to be considered and solved in future work. Findings from this review indicate that identified challenges (Sect. 5) are related to the underestimation of PPE compliance complexity in dynamic industrial environments, computational costs, the complexity of running the developed AI algorithms on the edge and cloud, as well as employee identification and identity management, and ethical and cybersecurity issues. Considering the benefits that Industry 4.0 is projected to bring to OSH until 2030 (International SOS Foundation 2018), this review confirms the necessity for further engagement of both academia and industry in solving challenges related to the complexity of workplace health and safety requirements (Schulte et al. 2020).

In summary, it is concluded that the topic of AI-driven PPE compliance continues to evolve toward the transition from cutting-edge research to practical applications. Accordingly, future work on this topic should focus on refining and advancing existing procedures to overcome the highlighted barriers and facilitate wider adoption in various industrial sectors.