4.1. Experimental Device
For the experiments, we developed a device to collect PPG signals. The hardware supports two independent PPG types, reflective and transmissive, with the same processor configuration in both cases. The device is based on the SEN0212 module and uses white-light LEDs. The wavelengths of the LED source in the experimental setup were selected based on the results of our previous study [13]. The device acquires both reflective- and transmissive-type PPG signals and is composed of a reflective LED, a transmissive LED, and a common PD, as shown in Figure 1. Separate LEDs are used for the reflective and transmissive types, with a common PD, because the light intensities of the two LED types are set differently. In the case of the SEN0212 used in our study, the light intensities of the reflective- and transmissive-type LEDs are 52 mW and 1 W, respectively.
Unlike the existing model-based methods [7,8,9], the proposed method does not strictly require three light sources; results can be derived with one, two, or three light sources. The wavelengths of the LEDs used are red (615 nm), green (525 nm), and blue (465 nm). We used a white-light source covering the three wavelengths and a PD that receives the signal at each wavelength of choice through a filter for that wavelength. Both the reflective and transmissive systems use the same LED wavelengths and a common PD.
The microcontroller used here was an Arduino UNO. The commercial sensor module DFRobot SEN0212 consists of a color sensor (TCS34725) and a set of white LEDs. The TCS34725 model is a sensitive device with band-pass filters at three wavelengths (R, G, B) and an infrared (IR) cut filter. The sensor can be operated with a sampling rate of around 56 Hz using the proposed protocol.
For the reflective-type system, the white-light LEDs and PD are placed side by side, with the PD being used to record the reflected signals. Further, for the transmissive-type system, a separate high-power white-light LED is attached to the opposite side of the PD. The switching device supplies power to only one of the LEDs (transmissive or reflective) according to the “Type Sel” signal from the microcontroller. The “Type Sel” signal automatically changes the mode of the device (transmissive or reflective) every minute. The LED and sensor modules are packaged in the form of a clip and attached to the tip of the finger.
Figure 6 shows the system block diagram of the SEN0212-based device.
4.2. Data Acquisition
After developing the hardware system, PPG data were collected from the subjects. A total of 4 min of reflected and transmitted PPG signals were collected from each subject. The prototype device was used to acquire the PPG signals required for the experiments, and the data were recorded using a Python-based application. The application saves the data stream sent from the hardware to the PC and automatically manages the volunteer IDs and data files.
During the 4 min of signal measurement, the device switches between the transmission and reflection modes at 1 min intervals, thus measuring in the reflection and transmission modes for 2 min each. This ensures problem-free and sufficient data collection even if external factors (such as a finger movement) temporarily prevent signal reception or cause errors.
Figure 7 shows the signal acquisition scenario used in this study.
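As a minimal sketch of the acquisition schedule described above, the following Python snippet maps the elapsed recording time to the active mode. The 1 min switching interval, the 4 min total, and the approximate 56 Hz sampling rate come from the text; the function names and the assumption that recording starts in the reflective mode are hypothetical.

```python
# Illustrative sketch only: helper names are hypothetical, and the mode used
# for the first minute is an assumption (the text does not state it).
MODES = ("reflective", "transmissive")  # assumed order of the first two segments
SEGMENT_SECONDS = 60      # "Type Sel" changes the mode every minute
TOTAL_SECONDS = 240       # 4 min of recording per subject
SAMPLING_RATE_HZ = 56     # approximate sampling rate of the sensor


def mode_at(elapsed_seconds):
    """Return the active PPG mode at a given time since the start of recording."""
    if not 0 <= elapsed_seconds < TOTAL_SECONDS:
        raise ValueError("time is outside the 4 min recording")
    segment = int(elapsed_seconds // SEGMENT_SECONDS)
    return MODES[segment % 2]


def seconds_per_mode():
    """Total recording time (in seconds) spent in each mode."""
    totals = {mode: 0 for mode in MODES}
    for start in range(0, TOTAL_SECONDS, SEGMENT_SECONDS):
        totals[mode_at(start)] += SEGMENT_SECONDS
    return totals


def approx_samples_per_mode():
    """Approximate number of samples recorded in each mode."""
    return seconds_per_mode()[MODES[0]] * SAMPLING_RATE_HZ
```

Under these assumptions, each mode receives two non-adjacent 1 min segments, so a brief disturbance in one segment still leaves a second segment of the same mode.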
Commercial products for measuring glycated hemoglobin and SpO2 were used herein to collect reference data for calibration and verification. An invasive Biohermes A1c EZ HbA1c checker was used as the reference device for measuring glycated hemoglobin. A noninvasive Schiller ARGUS OXM plus was used as the reference device for measuring SpO2, which is required as additional information. In addition to the PPG signals, the subjects’ BMI and FWs were collected. The subjects for the study were recruited following institutional review board (IRB) deliberation (approval number: KMU-202006-HR-237) of Kookmin University, Seoul, Korea, and data were collected according to the IRB procedures. The devices used in the data collection are listed in Table 1.
Figure 8 shows an image of the measurement and collection of PPG signals using the abovementioned devices. As for the ground truth, HbA1c was measured only once for each subject since it reflects the average blood glucose level over 3 months. For SpO2, the average over the 4 min measurement, taken at about the same time as the HbA1c measurement, was used. Data from a total of 40 subjects were obtained under IRB instructions (No: KMU-202006-HR-237), as shown in Table 2.
4.3. Experimental Results
The Clarke error grid analysis (EGA) [16] was performed to evaluate the performance of the proposed noninvasive glycated hemoglobin estimation method. The grid is divided into five regions (A, B, C, D, and E), where “A” represents clinically accurate data and “B” represents data that deviate by more than 20% from the reference but remain within an allowable range that would not lead to a wrong judgment. Region “C” represents risky data that can lead to erroneous treatment decisions. However, a high correlation between two methods does not by itself confirm good agreement between them. Therefore, as an additional performance verification method, the Bland–Altman (B&A) plot [17] was used, which shows the agreement between two different methods for measuring the same parameter. The closer the mean of the differences between the two methods is to zero, the better the agreement. The B&A plot also helps to identify outliers (points outside the ±1.96 SD lines). The dataset used for training comprised 40 subjects.
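The B&A quantities described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name is hypothetical, the difference is taken as estimated minus reference, and the sample standard deviation (divisor n − 1) is an assumption.

```python
import math


def bland_altman(estimated, reference):
    """Compute the bias, the +/-1.96 SD limits of agreement, and outlier indices."""
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    # Difference and mean of each pair (the y and x axes of the B&A plot).
    diffs = [e - r for e, r in zip(estimated, reference)]
    means = [(e + r) / 2 for e, r in zip(estimated, reference)]
    n = len(diffs)
    bias = sum(diffs) / n  # mean difference; closer to zero is better
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outliers = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return {"means": means, "diffs": diffs, "bias": bias, "sd": sd,
            "lower": lower, "upper": upper, "outliers": outliers}
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower`, and `upper` yields the usual B&A plot.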
The learning algorithms used were RF and XGB. For both the RF and XGB models, a total of 100 trees were used, and no maximum depth limit was set. We used the leave-one-out cross-validation (LOOCV) method to evaluate the performance of our overall model. In this LOOCV method, the training and testing phases are performed over a number of iterations equal to the number of volunteers in our dataset (n = 40). In each iteration, one volunteer’s data were kept in the test dataset, and the other volunteers’ data were kept in the training dataset (n = 39). Then, from the test data, 1% of the total signal windows were moved to the training dataset. In this process, a total of 40 training-test iterations were performed by changing the test subject.
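The splitting procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name and arguments are hypothetical, the "1%" is interpreted as 1% of the held-out subject's windows, and the choice to move at least one window while always leaving at least one for testing is an assumption.

```python
import random


def leave_one_subject_out(window_subjects, calib_fraction=0.01, seed=0):
    """Yield (held_out, train_idx, test_idx) once per subject.

    window_subjects[i] is the subject ID of signal window i.
    """
    rng = random.Random(seed)
    for held_out in sorted(set(window_subjects)):
        test_idx = [i for i, s in enumerate(window_subjects) if s == held_out]
        train_idx = [i for i, s in enumerate(window_subjects) if s != held_out]
        # Assumption: move a rounded 1% of the held-out subject's windows,
        # at least one, while keeping at least one window for testing.
        n_calib = min(max(1, round(calib_fraction * len(test_idx))),
                      len(test_idx) - 1)
        calib = set(rng.sample(test_idx, n_calib))
        train_idx += sorted(calib)  # moved windows join the training set
        test_idx = [i for i in test_idx if i not in calib]
        yield held_out, train_idx, test_idx
```

In each iteration, a regressor (for example, an RF or XGB model with 100 trees) would be fitted on the windows in `train_idx` and evaluated on those in `test_idx`; with 40 subjects, the generator yields 40 splits.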
After deriving results using the EGA and B&A plots, the performance was verified through error analysis. Since it is possible to use 1–3 different wavelengths for glycated hemoglobin estimation with ML, the performance of each wavelength combination was evaluated to find the optimal wavelength combination. Additionally, we checked the results with the addition of FW as a feature, which is an important factor. Thereafter, the performance was compared with that of the methods based on our previous models. In Table 3, the Pearson correlation coefficients (Pearson’s r) are listed for all wavelength combinations.
From the performance results of each wavelength combination for both RF and XGB algorithms, the case with all three (RGB) wavelengths was observed to be the best. However, this case has the disadvantage of requiring the maximum number of wavelengths (three). Notably, the reflection-type XGB algorithm showed relatively higher performance, and both RGB and RG combinations had the same Pearson’s r values. This is consistent with the results of a previous study indicating that there are differences in the absorption of oxyhemoglobin and deoxyhemoglobin at each wavelength and that the RG combination is the best among two-wavelength combinations [18]. Hence, by considering both performance and the number of wavelengths required, the RG combination was finally selected as the most efficient choice.
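The selection logic above (highest Pearson’s r, with ties resolved in favor of fewer wavelengths) can be illustrated with a short sketch. The function names and the rounding of r to three decimals before comparison are assumptions made for illustration; no values from Table 3 are reproduced here.

```python
from itertools import combinations


def wavelength_combinations(channels=("R", "G", "B")):
    """List every combination of one, two, or three wavelength channels."""
    return ["".join(combo)
            for k in range(1, len(channels) + 1)
            for combo in combinations(channels, k)]


def most_efficient(pearson_r, decimals=3):
    """Pick the combination with the highest r; on ties, prefer fewer wavelengths.

    pearson_r maps a combination name (e.g. "RG") to its Pearson's r.
    Rounding to `decimals` places before comparing is an assumption.
    """
    return min(pearson_r,
               key=lambda combo: (-round(pearson_r[combo], decimals), len(combo)))
```

For three channels, `wavelength_combinations()` enumerates the seven candidate combinations (R, G, B, RG, RB, GB, and RGB).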
Performance comparisons were also performed when a total of 18 features were used, by adding the FW feature to the 17 features used in a previous blood-glucose estimation study [10], and when only 7 features were used after excluding unnecessary features.
Table 4 shows the performance comparison for the cases using 18 and 7 features after excluding low-importance features. The case with 7 features showed better performance than that when all 18 features were used; the reason for this is that some of the existing features with significantly lower importance values may actually cause learning errors. In fact, from checking the results of feature importance analysis (Figure 3 and Figure 4), it was observed that some features had remarkably low importance values.
For error analysis, the standard deviation of the difference (Diff STD), mean-squared error (MSE), mean error (ME), mean absolute deviation (MAD), root-mean-squared error (RMSE), R² score, and Pearson’s r were used. Then, we confirmed the performance differences based on whether the FW was used as one of the features. The FW feature was not used in the previous study [10]; however, this feature is an important parameter for glycated hemoglobin estimation and may also be an important parameter for estimating blood glucose levels. FW can be useful for predicting the degrees of absorption and reflection of the wavelengths (light) irradiated onto the finger in both transmission- and reflection-type systems.
Figure 9 and Figure 10 show the EGA and B&A plots, respectively, when the FW was not used as a feature.
Figure 11 and Figure 12 show the EGA and B&A plots, respectively, when the FW was used as a feature.
The importance of FW as a feature can be visualized in Figure 9, Figure 10, Figure 11 and Figure 12. In Figure 9, we observe that the fitted line deviates considerably from the ideal linear line and that a good number of predicted data points are in region “B”. In the B&A plot analysis for the same case, we can again see that the mean difference between the data from the proposed prediction model and the reference device data is larger and that most of the data points lie away from the mean line. In contrast, when FW is included in the feature set, we find the opposite scenario, which is depicted in Figure 11 and Figure 12.
From the error analysis results with and without FW as a feature, it was confirmed that the performance can be greatly improved when FW is used. This is because FW determines the distance between the transmitter and receiver, thereby affecting both the transmission- and reflection-type systems. Thus, the FW can be useful information for predicting the degrees of absorption and reflection of the wavelengths (light) irradiated on the finger. In addition, the finger-clip-type device works based on the pressure between the finger and sensor according to the finger thickness. The thicker the finger, the greater the pressure. Therefore, FW causes a difference in the pressure applied to the sensor, which significantly affects the signal; hence, it was confirmed that the performance is improved when the FW information is used as a feature.
Table 5 shows the error analysis results with and without using the FW feature. In this analysis, we again confirmed the importance of FW through quantitative measurements. An XGB regressor with the FW feature works best on all metrics except the mean error (ME).
The metrics for examining the performance of the models are the standard deviation of the difference (Diff STD), mean-squared error (MSE), mean error (ME), mean absolute deviation (MAD), root-mean-squared error (RMSE), R² score, and Pearson’s r. The equation for the standard deviation of the difference is given below.
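As a hedged illustration, the metrics named above can be computed as in the following sketch. The function name is hypothetical, and several conventions are assumptions rather than the authors' definitions: the error is taken as estimated minus reference, Diff STD is the sample standard deviation (divisor n − 1) of these errors, and MAD is computed as the mean of the absolute errors.

```python
import math


def error_metrics(estimated, reference):
    """Compute the error-analysis metrics for paired estimates and references."""
    n = len(reference)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    errors = [e - r for e, r in zip(estimated, reference)]  # assumed sign
    me = sum(errors) / n                                    # mean error
    mse = sum(err ** 2 for err in errors) / n
    rmse = math.sqrt(mse)
    mad = sum(abs(err) for err in errors) / n               # assumed definition
    # Assumption: sample standard deviation of the differences.
    diff_std = math.sqrt(sum((err - me) ** 2 for err in errors) / (n - 1))

    mean_est = sum(estimated) / n
    mean_ref = sum(reference) / n
    ss_res = sum(err ** 2 for err in errors)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    r2 = 1 - ss_res / ss_tot                                # R² score
    cov = sum((e - mean_est) * (r - mean_ref)
              for e, r in zip(estimated, reference))
    pearson_r = cov / math.sqrt(ss_est * ss_tot)
    return {"diff_std": diff_std, "mse": mse, "me": me, "mad": mad,
            "rmse": rmse, "r2": r2, "pearson_r": pearson_r}
```

Note that R² and Pearson’s r are undefined when the reference (or estimated) values are all identical, since the corresponding sum of squares is zero.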
From the experimental results, the proposed ML-based glycated hemoglobin estimation method uses both red and green wavelengths and shows the best performance when using ML (XGB) with seven features, including FW.
Table 6 shows the performance comparison of the existing model-based and proposed ML-based glycated hemoglobin estimations. In this table, we can observe the superior performance of the proposed ML-based method. The existing computational models used for comparison rely on the modeling of the human blood components and finger structure. However, for the sake of simplicity, many complex phenomena are omitted from these models. On the other hand, machine learning methods can find those complex relationships between input features and targets.
To evaluate the classification accuracy, the diabetes status diagnosis performance was assessed by comparing the glycated hemoglobin values estimated by the proposed method with the reference values. Figure 13 and Table 7 show the confusion matrices and the performance comparison for diagnosing diabetes status, respectively, between the proposed ML-based method and our previous model-based methods. In Figure 13, we can see that all models perform well in classifying patients as diabetic, with the photon-diffusion-based reflection model showing the best performance for this class. Prediabetes is also classified successfully, although the Beer–Lambert-based blood vessel model and the photon-diffusion (transmission) model do not perform well for this class. Overall, the proposed XGBoost (reflection) method shows the best prediction performance; in particular, it has the fewest results in the most dangerous false-negative area (red area) of the diagnosis. Thus, the superiority of the proposed method was demonstrated.
Finally, it was confirmed that the proposed ML-based glycated hemoglobin estimation method is effective for diagnosing diabetes status. From Table 7, we can see that XGB performs better than the other models in terms of diabetes classification accuracy. This result was obtained using the PPG signals for red and green light and the feature combination excluding BMI and SpO2, which yielded 90% accuracy. Among the previous model-based methods, the photon-diffusion-based model shows the best performance. Thus, the RG (red and green) wavelength combination and seven features including FW were selected for our XGB reflection model.
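The diabetes status evaluation can be sketched as follows. The HbA1c cut-offs used here (below 5.7% normal, 5.7% to 6.4% prediabetes, 6.5% or above diabetes) are the commonly used diagnostic thresholds and are an assumption, since this section does not state the thresholds applied; the function names are hypothetical.

```python
from collections import Counter

LABELS = ("normal", "prediabetes", "diabetes")


def diabetes_status(hba1c_percent):
    """Map an HbA1c value (%) to a status using assumed standard cut-offs."""
    if hba1c_percent >= 6.5:
        return "diabetes"
    if hba1c_percent >= 5.7:
        return "prediabetes"
    return "normal"


def confusion_and_accuracy(estimated, reference):
    """Return the confusion matrix (rows: reference, columns: estimated) and accuracy."""
    counts = Counter((diabetes_status(r), diabetes_status(e))
                     for e, r in zip(estimated, reference))
    matrix = [[counts[(true, pred)] for pred in LABELS] for true in LABELS]
    correct = sum(matrix[i][i] for i in range(len(LABELS)))
    return matrix, correct / len(reference)
```

In this layout, entries below the diagonal of the matrix correspond to cases where the estimated status is less severe than the reference status, which are the false negatives that matter most for diagnosis.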