1. Introduction
Research on stress detection techniques has become increasingly important in recent years. These techniques aim to automatically evaluate the stress arising in individuals in relation to their health and emotional condition. In this work, we focus on the problem of automated stress detection in car drivers, since this is of paramount importance to ensure the safety and well-being of professionals and of people in their everyday life. Indeed, emotional stress can lead to health problems [1,2] and risky behavior [3,4].
The analysis of physiological signals, as well as of physical/behavioral data, can be an effective solution for automatic stress assessment [5]. Machine Learning (ML) techniques have also proven effective for classifying these kinds of signals in order to automatically detect the stress level in subjects performing stress-inducing tasks.
Focusing on car driving scenarios, the authors of [6,7], for instance, analyzed different physiological signals, i.e., Electrodermal Activity (EDA) [8], electrocardiogram (ECG), and respiration, to recognize the stress experienced while driving. In more detail, [6] discusses the results of four different classifiers (i.e., support vector machines, decision trees, naive Bayes, and general Bayesian classifiers). Only one subject was involved in the experiments, and the data were collected in real driving conditions over several months. In [7], the same physiological signals are considered, this time using 10 s windows, and stress is again classified into two classes. A Bayesian network was used, achieving a stress event detection accuracy of 82% when considering only physiological signals logged in real driving conditions. Including additional data, such as information about the current driving environment, vehicle data, and driving behavior history, allows the system to achieve a higher stress event detection accuracy (96%). The ground truth is determined by the drivers themselves, who state the perceived stress level for each driving task.
ECG signals are considered in [9], where they are processed through a Support Vector Machine (SVM), a k-Nearest Neighbor (k-NN) model, and a Random Forest (RF) to evaluate how driving while following Global Positioning System (GPS) indications can affect drivers’ emotions, in both positive and negative ways. Another task that can modify the performance of subjects is the takeover request (TOR) in a semi-autonomous car. This occurs at Level 3 driving automation, when the driver needs to take control of the car. The authors of [10] evaluate the effects of TOR phases by analyzing the driver’s gaze, Heart Rate (HR), Galvanic Skin Response (GSR), and facial expression. In [11], a Hidden Markov Model (HMM) that uses speed and the distance between vehicles is employed to estimate a driver’s behavior under specific induced emotions. A study comparing car drivers’ stress in simulated autonomous and manual driving scenarios, using an SVM algorithm, was introduced in [12]. The experiment was conducted using a professional driving simulator, and the subjects were asked to drive in both manual and autonomous driving scenarios. The results show that the subjects generally appear more stressed during manual driving, suggesting that autonomous driving can be positively perceived by the general population. The impact of different car handling setups, e.g., understeering and oversteering, was evaluated in [13]. The goal there was to assess the driver’s response to various car settings, in order to identify a professional driver’s favorite setup or to find the most suitable setup for the majority of drivers.
Other approaches based on physiological signals that do not employ ML algorithms have also been proposed in the literature. For example, [14] presents a self-similarity analysis of EDA using a wavelet-based approach to evaluate stress levels in subjects during a real-world driving experiment. In this case, only the EDA signals logged from both the foot and the hand of the test subjects are considered. These signals, taken from a public dataset of real-world driving recordings, are segmented into 3-minute time windows before being analyzed using self-similar process models. The authors of [15] examined ECG and eye activity, in addition to subjective data and performance measures derived from the driving simulation, to evaluate the effects of mental workload on drivers, and also proposed a protocol to assess workload during both real and simulated tasks.
In this paper, we analyze the physiological responses of subjects, as measured by EDA and ECG signals, while they drive in a simulated urban scenario. When subjects find themselves in a situation of high arousal or under stress, the stimuli generated by Autonomic Nervous System (ANS) activity in reaction to these situations can be evaluated through the analysis of EDA [8]. In response to the same stimuli, various parameters of the ECG signal, such as HR and Heart Rate Variability (HRV), are affected. The objective of our study is to evaluate the impact of traffic conditions on drivers and to develop a relatively noninvasive system that can automatically quantify the overall stress level. As a first contribution, the proposed system uses the Skin Potential Response (SPR) EDA signal rather than the more commonly used Skin Conductance Response (SCR) signal, unlike most previous work, e.g., [6,7,10]. Indeed, SPR appears to be less sensitive to the impedance of the electrodes and to slow changes in skin impedance. SPR can be recorded easily without applying current to the subjects and, unlike SCR, is commonly characterized by a quick response to stress stimuli [8,16].
The proposed system acquires two SPR signals from the hands of each subject, as well as the ECG from the chest. Another key contribution is the adoption of a specific procedure to mitigate the problem of Motion Artifacts (MAs). An MA removal algorithm is applied to the SPR signals in order to remove spurious patterns due to hand movements while driving. As described below, it combines the SPR signals acquired from the two hands to produce a cleaned SPR signal for further processing. A set of significant features is extracted from 15 s overlapping signal blocks and used as input to different ML classifiers, whose output is a label that discriminates between the “stress” (or “1”) and “non-stress” (or “0”) classes. The stress level over a given time span can then be quantified by the number of “1” labels in that interval. The classifiers were trained on a large dataset of ECG and SPR signals collected in an experiment carried out at a company specializing in car driving simulators. The trained models were then used to classify a new collection of signals acquired from subjects during an experiment with simulated urban scenarios, characterized by the presence or absence of traffic. The new experiment involved University of Udine students and was performed on a different car simulator in our university lab; therefore, the training and test data come from different setups.
To better characterize the peculiar behavior of the SPR signal in this work, we also perform an offline evaluation using scalograms, which provide a time-frequency analysis of the signal [17,18]. This is an additional way to investigate the behavior of the SPR signal in the two scenarios as a measure of the subjects’ emotional responses while driving. Computing the SPR scalograms requires the entire SPR signal logged from each subject in each scenario, so the scalograms are not computed in real time. In contrast, the ML classifiers are applied to each 15 s interval, with a new interval starting every 5 s. The selected features from both the SPR and ECG signals contain extensive information about the signals and are extracted from each 15 s interval. In this way, the proposed classifiers can operate in real time, on the basis of more complete information than that provided by the scalogram.
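As an illustration of the real-time operation just described, the following minimal Python sketch buffers incoming samples and emits a 15 s block every 5 s, so that consecutive blocks overlap by 10 s. It assumes a single signal stream at a fixed sampling rate `fs`; the class name `SlidingWindower` is hypothetical, and feature extraction and classification are deliberately left out, since they are described elsewhere in the paper.

```python
import numpy as np
from collections import deque


class SlidingWindower:
    """Emit a win_s-long block every hop_s seconds from a sample stream (hypothetical helper)."""

    def __init__(self, fs, win_s=15.0, hop_s=5.0):
        self.win = int(round(win_s * fs))   # samples per interval (15 s)
        self.hop = int(round(hop_s * fs))   # samples between interval starts (5 s)
        self.buf = deque(maxlen=self.win)   # keeps only the most recent win samples
        self.since_last = 0                 # samples received since the last emitted block
        self.emitted = 0

    def push(self, sample):
        """Add one sample; return a full block (np.ndarray) when one is due, else None."""
        self.buf.append(sample)
        self.since_last += 1
        if len(self.buf) < self.win:
            return None                     # not enough history yet
        # First block as soon as the buffer is full, then one every hop samples
        if self.emitted == 0 or self.since_last >= self.hop:
            self.since_last = 0
            self.emitted += 1
            return np.asarray(self.buf)
        return None
```

Each emitted block would then be passed to the feature extractor and the classifier, so a new label becomes available every 5 s. In a complete system, the SPR and RR streams would presumably need to be windowed with the same timing so that their features refer to the same interval.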
Here, we extend the work presented in [19], where we considered only the EDA signal for classification and discussed only the results obtained with an SVM classifier. Moreover, our work extends the results of [12,13], where we used a similar approach, but in different setups and scenarios (as introduced above). In summary, our work proposes a complete system to evaluate traffic-related stress in drivers, including the sensor design for SPR signals, the MA removal algorithm, and the setup of all the parameters of the ML architecture. Our results suggest that the proposed system, which can be implemented in real time with limited detection delay, is effective in discriminating the effect of traffic on different subjects.
The paper is structured as follows. The next section introduces the fundamental blocks of the proposed scheme. We describe the sensor used to acquire both the SPR and ECG signals, and then examine the MA removal algorithm, which reduces the motion artifacts arising during SPR signal acquisition (Section 2.1), and the ML algorithms used for stress classification (Section 2.3). The experimental setup is described in Section 2.4. For the analysis and interpretation of the SPR signal, we propose the use of scalograms; an overview of the scalogram representation is provided in Section 2.2.
Section 3 discusses the experimental results of our study, first considering the SPR scalograms and then considering both the EDA and ECG signals as a measure of the subjects’ stress response. This section also reports the results, in terms of positive (stress) intervals, obtained with the different ML models. Finally, conclusions are drawn in Section 4.
3. Results and Discussion
In this section, we discuss the experimental results obtained from the scalograms of the SPR signals. As mentioned, this analysis provides an overall offline evaluation that shows the differences between the signals in the two driving scenarios. We then present the results obtained by considering both the SPR and ECG signals logged from the subjects along the course in the two urban scenarios (with and without traffic), comparing the classification performance of the various ML algorithms.
3.1. SPR Scalogram Analysis
A first assessment, considering only the SPR signal, was carried out to evaluate whether the physiological responses of the subjects to traffic situations can be analyzed through the scalogram. We analyzed the scalograms of the cleaned SPR signals of the ten subjects in both the traffic and non-traffic scenarios. For each subject, the signals were normalized by subtracting the mean and dividing by the standard deviation of the concatenation of the signals obtained in the two driving scenarios. In this way, we can fairly compare the responses of different subjects to the test, as well as the response of each driver to the two scenarios.
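The per-subject normalization described above can be sketched as follows. This is a minimal Python version, assuming each scenario’s cleaned SPR signal is a 1-D array; `normalize_subject` is a hypothetical name, and the use of the population standard deviation (`ddof=0`) is our assumption, since the text does not specify which estimator was used.

```python
import numpy as np


def normalize_subject(sig_no_traffic, sig_traffic):
    """Z-score both scenario signals of one subject with the mean and standard
    deviation of their concatenation, so that the two scenarios share one scale."""
    a = np.asarray(sig_no_traffic, dtype=float)
    b = np.asarray(sig_traffic, dtype=float)
    both = np.concatenate([a, b])
    mu, sigma = both.mean(), both.std()   # population std (ddof=0), an assumption
    if sigma == 0:
        raise ValueError("constant signal: standard deviation is zero")
    return (a - mu) / sigma, (b - mu) / sigma
```

Pooling the two scenarios before computing the statistics matters: z-scoring each scenario separately would force both to zero mean and unit variance, hiding any overall level or amplitude difference between driving with and without traffic.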
To compute the scalogram with the MATLAB routines, we set the number of voices per octave, a parameter used to discretize the CWT scale values, to 12. We also specified the sampling frequency of the signal so that a scale-to-frequency conversion is carried out, giving frequencies in Hz (these are actually pseudo-frequencies associated with the scale values, since there is no exact relationship between scale and frequency). One way to perform this conversion is to determine, for each value of the scale a, the center frequency of the wavelet in Hz, i.e., the location of the peak of its spectrum in the frequency domain.
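A rough, self-contained counterpart of this computation is sketched below in Python with NumPy. It is not the MATLAB routine used in the paper: we swap in an analytic Morlet wavelet (center angular frequency `W0 = 6`, an assumed value) evaluated through the FFT, whereas MATLAB’s `cwt` defaults to a Morse wavelet and treats the signal boundaries differently (here the convolution is circular). The scales are spaced with 12 voices per octave, and each scale is mapped to a pseudo-frequency in Hz by locating the peak of the scaled wavelet’s spectrum, as described above. The function name `scalogram` is hypothetical.

```python
import numpy as np

W0 = 6.0  # Morlet center angular frequency (assumed; not specified in the paper)


def scalogram(x, fs, *, f_min, f_max, voices_per_octave=12):
    """FFT-based CWT with an analytic Morlet wavelet.

    Returns (coefs, freqs): complex coefficients of shape (n_scales, len(x))
    and the pseudo-frequency in Hz associated with each row (descending).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Scales whose wavelet peak frequencies span [f_min, f_max], voices_per_octave per octave
    s_min = W0 * fs / (2 * np.pi * f_max)
    s_max = W0 * fs / (2 * np.pi * f_min)
    n_scales = int(np.floor(voices_per_octave * np.log2(s_max / s_min))) + 1
    scales = s_min * 2.0 ** (np.arange(n_scales) / voices_per_octave)
    # Pseudo-frequency: the scaled wavelet spectrum peaks at W0 / s rad/sample; convert to Hz
    freqs = W0 * fs / (2 * np.pi * scales)
    omega = 2 * np.pi * np.fft.fftfreq(n)           # FFT bin frequencies in rad/sample
    # L1-normalized analytic Morlet: zero for non-positive frequencies, peak value 2,
    # so a sinusoid of amplitude A yields |coef| close to A at its matching scale
    psi_hat = 2.0 * np.exp(-0.5 * (scales[:, None] * omega[None, :] - W0) ** 2)
    psi_hat = psi_hat * (omega[None, :] > 0)
    coefs = np.fft.ifft(np.fft.fft(x)[None, :] * psi_hat, axis=1)
    return coefs, freqs
```

The scalogram itself is the magnitude (or squared magnitude) of `coefs`; coefficients near the start and end of a record are affected by the circular wrap-around and should be interpreted with care.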
Figure 4 displays the SPR scalograms obtained for Subject 1, with the non-traffic scenario at the top and the traffic scenario at the bottom. Several high-magnitude episodes are clearly evident in the traffic scenario, whereas only a few low-magnitude episodes occur in the non-traffic scenario (besides a single high-magnitude episode at the beginning of the experiment, which could be due to a temporary state of excitement at taking the test).
Considering all of the subjects’ scalograms, we notice that the maximum magnitudes of the CWT coefficients almost always appear in the [0.03, 1] Hz range. Therefore, for each subject and each time instant, we compute the sum of the squared absolute values (i.e., the energy) of the CWT coefficients in this frequency range. The energy computed along the track is plotted in Figure 5 and Figure 6 for Subjects 1 and 8, respectively.
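The band-limited energy can be obtained directly from the coefficient matrix and its pseudo-frequencies. The Python sketch below is self-contained and does not depend on any particular CWT routine: it sums the squared magnitudes of the coefficients whose pseudo-frequency falls inside [0.03, 1] Hz, and also averages the resulting energy over time, as done for the per-scenario means reported in Table 2. The function names are hypothetical, and the sign convention of the difference (traffic minus non-traffic) is our assumption.

```python
import numpy as np


def band_energy(coefs, freqs, f_lo=0.03, f_hi=1.0):
    """Energy per time instant: sum of |CWT coefficient|^2 over the rows whose
    pseudo-frequency lies in [f_lo, f_hi] Hz."""
    coefs = np.asarray(coefs)
    freqs = np.asarray(freqs, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    if not band.any():
        raise ValueError("no scales fall inside the requested band")
    return np.sum(np.abs(coefs[band]) ** 2, axis=0)


def mean_energy_difference(energy_traffic, energy_no_traffic):
    """Mean energy in each scenario and their difference (traffic minus non-traffic)."""
    m_t = float(np.mean(energy_traffic))
    m_n = float(np.mean(energy_no_traffic))
    return m_t, m_n, m_t - m_n
```

A negative difference under this convention corresponds to a subject whose SPR energy is higher in the traffic-free scenario.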
For each subject, we then calculate the mean energy over the entire duration of the experiment in each of the two situations. The results, together with the differences between these values, are reported in Table 2. Considering the mean energy of the CWT coefficients, two subjects out of ten exhibited higher energy in the traffic-free situation. The scalograms for one of these two subjects, i.e., Subject 4, are shown in Figure 7, where the large number of high-magnitude episodes in the non-traffic scenario, as opposed to the traffic scenario, is clearly noticeable. Our evaluation of the SPR scalograms suggests that these two subjects experience more stress in the non-traffic scenario, which is largely consistent with the results obtained with the ML classifiers using both the SPR and ECG signals (see the next section). We also note that Subjects 3 and 4 drove in the non-traffic scenario first, which may have influenced their emotional responses.
3.2. ML Classification Results
As shown in Figure 1, the cleaned SPR signal (after the MA removal block) and the ECG signal were then processed to calculate the feature vectors for the classifiers. For the ECG, we extract the R-peak locations through the Pan–Tompkins algorithm [29] and correct ectopic beats as in [30]. We then derive an equidistantly sampled RR signal by interpolating the non-equidistantly sampled RR interval time series with a cubic spline. Finally, we normalize all of the RR signals to make them comparable among the different subjects (using the same procedure described for the SPR signal).
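The RR-signal construction described above can be sketched in Python as follows. The sketch starts from R-peak times in seconds, i.e., after R-peak detection and ectopic-beat correction, which are not reproduced here. Several details are assumptions not taken from the paper: each RR interval is placed at the time of the beat that closes it, the 4 Hz resampling rate is an illustrative value, and a natural cubic spline is used, whereas the text does not specify the boundary conditions (MATLAB’s `spline`, for instance, uses not-a-knot conditions). Function names are hypothetical.

```python
import numpy as np


def natural_cubic_spline(x, y, xq):
    """Evaluate the natural cubic spline through (x, y) at the points xq.

    x must be strictly increasing; the second derivative is zero at both ends.
    """
    x, y, xq = (np.asarray(v, dtype=float) for v in (x, y, xq))
    n = x.size
    h = np.diff(x)
    # Tridiagonal system (solved densely for brevity) for the knot second derivatives M
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0                      # natural end conditions: M = 0
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        b[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, b)
    k = np.clip(np.searchsorted(x, xq) - 1, 0, n - 2)   # segment containing each query
    hk, dl, dr = h[k], x[k + 1] - xq, xq - x[k]
    return ((M[k] * dl**3 + M[k + 1] * dr**3) / (6.0 * hk)
            + (y[k] / hk - M[k] * hk / 6.0) * dl
            + (y[k + 1] / hk - M[k + 1] * hk / 6.0) * dr)


def rr_signal(r_peak_times, fs_rr=4.0):
    """Evenly sampled RR series (s) from corrected R-peak times (s)."""
    t = np.asarray(r_peak_times, dtype=float)
    if t.size < 3:
        raise ValueError("need at least three R peaks")
    rr = np.diff(t)                 # RR interval ending at each beat t[1:]
    t_rr = t[1:]
    n = int(np.floor((t_rr[-1] - t_rr[0]) * fs_rr)) + 1
    grid = t_rr[0] + np.arange(n) / fs_rr
    return grid, natural_cubic_spline(t_rr, rr, grid)
```

The resulting series can then be z-scored per subject with the same procedure used for the SPR signal; an instantaneous HR in beats per minute can be obtained as `60.0 / rr`.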
The SPR and RR signals are then fed to the three binary ML classification algorithms presented previously, which are now used only for testing. For each subject driving in the two situations (with and without traffic), we compute the eight features described in Section 2.3 for each 15 s interval. As already mentioned, a new interval starts 5 s after the previous one, so successive intervals overlap by 10 s. By analyzing these features, the SVM, RF, and DT classifiers output a label for each 15 s interval that indicates whether the interval belongs to the “stress” or the “non-stress” class. In this way, we can calculate, for each subject and each driving situation, the final number of labels equal to “1” or “0”, i.e., the total number of intervals that the classifiers identify as “stress” or “non-stress”.
Table 3 displays the percentage of intervals marked as “1” (“stress” intervals) for each classifier and each subject, over the complete track in the two situations.
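The bookkeeping behind these percentages is simple, but a short Python sketch makes the conventions explicit. It assumes that the labels of each subject and scenario are stored in temporal order as 0/1 values, and that the first 15 s interval starts at the beginning of the recording, so the k-th interval (counting from 0) ends at 15 + 5k seconds, which is where a positive label would be drawn as a stem. All names are hypothetical.

```python
import numpy as np


def stress_percentage(labels):
    """Percentage of classified 15 s intervals labelled 1 ("stress")."""
    labels = np.asarray(labels)
    if labels.size == 0:
        raise ValueError("no classified intervals")
    return 100.0 * np.count_nonzero(labels == 1) / labels.size


def interval_end_times(n_intervals, win_s=15.0, hop_s=5.0):
    """End time (s) of each interval, assuming the first one starts at t = 0."""
    return win_s + hop_s * np.arange(n_intervals)


def summarize(labels_by_subject):
    """{subject: {scenario: labels}} -> {subject: {scenario: % of stress intervals}}."""
    return {subj: {scen: stress_percentage(lab) for scen, lab in by_scen.items()}
            for subj, by_scen in labels_by_subject.items()}
```

Percentages rather than raw counts are the natural summary here, since the two runs of the same subject need not produce the same number of intervals.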
As an example, Figure 8 graphically depicts the values in Table 3 for the SVM classifier. Looking at this figure, as well as at the data reported in Table 3, we observe that driving with traffic seems to generate more stress than driving without traffic for the majority of the subjects. In particular, according to the SVM classifier, three subjects out of ten (i.e., Subjects 3, 4, and 10) seem to experience more stress in the non-traffic scenario. Note that two of them are the same subjects that appear more stressed in the non-traffic scenario according to their scalograms, and that the difference between the two scenarios for Subject 10 is very small. For the RF classifier, two subjects (i.e., Subjects 4 and 10) appear more stressed in the non-traffic scenario, whereas for the DT classifier, four subjects (i.e., Subjects 2, 3, 4, and 10) do (although, for Subjects 2 and 3, the difference between the percentages of positive labels in the traffic and non-traffic scenarios is again very small). Note that the order in which the subjects conducted the experiment (starting with the non-traffic situation and continuing with the traffic situation, or vice versa) could also have influenced their physiological reactions. Comparing these results to those presented in [19], where only the SPR signal was considered, we observe that including the ECG features may not have significantly affected the classification. As an example, Figure 9 shows the HR for Subject 10, with no clear difference between the signal characteristics in the two scenarios. Since driving in an urban scenario requires more effort, we believe that this subject’s physical activity could have masked the ECG variability associated with stress episodes, making them less detectable. For the same reason, we did not include the root mean square of successive RR interval differences (RMSSD), a feature often used for classification [21]. RMSSD is in fact useful for discriminating episodes corresponding to increasing or decreasing HR, which we did not observe in the experiment.
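For reference, RMSSD is the square root of the mean of the squared differences between consecutive RR intervals, RMSSD = sqrt((1/(N-1)) * sum_{i=1}^{N-1} (RR_{i+1} - RR_i)^2), computed on the N beat-to-beat intervals rather than on the interpolated RR signal. A minimal Python version (the function name is hypothetical) is:

```python
import numpy as np


def rmssd(rr_intervals):
    """Root mean square of successive differences of RR intervals (same unit as the input)."""
    rr = np.asarray(rr_intervals, dtype=float)
    if rr.size < 2:
        raise ValueError("need at least two RR intervals")
    return float(np.sqrt(np.mean(np.diff(rr) ** 2)))
```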
Even though the scope of this work is to compare the subjects’ emotional responses while driving along the entire track with and without traffic, we also examined the highway and city route sections separately to obtain a more complete overview of the subjects’ stress reactions. Table 4 reports the percentage of intervals marked as “stress” by the SVM classifier for each subject and each road section. On average, the highway section appears to be more stressful than the city route section, particularly in the traffic scenario.
Considering again the course as a whole (highway and city route together), we carried out a further analysis using a non-parametric statistical test (the Wilcoxon signed-rank test) comparing the data of the ten subjects in the two situations, traffic and non-traffic. Using as input the ten positive-label percentages of the subjects in the traffic scenario and the ten positive-label percentages in the non-traffic scenario, this test yields a p-value equal to … for the SVM classifier, … for the RF classifier, and … for the DT classifier. This confirms that, for each classifier, there is a significant difference between the number of positive labels collected in the two scenarios; in other words, the two scenarios elicit different emotional responses in the test subjects.
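This paired test can be run with `scipy.stats.wilcoxon`; to make the computation transparent, the Python sketch below instead computes the exact two-sided Wilcoxon signed-rank p-value by enumerating all 2^n sign patterns of the ranked differences, which is cheap for n = 10 subjects. Zero differences are discarded and tied absolute differences receive average ranks; these conventions, and the function names, are our assumptions and may differ in detail from the software used to obtain the reported p-values. The inputs would be the ten traffic and ten non-traffic positive-label percentages of one classifier (Table 3); no values from the table are reproduced here.

```python
import numpy as np


def average_ranks(a):
    """Ranks 1..n of a, with tied values receiving the average of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and a[order[j + 1]] == a[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # average rank of the tie group
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Returns (W_plus, p_value); zero differences are discarded.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]
    n = d.size
    if n == 0 or n > 20:
        raise ValueError("need 1..20 non-zero differences for full enumeration")
    r = average_ranks(np.abs(d))
    w_plus = r[d > 0].sum()
    # Under H0 each rank is equally likely to carry a + or - sign:
    # enumerate all 2**n sign patterns to obtain the null distribution of W+.
    signs = (np.arange(2 ** n)[:, None] >> np.arange(n)) & 1
    w_null = signs @ r
    eps = 1e-9
    p_low = np.mean(w_null <= w_plus + eps)
    p_high = np.mean(w_null >= w_plus - eps)
    return w_plus, min(1.0, 2.0 * min(p_low, p_high))
```

With ten subjects, the smallest attainable two-sided p-value is 2/1024 (about 0.002), reached when all ten differences have the same sign.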
In Figure 10, we show the output of the SVM for Subject 6 in the two scenarios; for this subject, the difference between the positive labels in the traffic and non-traffic situations is among the largest positive differences we obtained. We show only the positive labels (denoted as “1”), each as a grey stem located at the end of the corresponding 15 s classified interval. For simplicity, the labels of the intervals classified as “non-stress” are not displayed. The cleaned and normalized SPR signals, along with the normalized HR of the subject in the two scenarios, are also shown in the figure. Another example, using the RF classifier, is shown in Figure 11 for Subject 7. Figure 12 depicts the output of the SVM classifier for Subject 4, the case in which the difference between the positive labels in the traffic and non-traffic scenarios is the largest negative one for all the classifiers. Looking at these figures, however, we can see that the classifiers, by analyzing the SPR and ECG signals, are able to detect the stress episodes along the course in the urban area.