1. Introduction
Congenital heart disease (CHD) is one of the most common birth defects; it affects the overall structure of the heart and vessels and is found in no more than 1% of newborns [1]. CHD is present at birth, and its presentation varies from mild, asymptomatic cases to severe, life-threatening conditions. With advances in treatment, a growing population of adults is surviving with congenital heart malformations. Globally, cardiovascular diseases (CVD) are the main cause of mortality. Many adult CHD survivors, who are at increased risk of CVD [2], may have long-term health problems that affect their quality of life. In Pakistan, CHD remains an important medical issue, and the number of patients is increasing every day [3]. Among newborns and young people, cardiac disorders are responsible for a large share (30% to 50%) of the mortality caused by birth defects. The most common cardiac defects, which represent about 85% of all congenital heart diseases, include ventricular septal defects (VSDs; 34%) and atrial septal defects (ASDs; up to 13%) [4]. Like any other medical issue, CHD requires robust methods for timely diagnosis. Different non-invasive procedures are used to identify heart defects. The electrocardiogram (ECG) is one of the most common tools for identifying heart problems; it is based on the electrical signals generated during contraction and relaxation of the heart muscle. The ECG reveals the electrical activity of the heart and is mostly recorded by placing three electrodes for early diagnosis. It comprises five waves; i.e., P, Q, R, S, and T. These waves are analyzed to identify different pathologies [5].
Another commonly used approach to diagnosing heart disorders is the analysis of heart sounds [6]. Easy access to digital stethoscopes allows medical staff to record and analyze heart sounds for diagnostic purposes. The phonocardiogram (PCG) records heart sounds and murmurs in the form of a plot, and the machine that records these sounds is known as the phonocardiograph. It is a non-invasive technique that captures the condition of the heart in audible form. Heart sounds are generated by the opening or closing of the heart valves. Blood flow through the valves’ orifices or into the ventricular chambers also produces heart sounds. A PCG recording contains four important heart sound components; namely, S1, S2, S3, and S4.
An atrial septal defect (ASD) [7] is a birth defect of the heart in which there is a hole in the wall (septum) that separates the upper chambers (atria) of the heart. The hole can vary in size and may require a medical procedure. The causes of CHD in most infants are unknown, but genetic factors are also important, as some infants have heart defects as a result of changes in their genes or chromosomes [8]. A ventricular septal defect (VSD) is an opening in the septum between the lower chambers (ventricles) of the heart; it is a common heart defect that is present at birth (congenital).
Extensive research has been carried out on the detection and classification of congenital heart disorders using the PCG signal. A PCG signal classification approach based on a nested set of classifiers, namely random forest, a cost-sensitive classifier, and LogitBoost (LB), was suggested in [9]. A combination of time-domain, statistical, and frequency-domain features was used for effective classification. Cepstrum-analysis-based feature extraction was performed to classify normal and abnormal PCG signals through a support vector machine (SVM) classifier [10].
PCG signal classification was achieved in [11] through a linear SVM and a combination of dynamic time warping (DTW) and Mel-frequency cepstral coefficient (MFCC) features, reaching 82.4% accuracy. A screening method for PCG signals using a modified Arash-band method and an SVM classifier was reported in [12]. In [13], the PCG signal was first segmented into S1, systole, S2, and diastole through a hidden Markov model (HMM). Gammatone frequency cepstral coefficient (GFCC) features were then extracted, and classification was performed using a weighted SVM on both unsegmented and segmented signals. A sensitivity of 90.3% and a specificity of 89% were achieved through 10-fold cross-validation. Rubin et al. [14] proposed a method for the classification of normal and abnormal PCG signals based on MFCCs and a two-layer convolutional neural network (CNN). This method achieved an overall score of 83.99% on the PHY16 challenge database. Spectrogram features from PCG were used to train CNN and AdaBoost classifiers [15]. A simple decision rule was applied to the outputs of both classifiers to generate the final classification results, with an overall reported accuracy of 89%. In another study [16], the authors used a Hamming filter for noise reduction in PCG signals. A four-layer 1D CNN was employed for PCG signal classification, and the overall accuracy of this method was 79%. In a recent study [17], a CNN architecture was presented for heart sound classification. The CNN was tested on different feature sets, such as Mel-spectrogram, MFCC, and sub-band envelopes.
Zhihai Tu et al. filtered heart sound signals using the wavelet transform. Heart sound segmentation was performed using the Hilbert transform [18,19] and cubic polynomial interpolation [20]. Samuel E. Schmidt et al. presented a simple, low-cost system for the identification of coronary artery disease (CAD) using acoustic features. A quadratic discriminant function was used to combine the different features. The reported accuracy for diagnosing CAD was 73% [21]. In another study [22], the tunable-Q wavelet transform [23,24,25] and the signal second difference with a median filter were used for the detection of artifacts in heart sounds. In [26], heart sounds were classified using power MFCC features fused with fractal features. A nearest neighbor classifier was employed to perform classification. The overall accuracies achieved on three publicly available datasets were 92%, 81%, and 98%. In [27], heart sound classification was performed using MFCC and linear predictive coding (LPC) features in conjunction with the AdaBoost ensemble classifier. In [28], the authors used the least squares support vector machine (LSSVM) with wavelet features for the detection of heart pathologies. VSD was diagnosed from the time-frequency feature matrix acquired from heart sounds [29]. The ellipse-based model achieved a maximum accuracy of 97.6% on large VSD sounds. An auscultation jacket was used to detect heart abnormalities in [30]. The system, with a feed-forward neural network as the classifier, achieved a sensitivity and specificity of 84% and 86%, respectively. In [31], normal and abnormal cardiac sounds were classified using ensemble empirical mode decomposition (EMD), auto-regressive models, and a neural network. The method showed a sensitivity and specificity of 82% and 88%, respectively. An efficient method for the detection of abnormal PCG signals was proposed in [32] using MFCCs and an SVM, with a classification accuracy of 92.6%. Classification of CAD and non-CAD subjects from PCG and ECG [33] using a dual-input neural network (DINN) achieved a specificity, accuracy, and G-mean of 89.17%, 95.62%, and 93.69%, respectively. A combination of machine learning and a deep learning model [34] for the identification of congestive heart failure (CHF) from audio PCG obtained an accuracy of 93.2%.
Classification of ASD and normal PCG signals collected from newborn subjects was performed using a combination of short-time Fourier transform (STFT) features and MFCCs with their derivatives [35]. An accuracy of 93.2% was achieved with the k-nearest neighbor (KNN) classifier. An approach based on the discrete wavelet transform (DWT) and a multilayer perceptron (MLP) for the estimation of VSD was presented in [36]. Features such as power, standard deviation, skewness, kurtosis, and Shannon entropy were extracted from eight levels of DWT detail coefficients. In a similar study [37], a combination of wavelet and MFCC features was proposed to achieve 97% accuracy on normal and four abnormal classes of heart sounds. In [38], a comparative analysis of four feature reduction methods for PCG signals was presented. Experiments were performed on normal subjects and on patients with three different classes of heart disorders; namely, ASD, VSD, and aortic stenosis (AS). Double discriminant embedding (DDE), feature space discriminant analysis (FSDA), clustering-based feature extraction (CBEF), and feature extraction using attraction points (FEUAP) were used with a KNN classifier.
Table 1 presents a comparative summary of existing literature in terms of feature extraction and classification methods and the number of classes used in the experimentation.
In the present research, a novel method of PCG signal analysis for the detection and classification of congenital heart diseases is presented. Specifically, the approach targets the classification of ASD and VSD from PCG signals using empirical mode decomposition (EMD) and a fusion of MFCC and temporal features. The classification performances of MFCCs and of the temporal features, 1D local texture patterns (1D-LTPs), were first evaluated individually, followed by an evaluation of the proposed fused feature representation. The proposed method was shown to be accurate, reliable, and robust owing to a comprehensive PCG signal representation with a reduced number of features.
The rest of this article is organized as follows. Section 2 describes the data acquisition and the proposed methodology. Section 3 presents the results of the detection and multiclass experiments. A comparative analysis of this work with previous studies is presented in Section 4. In Section 5, the conclusions of this research and future directions are described.
3. Results
In this study, an automated heart disease classification system using the PCG signal is proposed. The raw PCG signal was first preprocessed through EMD, followed by feature extraction through the fusion of MFCC and 1D-LTP features. 1D-LTPs extract the most discriminative information embedded in the PCG signal. The distribution of 1D-LTP features for the different classes (normal/ASD/VSD) can be seen in the scatter plots (Figure 9). It can be observed that the intra-class difference between features is small, while the inter-class difference is large. This shows that the extracted features carry ample discriminative information about the different classes of PCG signals.
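The feature extraction and fusion step can be illustrated with a minimal sketch. It assumes that 1D-LTP follows the usual local-ternary-pattern construction (each sample is compared with its neighbours within a tolerance, and the result is split into an upper and a lower binary code) and that fusion is plain concatenation; neither detail is stated in this section. The `radius` and `threshold` values and the function names are hypothetical, and the MFCC vector is taken as given.

```python
from collections import Counter

def ltp_codes(signal, radius=2, threshold=0.05):
    """Upper/lower binary codes of a generic 1D local ternary pattern (assumed form)."""
    signal = list(signal)
    upper_codes, lower_codes = [], []
    for i in range(radius, len(signal) - radius):
        center = signal[i]
        # Neighbours on both sides of the centre sample, left to right.
        neighbours = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        upper = lower = 0
        for n in neighbours:
            upper = (upper << 1) | (1 if n >= center + threshold else 0)
            lower = (lower << 1) | (1 if n <= center - threshold else 0)
        upper_codes.append(upper)
        lower_codes.append(lower)
    return upper_codes, lower_codes

def ltp_histogram(signal, radius=2, threshold=0.05):
    """Normalised histograms of the upper and lower codes, concatenated."""
    n_bins = 2 ** (2 * radius)
    features = []
    for codes in ltp_codes(signal, radius, threshold):
        counts = Counter(codes)
        total = max(len(codes), 1)
        features.extend(counts.get(b, 0) / total for b in range(n_bins))
    return features

def fuse(mfcc_features, ltp_features):
    """Feature fusion by concatenation (an assumption, not confirmed by the text)."""
    return list(mfcc_features) + list(ltp_features)
```

With `radius=2` each histogram has 16 bins, so the 1D-LTP part contributes 32 values per signal; the actual feature length used in the study may differ.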
The performance of the proposed method was evaluated using the standard statistical indices of accuracy, sensitivity (sen), and specificity (spec), which were calculated from the following four parameters:
True positive (TP): abnormal PCG signal correctly detected as abnormal.
False negative (FN): PCG signal of an abnormal subject detected as normal.
True negative (TN): normal PCG signal correctly detected as normal.
False positive (FP): PCG signal of a normal subject detected as abnormal.
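The indices follow directly from these four counts. The sketch below uses the standard definitions; the function name and the example counts in the usage note are hypothetical, and the code assumes that no denominator is zero.

```python
def detection_metrics(tp, fn, tn, fp):
    """Standard indices computed from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all signals classified correctly
        "sensitivity": tp / (tp + fn),     # share of abnormal signals detected
        "specificity": tn / (tn + fp),     # share of normal signals detected
        "ppv": tp / (tp + fp),             # positive predictive value
        "npv": tn / (tn + fn),             # negative predictive value
        "error_rate": (fp + fn) / total,   # 1 - accuracy
    }
```

For example, `detection_metrics(tp=45, fn=5, tn=40, fp=10)` gives an accuracy of 0.85, a sensitivity of 0.90, and a specificity of 0.80; these counts are illustrative and are not taken from the study.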
In this study, the experiments were performed for two different problems.
Detection experiment (normal vs. abnormal): All feature vectors belonging to abnormal subjects (ASD, VSD) were labeled as abnormal.
Multiclass evaluation (normal vs. ASD vs. VSD): Feature data were labeled according to the disease type in the experiment.
Training and testing of the classifiers were carried out through 10-fold cross-validation with each subset of features; i.e., MFCC, 1D-LTPs, and the fusion of MFCC + 1D-LTP. All simulations were performed in MATLAB 2018a on a Core i5 computer. All results presented in this paper were averaged over 100 experiments.
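The two labeling schemes and the 10-fold protocol can be sketched as follows. This is an illustration in Python rather than the authors' MATLAB code; the function names are hypothetical, and the folds are formed by a plain shuffled split because the text does not say whether stratification was used.

```python
import random

def to_detection_labels(labels):
    """Collapse ASD and VSD into 'abnormal' for the detection experiment."""
    return ["normal" if y == "normal" else "abnormal" for y in labels]

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Averaging over repeated experiments could then be approximated by calling `train_test_splits` with 100 different `seed` values and averaging the resulting accuracies, assuming each experiment corresponds to a fresh random partition.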
3.1. Detection Experiment
The experiments for the detection of normal and abnormal subjects were performed on a self-collected dataset acquired with a low-cost data acquisition setup. In the detection experiments, the dataset was split into two classes; namely, normal and abnormal. All feature vectors belonging to ASD and VSD patients were labeled as abnormal. An SVM classifier with different kernel functions, namely SVM-linear (SVM-L), SVM-quadratic (SVM-Q), SVM-cubic (SVM-C), and SVM-Gaussian (SVM-G), was employed to perform classification. The results of these experiments in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and error rate are presented in Table 4. Results of applying the individual feature sets (MFCC and 1D-LTP) to the PCG signal data are also presented (Table 4). The highest accuracy using only MFCC features was achieved with SVM-C (94.05%); 1D-LTP-only features achieved the highest accuracy of 94.05% with the SVM-Q classifier. The best result, 95.8% accuracy with the SVM-C classifier, was obtained upon feature fusion of MFCCs and 1D-LTPs. Table 5 presents the confusion matrix showing individual class accuracy with SVM-C and the combination of MFCC and 1D-LTP features. The experiments showed that the fusion of MFCC and 1D-LTP features improves classification performance over either feature set alone.
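The four SVM variants differ only in the kernel used to compare feature vectors. The sketch below shows common textbook forms of these kernels; the exact parameterization used in the experiments (offsets, scaling, kernel width) is not given in the text, so the forms and the `gamma` parameter are assumptions.

```python
import math

def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree):
    # Assumed form (1 + <x, y>)^degree; degree 2 is quadratic, degree 3 is cubic.
    return (1.0 + dot(x, y)) ** degree

def gaussian_kernel(x, y, gamma=1.0):
    # Assumed form exp(-gamma * ||x - y||^2); gamma is a hypothetical width parameter.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Mapping from the classifier names used in the text to kernel functions.
KERNELS = {
    "SVM-L": linear_kernel,
    "SVM-Q": lambda x, y: polynomial_kernel(x, y, 2),
    "SVM-C": lambda x, y: polynomial_kernel(x, y, 3),
    "SVM-G": gaussian_kernel,
}
```

A higher polynomial degree lets the decision boundary bend more, which is one plausible reason the cubic kernel can separate fused features that a linear kernel cannot.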
3.2. Multiclass Evaluation (Normal vs. ASD vs. VSD)
Multiclass experiments were performed to precisely identify the type of heart disorder. Features were labeled according to the disorder type; i.e., ASD, VSD, or normal. A multiclass SVM with different kernels was trained and tested using 10-fold cross-validation. The results of applying different multiclass SVM classifiers to the individual feature sets (MFCC, 1D-LTP) and to the fusion of both are presented in Table 6. The results show that the SVM-C classifier achieved a peak accuracy of 88.69% with only MFCC features, while the same classifier provided 94.64% accuracy with 1D-LTP features. Performance was further improved by fusing MFCC and 1D-LTP features with the SVM-C classifier; i.e., 95.24% accuracy. Table 7 shows the class-wise accuracy for the ASD, VSD, and normal classes in the form of a confusion matrix for the SVM-C classifier. The proposed feature fusion methodology effectively extracted the characteristic information from multiclass PCG signals.
3.3. Statistical Significance
The primary objective of this statistical analysis was to establish a level of confidence in the proposed scheme. Analysis of variance (ANOVA) [67] was used to test whether the results were statistically significant by comparing the means of multiple distributions.
In this work, the proposed scenario (MFCC + 1D-LTP) was considered for two different classifiers (SVM-C, SVM-Q), selected because they outperformed the rest. Before applying ANOVA, a series of tests was performed to check the assumptions of normality and homogeneity of variance. A Shapiro–Wilk test [68] was performed for the former and Bartlett's test [68] for the latter, with the significance level $\alpha$ selected to be 0.01. The means of our approach were $\mu_1$ and $\mu_2$, calculated from the overall accuracy of both classifiers. The null hypothesis was $H_0: \mu_1 = \mu_2$, while the alternative hypothesis was $H_a: \mu_1 \neq \mu_2$. The p-value was computed and the null hypothesis was tested against $\alpha$, being retained for $p > \alpha$; if it was rejected, $p < \alpha$, then the Bonferroni post hoc test was applied.
For the proposed method (MFCC + 1D-LTP) with the selected classifiers (SVM-C and SVM-Q), the Shapiro–Wilk test generated one p-value per classifier, and Bartlett's test gave the associated chi-squared probabilities. The p-values of the two classifiers were greater than the significance level $\alpha = 0.01$. Therefore, based on the test results (normality and equality of variances), we failed to reject the null hypothesis $H_0$ of each test, and the test data were taken to be normally distributed with homogeneous variances. The ANOVA results, comprising five parameters (degrees of freedom (df), sum of squared deviations (SS), mean squared error (MSE), F-statistic, and p-value), are shown in Table 8. The performance ranges of the two selected classifiers based on the proposed method are shown in Figure 10.
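The quantities reported in an ANOVA table can be computed as in the following sketch of a one-way ANOVA, where each group holds the accuracies of one classifier over repeated experiments. The function name is hypothetical, and the p-value is omitted because it requires the F-distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return df, SS, MS and the F-statistic for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "ss": (ss_between, ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }
```

If SciPy is available, `scipy.stats.f_oneway`, `scipy.stats.shapiro`, and `scipy.stats.bartlett` return the corresponding statistics together with p-values; the study itself reports using MATLAB.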
The results were validated using the Bonferroni post hoc test (Figure 11), which is commonly applied whenever there is a chance of a significant difference between the means of multiple distributions. The test confirmed that the proposed method performed better than conventional methods.
4. Discussion
The proposed method of feature fusion with EMD-based signal denoising effectively extracted embedded information from PCG signals in the self-collected dataset of ASD and VSD cardiac disorders. MFCCs provided frequency-domain features, while 1D-LTPs captured temporal and texture information from the signal. Fusing these two feature types provided a powerful signal representation for the different classes (normal, ASD, VSD) with a high degree of accuracy. The proposed method classified normal and abnormal PCG data with the SVM-C classifier at 95.83% accuracy, while 95.24% average accuracy was achieved on multiclass PCG data with the same classifier.
The number of classes, feature extraction techniques, and classification methods of the proposed method were compared with those of previously developed approaches (Table 1). The comparison showed that several existing works [9,10,11,13,15,17] used the Physionet Challenge 2016 dataset [69], which comprises only two classes (healthy and unhealthy), while others used self-collected PCG signal data. MFCCs were employed in several studies [9,11,17,35] and acted as the baseline features of choice. The SVM classifier has also been widely adopted in existing works [10,11,12,13].
DWT and statistical features were used with a multilayer perceptron to achieve 96.6% accuracy on normal and ASD classes of PCG data [36]. In another work [38], feature reduction methods were compared. Experimental results were reported for normal subjects and three different classes of heart disease; i.e., ASD, VSD, and aortic stenosis. Feature reduction methods (DDE, FSDA, CBEF, FEUAP) were applied with a K-nearest neighbor (KNN) classifier, and 84.3% accuracy was achieved.
In contrast to the existing work, our research targeted the classification of multiple heart disorders (ASD, VSD) with a feature fusion approach combining MFCCs and the new temporal feature descriptor 1D-LTP. The presented results indicate that the proposed method outperforms the existing approaches. To confirm the validity and robustness of the proposed method, confidence intervals for the binary and multiclass experiments are also provided for the two best classifiers; i.e., SVM-C and SVM-Q. Figure 12a illustrates the confidence intervals, showing the maximum, minimum, and average classification results of the individual MFCC and 1D-LTP features and of the feature fusion approach for the binary experiments. Figure 12b presents the confidence intervals of minimum, maximum, and average classification accuracy for the multiclass experiments. Based on this statistical analysis, SVM-C is a natural choice of standard classifier for this application.
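A summary of this kind can be produced from the accuracies of the repeated experiments, as in the sketch below. The text does not state how the intervals in Figure 12 were computed, so the normal-approximation 95% interval used here (z = 1.96) is an assumption, and the function name is hypothetical.

```python
import statistics

def summarise_runs(accuracies, z=1.96):
    """Minimum, maximum, mean and an approximate confidence interval of the mean."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)            # sample standard deviation
    half_width = z * sd / len(accuracies) ** 0.5  # z times the standard error
    return {
        "min": min(accuracies),
        "max": max(accuracies),
        "mean": mean,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }
```

Comparing such summaries for SVM-C and SVM-Q on the same feature set shows whether one classifier's interval lies clearly above the other's or whether the two overlap.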