1. Introduction
The number of smart devices connected to the Internet, or the Internet of Things (IoT), is growing rapidly [1,2]. A substantial fraction of these devices are in the medical field, a trend known as the Internet of Medical Things (IoMT) [3,4]. The use of the IoMT has improved healthcare operations, remote services, and patient monitoring [5]. However, serious security and privacy issues arise, as IoT-enabled medical devices are vulnerable to a wide range of cyber-attacks [6,7,8,9]. If these devices are inadvertently exposed, they can be exploited by adversaries using advanced persistent threats (APTs) and known weaknesses, potentially disrupting healthcare operations and endangering human lives [7,10]. Therefore, security should be a top priority when using the IoMT for remote health monitoring [2,9].
Detecting and mitigating attacks in the IoMT can be accomplished with various techniques and methods [1,2,11]. These include log monitoring, vulnerability management, threat intelligence, end-device monitoring, and intrusion detection and prevention systems [12,13,14]. Intrusion detection systems, which rely on traffic anomalies, signature-based rules, or security policies, are frequently used to identify attacks in IoT-enabled networks [13,15]. However, traditional detection techniques often fall short as attackers continually refine their strategies and employ advanced hacking techniques [16,17,18]. For example, security policies can be circumvented if an attacker conducts network reconnaissance or reverse engineers network devices such as routers and firewalls [19]. To enhance attack detection, researchers are exploring machine learning (ML) and deep learning (DL) solutions [16,20]. Thanks to advances in computing and processing capabilities, ML and DL techniques can now be applied at scale to predict attack events with greater accuracy.
While intelligent intrusion detection systems (IDSs) that use ML and DL techniques have been proposed [21,22,23], they may not be suitable for IoMT scenarios. These systems, designed for conventional networks, are ill-suited to attack detection in the IoMT because health IoT sensors connected to the Internet generate different types of data [24]. Additionally, in smart health applications, most existing methods analyze only network traffic to identify IoMT attacks, ignoring patient biometric information [24]. However, such information is crucial in the IoMT context, as it offers insights into a patient’s condition and can be linked to network disruptions caused by attacks that affect the confidentiality, availability, and integrity of healthcare data [25,26]. Therefore, for more effective attack prediction, network traffic and patient biometric data should be considered together. Analyzing the relationship between these two disparate data types during an attack can provide a more comprehensive understanding of the situation.
The existing literature presents a variety of deep learning-based intrusion detection systems (IDSs) for the Internet of Medical Things (IoMT) [25,27,28,29]. A common approach involves passing network flow and patient biometric information through several hidden layers of a deep learning model [10,16]. This approach employs a global attention layer for optimal feature extraction from the spatial and temporal characteristics learned by the deep model, and incorporates a cost-sensitive learning approach to address data imbalance [24]. However, these studies do not discuss the challenges related to the static number of epochs and batch sizes often used in deep learning models. Another study introduces a swarm-neural-network-based model to detect intruders in IoMT systems [30]. This model acknowledges the security and privacy concerns that arise from transferring patient data to the cloud for processing, given the limited storage and computation capacity of IoMT devices. However, the swarm-neural network model’s performance metrics are not clearly specified, and the concern of statically setting the number of epochs and batch sizes remains unaddressed.
In the realm of explainable AI (XAI), a novel model, XSRU-IoMT [25], was proposed to detect sophisticated attack vectors in IoMT networks. This model leverages bidirectional simple recurrent units (SRUs) with skip connections to overcome the vanishing gradient problem and expedite the training process. While it improves the trust level by providing explanations for its prediction decisions, the study does not offer insights into how the static number of epochs and batch sizes might influence the model’s efficiency and accuracy.
Another study proposes a cyber-attack detection method employing ensemble learning and a fog–cloud architecture [31]. This system uses a set of LSTM networks for initial learning and a decision tree for classifying attacks and normal events. While this paper offers an innovative framework for deploying IoMT-based approaches as cloud and fog services, it does not delve into the implications of setting a fixed number of epochs and batch sizes in the learning process. In summary, while various deep learning models have been proposed for IoMT intrusion detection, few discuss the impact of static epochs and batch sizes when training these models. Future studies might aim to dynamically adjust these parameters based on data characteristics and model performance to enhance the efficacy of the IDS in the IoMT.
The optimal number of epochs in deep learning models is contingent upon the unique data characteristics, model architecture, and specific task at hand. A popular technique for preventing model overfitting and enhancing generalization to new data is ‘early stopping’, which halts training when a decline in model performance is observed on held-out data. Several research studies have applied early stopping methodologies to improve the precision of DL models. However, implementing early stopping in deep learning models poses challenges. A significant issue is the automated determination of the optimal stopping point, which can vary greatly depending on the data, model architecture, and task at hand. Striking a balance between mitigating overfitting and ensuring the model’s ability to generalize to new data is crucial. Moreover, the definition of performance degradation can vary depending on the dataset and task, complicating the application of early stopping methods across different contexts. Hence, there is an ongoing need for a more comprehensive understanding and application of early stopping in deep learning models. To this end, this paper is devoted to investigating the application of fuzzy logic for estimating the optimal value of the patience parameter that triggers early stopping during the training phase of deep learning models.
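For reference, conventional frameworks expose early stopping with a statically chosen patience value. The following Keras snippet illustrates this baseline; the monitored quantity and the fixed patience of 5 epochs are illustrative choices, and it is precisely this hand-tuned constant that the fuzzy approach investigated here seeks to replace:

```python
import tensorflow as tf

# Conventional early stopping with a statically chosen patience value.
# The fixed patience=5 is an illustrative, hand-tuned constant.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # watch the validation loss
    patience=5,                  # stop after 5 epochs without improvement
    restore_best_weights=True)   # roll back to the best epoch seen

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, callbacks=[early_stop])
```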
The proposed fuzzy-based self-tuning approach for intrusion detection in the Internet of Medical Things (IoMT) significantly advances the state of the art by introducing a dynamic early stopping mechanism tailored to the unique characteristics of IoMT data streams. This mechanism, underpinned by fuzzy logic, adaptively determines the optimal stopping point during training, a feature not commonly present in existing models, which often rely on static parameters. Furthermore, our self-tuning LSTM algorithm is specifically designed to address the challenges inherent in IoMT data, such as high dimensionality and the need for real-time processing, by autonomously adjusting the number of training epochs. This self-tuning capability is a substantial improvement over traditional methods that require manual epoch tuning. Additionally, our model’s integration of both network traffic and patient biometric data for intrusion detection is particularly innovative, as it leverages the correlation between these data types to provide a more nuanced detection capability in the IoMT context. An extensive experimental evaluation underscores the effectiveness of our approach, showcasing its competitive performance and improved adaptability in real-time threat detection scenarios. We believe these elements collectively underscore the novelty and improved efficacy of our proposed solution in the realm of IoMT security.
In particular, this study focuses on incorporating a dynamic early stopping approach into the Long Short-Term Memory (LSTM) classifier for the IDS in the IoMT. Recognizing this critical challenge, our paper is driven by the following specific objectives:
To develop an intrusion detection system (IDS) tailored for the IoMT ecosystem: We aim to design a system that not only detects common cyber-threats but is also capable of identifying IoMT-specific attacks that could disrupt healthcare services and compromise patient data.
To implement a fuzzy-based self-tuning mechanism within an LSTM network: Our objective is to enhance the traditional LSTM approach by incorporating a fuzzy logic component that dynamically adjusts the number of training epochs, thereby optimizing the model’s performance and responsiveness to the evolving IoMT threat landscape.
To evaluate the effectiveness of early stopping techniques in deep learning models for the IoMT: We seek to investigate how fuzzy logic can refine early stopping methods to prevent overfitting, ensure timely model convergence, and maintain a high detection accuracy.
To assess the impact of integrating patient biometric data with network traffic analysis for intrusion detection: Our research questions whether the inclusion of diverse data types can improve the IDS’s ability to detect sophisticated attacks within the IoMT framework.
By setting these objectives, we provide a clear roadmap for our research, guiding readers through the development and validation of an IDS that is both effective and specifically optimized for the IoMT context. The purpose of our work is to contribute to the body of knowledge in IoMT security, offering a novel approach that addresses the unique challenges posed by this emerging field.
The rest of the paper is organized as follows. Section 2 explores related works. Section 3 describes the methodology and proposed techniques. Section 4 presents and discusses the experimental results in comparison with related models. The paper ends with a concluding section that revisits the work performed and provides suggestions for future research.
3. The Methodology
3.1. A Fuzzy-Based Patience Parameter Estimation for Early Stopping of LSTM Model Training
In contrast to traditional sequential models, our model dynamically adapts the number of training epochs and batch size per epoch during training, based on how much each batch contributes to the model’s accuracy. A key innovation is our use of fuzzy logic to optimize the patience parameter, which controls when the early stopping mechanism is activated during the model’s training.
In our approach, the early stopping mechanism commences training with arbitrary parameters and suspends the process when no significant improvements are observed. This mechanism monitors one or more performance indicators during the training phase of the model, which can prompt an early termination of the training process. In our study, we monitor the loss on the validation set, and training is discontinued when no further reductions in the validation loss are detected.
To avoid halting the training process prematurely, we have incorporated a dynamic system for establishing a patience threshold. Instead of using a static value or a simple running average of the loss differences, we have developed a fuzzy logic technique that determines the optimal patience level based on multiple inputs: the model’s accuracy, validation loss, and rate of improvement.
This fuzzy logic technique takes these inputs and, through a series of fuzzy rules and defuzzification, outputs an optimal patience level. This patience level is then used to decide when to stop the training process, providing a more dynamic and adaptive approach to early stopping. The system updates the patience level after each epoch, thereby providing a nuanced, data-driven way to determine when to cease training. This novel method therefore avoids arbitrary termination and helps to prevent both overfitting and underfitting of the model.
Let $F$ represent the fuzzy-based technique, which receives three inputs: accuracy ($A$), validation loss ($L$), and the rate of improvement ($R$). These inputs are fuzzified, and their corresponding membership values ($\mu_A$, $\mu_L$, and $\mu_R$) are determined via the associated membership functions.
The fuzzy-based patience estimate, denoted as $P^{*}$, is a value defuzzified from two output fuzzy sets (short and long patience) with corresponding membership functions $\mu_{P_1}(p)$ and $\mu_{P_2}(p)$. The technique is controlled by two rules, R1 and R2, whose outputs are calculated using a min–max inference method:

$$\mu_{R_i}(p) = \min\big(\alpha_i,\ \mu_{P_i}(p)\big), \qquad \alpha_i = \min\big(\mu_A^{(i)}(A),\ \mu_L^{(i)}(L),\ \mu_R^{(i)}(R)\big), \quad i \in \{1, 2\},$$

where $\alpha_i$ is the firing strength of rule $i$. The optimal patience value can then be calculated via centroid defuzzification:

$$P^{*} = \frac{\int p\, O(p)\, \mathrm{d}p}{\int O(p)\, \mathrm{d}p},$$

where $O(p) = \max\big(\mu_{R_1}(p),\ \mu_{R_2}(p)\big)$ is the aggregated output membership function, obtained by taking the maximum of $\mu_{R_1}$ and $\mu_{R_2}$ at each $p$.
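To make the inference procedure concrete, the following Python sketch shows how such a two-rule system could be realized. The triangular membership functions, the rule antecedents, and the 1–20 epoch patience domain are illustrative assumptions, not the exact definitions used in our implementation:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def estimate_patience(acc, val_loss, improve_rate):
    """Estimate the early-stopping patience via min-max fuzzy inference
    and centroid defuzzification. Inputs are assumed to lie in [0, 1];
    all membership parameters below are illustrative assumptions."""
    p = np.linspace(1, 20, 200)               # discretized patience domain

    # Fuzzify the three inputs ('high' sets peaking at 1.0).
    acc_high  = tri(acc,          0.5, 1.0, 1.5)
    loss_high = tri(val_loss,     0.5, 1.0, 1.5)
    rate_high = tri(improve_rate, 0.5, 1.0, 1.5)

    # Output fuzzy sets over the patience domain.
    short_p = tri(p, 0.0, 1.0, 10.0)
    long_p  = tri(p, 5.0, 20.0, 35.0)

    # R1: IF accuracy is high AND improvement is high THEN patience is long.
    a1 = min(acc_high, rate_high)
    # R2: IF loss is high AND improvement is low THEN patience is short.
    a2 = min(loss_high, 1.0 - rate_high)

    # Min inference clips each consequent; max aggregates the two rules.
    aggregated = np.maximum(np.minimum(a1, long_p), np.minimum(a2, short_p))

    # Centroid defuzzification, with a fallback when neither rule fires.
    area = aggregated.sum()
    return float((p * aggregated).sum() / area) if area > 0 else 5.0
```

Calling such an estimator after each epoch, as described above, shifts the patience toward the long set while the model is still improving and toward the short set once improvement stalls.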
An LSTM network is composed of an input layer, several hidden layers, and an output layer. A key feature of this network is the LSTM memory cells embedded into the hidden layers. Each of these LSTM memory cells possesses three distinct gates, which collectively manage its cell state: the forget gate, the input gate, and the output gate. These gates have unique roles: (1) the forget gate determines what information should be discarded, (2) the input gate decides what information is to be incorporated, and (3) the output gate establishes what information should be emitted from the cell state.
The overall architecture of a memory cell is depicted in Figure 1. This diagram presents a structured view of a Long Short-Term Memory (LSTM) network integrated with a fuzzy logic controller for early stopping. The network is organized into distinct layers, each represented by color-coded blocks. The input layer, highlighted in light blue, consists of neurons that receive the initial data. These data flow into the LSTM layer, depicted in light green, where LSTM cells process the temporal aspects of the input. The processed information then moves to the output layer, shown in light yellow, consisting of neurons that generate the preliminary output of the network. Crucially, this output is fed into the fuzzy logic controller, colored in light pink, which comprises three parts: fuzzy input, fuzzy logic, and fuzzy output. The fuzzy logic controller evaluates the output data and applies fuzzy rules to determine whether the training should continue or stop early, thereby preventing overfitting and enhancing the model’s efficiency. This decision is fed back to the LSTM layer, as indicated by the dotted lines, influencing subsequent processing cycles. This integration of fuzzy logic into the LSTM network aims to optimize the training process, ensuring timely convergence and maintaining high detection accuracy, particularly in dynamic environments like the Internet of Medical Things (IoMT).
During the initial step, the forget gate’s activation values decide what information from the prior cell state needs to be discarded. This calculation is as follows:

$$f_t = \sigma\big(W_f x_t + U_f h_{t-1} + b_f\big),$$

where $W_f$ and $U_f$ are weight matrices, $x_t$ is the current input at time $t$, $h_{t-1}$ is the output of the hidden cell state at the previous time step $t-1$, and $b_f$ is the bias vector. The bias vector provides the model with greater adaptability in fitting the data. The sigmoid function $\sigma$ scales the value within a range between 0 and 1, where 0 and 1 indicate that the results are interpreted as completely forgotten and completely remembered, respectively.
The next phase involves deciding the extent to which the current time-series information updates the new cell state. This involves a two-step process. Firstly, candidate values ($\tilde{C}_t$) that may be incorporated into the new cell state ($C_t$) are computed using the hyperbolic tangent ($\tanh$) function. Then, the activation values ($i_t$) of the input gate are calculated; these dictate which candidate values are to be included in the cell state. The calculation is as follows:

$$\tilde{C}_t = \tanh\big(W_c x_t + U_c h_{t-1} + b_c\big), \qquad i_t = \sigma\big(W_i x_t + U_i h_{t-1} + b_i\big).$$
Then, the new cell state ($C_t$) is generated using a combination of the prior cell state ($C_{t-1}$) and the current candidate values ($\tilde{C}_t$). The calculation for this is as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

Here, the product of the previous cell state ($C_{t-1}$) and the forget gate activations ($f_t$) establishes the amount of past information that needs to be discarded, whereas the product of the candidate values ($\tilde{C}_t$) and the input gate activations ($i_t$) defines the volume of current information that needs to be retained. By adding the two terms, the new cell state ($C_t$) is obtained.
The output ($h_t$) is regulated by the activation values ($o_t$) of the output gate. The calculation for this is as follows:

$$o_t = \sigma\big(W_o x_t + U_o h_{t-1} + b_o\big), \qquad h_t = o_t \odot \tanh(C_t).$$

The LSTM network requires sequences of input features for its training process. The network processes the sequential input at each time step ($t$), as expressed in the equations above. Throughout the training, the weights ($W$, $U$) and bias terms ($b$) are optimized with the goal of minimizing the loss of the specified objective function.
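These gate equations map one-to-one onto code. The following NumPy sketch runs a single LSTM memory cell forward over a short sequence; the dimensions and random weights are purely illustrative, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of an LSTM cell. W, U, b hold the per-gate
    parameters keyed by 'f', 'i', 'c', 'o', matching the forget, input,
    candidate, and output equations above."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])    # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])    # input gate
    c_hat = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate values
    c_t = f_t * c_prev + i_t * c_hat                          # new cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])    # output gate
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t

# Illustrative dimensions: 25 input features, 50 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 25, 50
W = {g: rng.normal(scale=0.1, size=(n_hid, n_in)) for g in 'fico'}
U = {g: rng.normal(scale=0.1, size=(n_hid, n_hid)) for g in 'fico'}
b = {g: np.zeros(n_hid) for g in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(10, n_in)):   # a 10-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```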
3.2. The Improved Fuzzy-Based Self-Tuning LSTM Model for the IDS in the IoMT
The proposed fuzzy-based self-tuning LSTM model for the IDS in the IoMT (FST-LSTM), as illustrated in Figure 2, consists of two main phases: data pre-processing and model training. During data pre-processing, network flow and patient biometric data from medical sensors are transformed into numerical forms suitable for modeling. Several steps were taken during the pre-processing phase to maintain data integrity. These included normalizing the data to retain its original range, refraining from reordering the dataset to keep the time sequence intact, and avoiding any resampling operations to maintain consistent data collection intervals. Such cautious pre-processing preserves critical data characteristics, ensuring accurate subsequent analysis and results. Additionally, noise, which can come from measurement errors, missing values, or outliers, can lead to poor model performance and unreliable outputs. To tackle this issue, a filter based on the statistical mean and standard deviation was employed to identify and eliminate outliers in each attribute of the dataset.
Then, normalization was conducted to scale all attribute values between 0 and 1, which mitigates the risk of machine learning algorithms favoring attributes with larger ranges during training. Normalization also helps minimize the effect of large values, improves algorithm convergence, reduces overfitting, and prevents model bias toward certain features. Hence, it facilitates a more accurate depiction of the relationships among the features in the dataset. After that, we used the feature selection technique proposed in [54] to select a compact set of relevant, non-redundant features, which reduces the model’s complexity.
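As a minimal sketch of these two pre-processing steps, assuming the data arrive as a Pandas DataFrame of numeric attributes, the outlier filter and min–max normalization could be implemented as follows (the 3-standard-deviation threshold is an illustrative choice):

```python
import pandas as pd

def preprocess(df: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Mean/std outlier filtering and min-max normalization.

    Rows are dropped when any attribute lies more than k standard
    deviations from its mean; k = 3 is an illustrative threshold.
    Row order is preserved to keep the time sequence intact.
    """
    num = df.select_dtypes('number')
    mu, sigma = num.mean(), num.std()
    mask = ((num - mu).abs() <= k * sigma).all(axis=1)  # keep in-range rows
    df = df.loc[mask].copy()

    # Scale each numeric attribute to [0, 1].
    num = df.select_dtypes('number')
    rng = (num.max() - num.min()).replace(0, 1)         # avoid divide-by-zero
    df[num.columns] = (num - num.min()) / rng
    return df
```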
The next step was developing the LSTM-based model with an improved early stopping mechanism that prevents the model from overfitting and underfitting. This stage incorporates a deep learning model designed to detect attacks in IoMT network traffic. The features are fed into Long Short-Term Memory (LSTM) layers, which collaboratively learn their spatial and temporal patterns. Rather than using only the final hidden states, the LSTM model takes all hidden states into consideration and feeds them into a global attention layer similar to soft and additive attention, as described in [24]. This layer employs a ReLU activation function on all of the hidden state features. These features then pass into a fully connected layer with 50 neurons. Dropout and batch normalization methods are used in the hidden layers to accelerate training. The model ultimately classifies data inputs as either normal or attack. Due to the significant imbalance in IoMT network traffic, this work adopts a cost-sensitive learning approach, assigning greater weights to the attack class and lesser weights to the normal class during model training. Initial values for the cost matrix are randomly drawn from a Gaussian distribution and are fine-tuned during the training phase.
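The following Keras sketch approximates the described architecture under stated assumptions: the 64-unit LSTM layer, the 0.3 dropout rate, and the simplified ReLU-scored additive attention are illustrative stand-ins, and the fixed class weights merely approximate the Gaussian-initialized, trainable cost matrix described above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fst_lstm(timesteps: int, n_features: int) -> tf.keras.Model:
    """Sketch of the detection network: LSTM -> global attention ->
    dense layer with batch normalization/dropout -> binary output."""
    inp = layers.Input(shape=(timesteps, n_features))
    h = layers.LSTM(64, return_sequences=True)(inp)   # keep all hidden states

    # Simplified additive ("soft") attention over the time axis.
    scores = layers.Dense(1, activation='relu')(h)    # ReLU scoring, per the text
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

    x = layers.Dense(50, activation='relu')(context)  # fully connected, 50 neurons
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)                        # illustrative dropout rate
    out = layers.Dense(1, activation='sigmoid')(x)    # normal vs. attack

    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

# Cost-sensitive training: heavier weight on the minority attack class (1).
# model.fit(X_train, y_train, class_weight={0: 1.0, 1: 5.0}, ...)
```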
3.3. Description of the Dataset
In this study, we utilized the WUSTL-EHMS-2020 dataset, which combines network flow parameters and patient biometric data. This dataset originated from an Enhanced Healthcare Monitoring System (EHMS) testbed that operates in real-time. The testbed consists of four main elements: medical monitoring sensors, a data-transmitting gateway, a network infrastructure, and a control unit with visualization capabilities. The data are collected from sensors attached to patients, transmitted through the gateway, and then sent to a dedicated server for visualization using routing and switching mechanisms. The EHMS testbed was specifically designed to gather network flow metrics and biometric data from patients. Its system includes six crucial components: a multi-sensor board, a gateway or central control hub, a data server, an intrusion detection system (IDS), a simulated attacker, and a dedicated network.
The PM4100 Six Pe Multi-Sensor Board, manufactured by Medical Expo, is equipped with four sensors that monitor important patient vitals such as electrocardiograms (ECGs), blood oxygen saturation (SpO2), body temperature, and blood pressure. The collected data is transmitted via a USB interface to a laptop running Windows, which serves as the gateway. The gateway presents the data visually through a graphical user interface (GUI), while also transmitting it to a server for further processing. The server, operating on an Ubuntu system, collects and analyzes the data, and assists in making informed medical decisions. The network infrastructure includes an Ethernet switch that connects the server, IDS, and a computer simulating attacks, with a router responsible for assigning dynamic IP addresses. The IDS relies on Argus network flow monitoring software to gather network flow metrics and biometric data, enabling important decisions about traffic packets. The simulated attacker, using Kali Linux, creates potential threats such as data spoofing or altering patient data during transmission to simulate hazards that may exist in healthcare monitoring systems.
Prior to utilization, the dataset underwent several pre-processing steps to ensure data quality and relevance. These steps included data cleaning to remove any inconsistencies or outliers, normalization to standardize the range of continuous initial variables, and feature selection to identify the most relevant attributes for intrusion detection. This pre-processing was critical in refining the dataset for optimal model training and performance.
3.4. Experimental Environment
The construction and performance assessment of the proposed model were executed using various software and tools, such as Python, Skfeature, TensorFlow, Keras, Scikit-learn, and NumPy. In addition, the organization of data samples, application of algorithms, and interpretation of results were performed on a device equipped with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz and 16 GB RAM.
In assessing the effectiveness of our model, we selected a set of performance metrics that are widely recognized in the field of intrusion detection. Accuracy (ACC) was chosen as the primary indicator of overall model performance, providing a straightforward measure of the model’s ability to correctly classify instances. However, to gain a more nuanced understanding of the model’s predictive power, we also included the false positive rate (FPR), detection rate (DR), and F1 score (F1). These metrics were selected because they offer a balanced view of the model’s performance, accounting for the costs of misclassification. The FPR is particularly important in the IoMT context, where false alarms can be costly and disruptive. The DR (also known as recall) is critical for ensuring that actual intrusions are reliably detected, and the F1 score provides a harmonic mean of precision and recall, which is useful when seeking a balance between the model’s sensitivity and specificity.
$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad FPR = \frac{FP}{FP + TN},$$

$$DR = \frac{TP}{TP + FN}, \qquad F1 = \frac{2\,TP}{2\,TP + FP + FN},$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
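Equivalently, these metrics can be computed directly from the binary confusion matrix, as in this short sketch:

```python
from sklearn.metrics import confusion_matrix

def ids_metrics(y_true, y_pred):
    """Compute ACC, FPR, DR (recall), and F1 from a binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn)
    dr  = tp / (tp + fn)                # detection rate = recall
    f1  = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
    return acc, fpr, dr, f1
```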
4. Results and Discussion
This section discusses the outcomes of the proposed fuzzy-based self-tuning IDS (FST-LSTM) model and provides comparisons with related studies. Experimental evaluations were conducted using various Python-based packages, including Scikit-learn, Pandas, NumPy, and SkFeature. To evaluate the performance of our technique, multiple performance metrics were used, namely accuracy (ACC), false positive rate (FPR), detection rate (DR), and F1 score (F1). The training process for intrusion detection in the IoMT involved several steps. Initially, data pre-processing was performed, including normalization, handling missing values, and transforming the data for model training. Then, a set of relevant and non-redundant features was selected and projected onto the dataset. The dataset was then divided into training and validation sets. The training set was utilized to train the LSTM, while the validation set was used to assess its performance. The model’s parameters, such as the number of layers, neurons, activation function, and optimizer, were defined. The model was subsequently trained using the training set, and adjustments were made based on prediction errors calculated via the loss function. After training, the model’s performance was evaluated on the validation set using metrics such as accuracy, precision, recall, and other relevant measures.
Table 1 shows the performance metrics for different numbers of features used in the training of the proposed LSTM-based IDS model.
As the number of features increases from 5 to 45, the accuracy (ACC) remains consistently high, ranging from 0.944 to 0.967. The false positive rate (FPR) decreases gradually, indicating a reduction in the number of false alarms, with the lowest value of 0.104 observed at 25 features. The detection rate (DR) also shows a gradual improvement, reaching a peak of 0.943 at 25 features. The F1 score, which considers both precision and recall, increases with the number of features, with the highest value of 0.966 achieved at 25 features. Overall, the results demonstrate that increasing the number of features up to this point has a positive impact on the performance of the proposed model, leading to higher accuracy and improved detection rates while maintaining a low false positive rate.
The results in Table 1 provide evidence of the model’s sustained high performance, demonstrating the effectiveness of the fuzzy logic in determining the optimal patience value for an improved self-tuning capability. This can be attributed to the integration of fuzzy logic within the self-tuning mechanism, which accurately estimates the value of the patience parameter during the training phase. By dynamically adjusting the number of epochs and preventing overfitting, our model successfully avoids both underfitting due to insufficient epochs and overfitting caused by excessive training. This robust approach contributes to the reliable detection rate of our fuzzy-based self-tuning LSTM IDS, highlighting its ability to effectively identify a significant proportion of actual intrusions within the IoMT environment. Through the effective capture and classification of anomalous patterns, the proposed IDS has the ability to ensure the security and integrity of healthcare systems, protecting them from potential threats.
The results presented in Table 1 also demonstrate interesting performance trends, particularly when the number of features reaches 25. At this point, there is a noticeable improvement in the model’s performance across multiple evaluation metrics. The accuracy (ACC) increases to 0.967, indicating a high level of correct classifications. Additionally, the false positive rate (FPR) significantly decreases to 0.104, indicating a reduced number of false alarms. The detection rate (DR) remains consistently high at 0.943, indicating the model’s ability to accurately identify intrusions. The F1 score (F1) also reaches a high value of 0.966, reflecting the model’s balanced precision and recall. These performance enhancements at 25 features suggest that the proposed model was able to perceive the attack patterns even with fewer epochs and fewer features used as inputs. Such a level of performance suggests that the early stopping mechanism maintained a good trade-off between performance and complexity, allowing the model to achieve high accuracy without sacrificing efficiency.
The performance of the proposed FST-LSTM model is compared with five existing models, namely the DL-IDS [24], RNN-IDS [55], XSRU-IoMT [25], GDRL [5], and ODLN [38], across multiple evaluation metrics, as shown in Figure 2, Figure 3, Figure 4 and Figure 5. The rationale for comparing our work with these studies is that they work on IoMT data and apply deep learning algorithms to develop an IDS.
Figure 2 presents the accuracy scores, indicating the proportion of correct classifications. The FST-LSTM model consistently outperforms the other models across different numbers of features, achieving the highest accuracy scores. The proposed model shows a high accuracy, peaking at 25 features with a score of 0.967. Beyond 30 features, a slight decline in accuracy is observed for most models, including the FST-LSTM, suggesting a potential threshold for optimal feature utilization.
The comparison results reveal the FST-LSTM model as a robust performer in the IoMT IDS landscape, consistently outperforming the other models, especially in scenarios with a higher number of features. Its peak performance at 25 features suggests an optimal balance between feature count and model efficiency. The decline in accuracy beyond this point for the FST-LSTM and other models implies potential overfitting or diminishing returns with too many features. Comparatively, models like the XSRU-IoMT show close competition, especially at higher feature counts, while the GDRL and ODLN lag slightly behind across most feature ranges. These results underscore the importance of feature selection in IDS model performance, highlighting that an increased number of features does not always correlate with enhanced detection capabilities, particularly beyond a certain threshold.
Similar trends can be observed in Figure 5, which presents the F1 scores, reflecting the balance between precision and recall. The FST-LSTM model consistently achieves the highest F1 scores, indicating its superior balance between precision and recall. The proposed model demonstrates a consistent increase in F1 score as the number of features grows from 5 to 25, peaking at 0.966 for 25 features. Beyond this point, a gradual decrease in F1 score is observed for the FST-LSTM and other models, indicating a potential limit to the effectiveness of increasing feature counts. The comparison reveals that the FST-LSTM model is a highly effective solution for the IoMT IDS, maintaining superior performance across a wide range of feature counts. Its peak F1 score at 25 features suggests an optimal point for feature utilization, balancing precision and recall effectively. The gradual decline in F1 scores beyond 25 features for all models, including the FST-LSTM, suggests a potential overfitting issue or inefficiency in handling an excessive number of features. In comparison, models like the DL-IDS and RNN-IDS show competitive performances, especially at higher feature counts, closely following the FST-LSTM model. The XSRU-IoMT, GDRL, and ODLN models exhibit varying degrees of effectiveness, with some performing better at lower feature counts and others at higher counts. These results highlight the importance of an appropriate feature count in maximizing the F1 measure, indicating that an excessive number of features might disrupt the balance between precision and recall.
Figure 3 shows the false positive rates (FPRs), wherein the FST-LSTM model consistently exhibits lower FPR values compared to the other models, indicating its ability to reduce false alarms. The proposed model demonstrates an overall decreasing trend in false positive rates as the number of features increases, with a notable dip at 25 features (0.104). Beyond 25 features, there is a gradual increase in false positive rates for the FST-LSTM and other models, suggesting a limit to the effectiveness of feature count in reducing false alarms. Analysis of the false positive rate indicates that the FST-LSTM model is effective in minimizing incorrect threat detections, especially in scenarios with a moderate number of features. Its lowest false positive rate at 25 features suggests an optimal balance in feature count, where the model efficiently distinguishes between normal and malicious activities. The increase in false positive rates beyond this point for the FST-LSTM and other models implies potential overfitting or reduced efficiency with too many features. Compared to other models, the FST-LSTM generally maintains a lower false positive rate, indicating its superior capability to avoid false alarms. Other models, such as the DL-IDS and RNN-IDS, show competitive performances but slightly higher false positive rates at various feature counts. The XSRU-IoMT, GDRL, and ODLN models exhibit higher false positive rates, particularly at higher feature counts, indicating potential difficulty in maintaining accuracy without compromising on false detections. These results highlight the importance of a balanced feature selection in IDS models for the IoMT, where an excessive number of features might lead to increased false alarms, undermining the system’s reliability and user trust.
Lastly, Figure 4 presents the detection rates, measuring the model’s ability to accurately identify intrusions. Again, the FST-LSTM model demonstrates higher detection rates across different numbers of features, showcasing its effectiveness in identifying intrusions in the IoMT environment. The proposed model exhibits a steady increase in detection rate as the number of features increases from 5 to 25, reaching a peak of 0.943 at 25 features. Beyond 30 features, the detection rate for the FST-LSTM and other models shows slight fluctuation, suggesting a plateau in performance improvement with an increase in feature count. The comparison reveals that the FST-LSTM model is a consistently strong performer in detecting intrusions within the IoMT environment. Its peak performance at 25 features indicates an optimal balance in utilizing a sufficient number of features to effectively identify threats. The slight fluctuations in detection rate beyond 30 features across all models, including the FST-LSTM, hint at a potential limit to the benefits of increasing the feature count, where additional features may not significantly enhance detection capabilities. Compared to other models, the FST-LSTM generally maintains a higher detection rate, especially in the mid-range of feature counts. Models like the DL-IDS and RNN-IDS show competitive performances, closely following the FST-LSTM model, while the XSRU-IoMT, GDRL, and ODLN models exhibit varying effectiveness at different feature counts. These results underscore the importance of an optimal feature selection strategy in IDS models for the IoMT, where too many features might not necessarily lead to improved detection rates and could potentially introduce complexity without significant benefits.
These results highlight the superior performance of the proposed FST-LSTM model compared to the existing models, demonstrating its efficacy in intrusion detection tasks.
To mitigate overfitting and underfitting in our LSTM model and improve its performance, we have implemented a multifaceted approach that leverages both algorithmic and architectural strategies. Algorithmically, our model employs a fuzzy-based dynamic adjustment of the patience parameter in the early stopping mechanism. This approach is preferred over static early stopping because it allows the model to adaptively determine the optimal point at which to halt training based on the actual learning progress, rather than a predetermined, fixed number of epochs. The fuzzy logic system evaluates the model’s performance, considering accuracy, validation loss, and the rate of improvement, to dynamically adjust the patience parameter. This ensures that the model continues to learn as long as significant improvements are made, thereby avoiding premature stopping (which could lead to underfitting) and excessive training (which could lead to overfitting).
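A minimal sketch of how this fuzzy-driven mechanism can be wired into training, assuming a patience estimator such as the `estimate_patience` sketch from Section 3.1, is the following custom Keras callback:

```python
import tensorflow as tf

class FuzzyEarlyStopping(tf.keras.callbacks.Callback):
    """Early stopping whose patience is re-estimated after every epoch
    by a fuzzy system, instead of being fixed in advance."""

    def __init__(self, estimator):
        super().__init__()
        self.estimator = estimator           # e.g., the estimate_patience sketch
        self.best_loss = float('inf')
        self.wait = 0

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_loss = logs.get('val_loss', 0.0)
        acc = logs.get('val_accuracy', 0.0)
        prev = val_loss if self.best_loss == float('inf') else self.best_loss
        rate = max(0.0, prev - val_loss)     # rate of improvement

        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.wait = 0
        else:
            self.wait += 1

        # Re-estimate the patience from current accuracy, loss, and improvement.
        if self.wait >= self.estimator(acc, val_loss, rate):
            self.model.stop_training = True

# Usage: model.fit(X_train, y_train, validation_data=(X_val, y_val),
#                  epochs=200, callbacks=[FuzzyEarlyStopping(estimate_patience)])
```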
Architecturally, we introduced dropout layers and batch normalization within the hidden layers of our LSTM network. Dropout layers randomly deactivate a subset of neurons during the training process, which prevents the network from becoming overly dependent on any specific neuron and thus reduces overfitting. Batch normalization standardizes the inputs to a layer for each mini-batch, stabilizing the learning process and accelerating convergence by reducing internal covariate shift. These combined strategies form a robust defense against overfitting and underfitting, ensuring that our model achieves a balance between bias and variance, ultimately leading to better generalization to unseen data. Our choice of these specific techniques is driven by their proven effectiveness in similar contexts, as documented in the literature, and their suitability for the complex and dynamic nature of IoMT network traffic and attack patterns.