RDSC: Range-Based Device Spatial Clustering for IoT Networks
Abstract
1. Introduction
- Security is one of the major concerns, as the large number of devices connected to the network makes it more vulnerable to attacks by increasing the potential entry points for cybercriminals.
- Data management also represents a significant challenge. The massive amounts of data collected by IoT devices must be stored in data centers for processing and analysis, which can be costly in terms of storage, financial resources, and human administration. Moreover, these sensitive data must be anonymized and, in some cases, encrypted to preserve user privacy and confidentiality.
- Another key challenge is the heterogeneity of IoT devices. These devices are produced by different manufacturers, with varying sensing capabilities, power consumption, and storage capacities, which complicates their management.
- Finally, the management of IoT networks also brings its share of complexities, particularly when devices use distinct communication protocols, such as Wi-Fi and Bluetooth, making their interconnectivity more difficult.
2. Motivating Scenario
2.1. Chiberta Forest Setup
2.2. Device Heterogeneity
2.3. Challenges
- Energy-efficient hardware: using low-power devices with sleep/wake scheduling extends the network’s lifetime. In addition, energy harvesting with solar panels and other energy sources keeps devices alive for longer.
- Data aggregation: aggregating the data (crowd wisdom) and forwarding them at once reduces the number of devices involved and extends the network’s lifetime. Furthermore, data aggregation offers functionalities that are unavailable when each device operates independently, such as handling missing data, detecting anomalies, and ensuring data quality.
- Communication protocols: an efficient transmission protocol (e.g., MQTT) reduces the number of devices involved in packet transmission, hence extending the network lifetime.
- Load balancing: distributing the network load across multiple nodes divides heavy workloads among them, reducing resource consumption and, consequently, energy consumption.
- Challenge 1: How to cluster devices while taking into consideration their limited storage capacity?
- Challenge 2: How to handle device coverage ranges while clustering? How to handle the gaps between coverage ranges while clustering?
- Challenge 3: How to take into account device power while clustering to optimize network lifecycle?
- Challenge 4: How can device connectivity be considered while clustering to optimize packet forwarding between clusters?
3. Related Works
3.1. Clustering Background
- Centroid-based: Each object is assigned to the nearest cluster head (CH) based on the distance between the object and every CH. Examples of centroid-based algorithms are k-means and k-medoids. These algorithms are used in many use cases related to energy management (electric vehicles [3] and smart grids [4]), network security (false data injection [5]), and the identification of unstable cluster heads [6].
- Distribution-based: These clustering algorithms rely on the probabilistic distribution of the objects. The clustering model calculates the probability of an object being assigned to a specific cluster. Gaussian mixture model (GMM) clustering is an example of a distribution-based algorithm. This technique can be used to model electricity consumption patterns [7,8] and perform system reliability analysis [9]. Another popular distribution-based technique is Bayesian clustering. Bayesian clustering can be used for model parameter estimation [10,11] and energy consumption pattern detection [12].
- Density-based: The goal of such algorithms is to group objects with high density. These algorithms are suitable for complex data with different shapes and structures. DBSCAN is a popular example of density-based algorithms. This type of methodology is used in anomaly detection and in the discovery of power consumption patterns [13].
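The centroid-based assignment rule described above can be illustrated with a minimal Python sketch. It assumes 2D points and Euclidean distance; the function name `assign_to_nearest_head` and the sample coordinates are hypothetical and not taken from any of the cited works.

```python
import math


def assign_to_nearest_head(points, cluster_heads):
    """Return, for each point, the index of its closest cluster head."""
    assignment = []
    for point in points:
        # Euclidean distance from this point to every cluster head.
        distances = [math.dist(point, head) for head in cluster_heads]
        assignment.append(distances.index(min(distances)))
    return assignment


# Hypothetical sample data: three objects and two cluster heads.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0)]
heads = [(0.5, 0.5), (10.0, 10.0)]
labels = assign_to_nearest_head(points, heads)
```

This covers only the assignment step; k-means additionally recomputes the cluster heads from the assigned objects and repeats until the assignment stabilizes.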
3.2. Device Clustering
3.3. Agglomerative Hierarchical Clustering
3.4. Comparison Table
4. Preliminaries and Assumptions
4.1. Device, Sensor, and Cluster Head
- is the device identifier;
- l is the device location stamp (see Remark 1);
- c is the device storage capacity (in bytes);
- p is the device current power level (in Wh); and
- is the set of the device sensors. Each sensor is defined as , where
  - o is an observation (i.e., sensed data such as temperature);
  - is its coverage zone (see Definition 3); and
  - : is its cluster head (identifier) when it exists. ▪
- d = is the corresponding device
- : is the set of devices managed by the device.
- : is the covered zone of the entire devices. ▪
4.2. Zones and Environment
- is the zone identifier;
- is the surface of the zone;
- is the spatial shape of the zone; and
- L: is the set of location stamps that constitute the vertices of the zone. ▪
- is the environment identifier;
- is the set of cluster heads in the environment;
- is the set of uncovered zones in the environment; and
- L are the two corresponding vertices of the environment. ▪
5. Proposed Approach
5.1. Pre-Clustering
- Environment division following physical communication: devices are commonly grouped based on their physical communication links. In other words, all devices that can reach each other over direct or multi-hop links are gathered into one group. This ensures an overlay connection between the devices (where the algorithm execution must occur). IoT communication technologies (such as Wi-Fi and Bluetooth) enable direct communication, allowing nearby IoT devices to interact easily, which simplifies the task of this module. In [31], the authors considered the communication range to be twice the coverage range. Other approaches, such as [32], connect the devices from the beginning; hence, each device knows its directly connected devices. In our approach, we assume that each group of devices is aware of its directly or indirectly connected neighbors (using any network discovery protocol), which allows seamless communication within the group.
- Coverage area conversion: following [31], sensing models represent the sensing capabilities and quality of a device. They depend on the sensing method: (1) directional sensing, which depends on the distance and the horizontal orientation of the sensor, or (2) omnidirectional sensing, in which the device captures a 360-degree view of the surrounding scene (i.e., it determines whether a point lies within the sensor’s radius). Several sensing models can be distinguished, but two dominate: Boolean and probabilistic. The Boolean sensing model is one of the most widely used according to [31]. It treats each sensor node as having a binary sensing capability within a specific radius; i.e., it detects either the presence or the absence of a target. The probabilistic sensing model extends the Boolean model to better reflect modern connected environments. It relies on the probability of detecting a point within the coverage area: even if a point lies within this area, the detected value might have low accuracy or the point might go undetected. Our approach adopts the probabilistic sensing model, since it subsumes the Boolean model and is more realistic, and this study focuses only on omnidirectional sensing. To reduce computational complexity, we transform the devices’ coverage zones (circles and lines) into squares or rectangles and incorporate a probabilistic factor, named the degradation percentage (DP), based on the sensor specifications and environmental conditions. To illustrate the coverage, consider Figure 3. There are two ways to represent the coverage of an omnidirectional device with coverage radius $r$: one uses a square with a side equal to $2r$ (case (a)), and the other uses a square with a side equal to $r\sqrt{2}$ (case (b)). In case (a), part of the coverage area exceeds the sensor’s coverage range (gaining a small portion beyond it), while in case (b), the coverage area is entirely contained in the coverage range at the cost of losing a small portion of it. The DP directly impacts the coverage range, as demonstrated in cases (c) and (d): as the DP increases, the coverage area shrinks. The choice of representation for the coverage area with the DP depends on the sensor specifications and precision.
- Device grouping: after generating the coverage zone of each device, we check for continuous intersections between the zones. Devices whose zones form a continuous chain of intersections are added to the same group. At the end of this step, the devices of each group can communicate with each other and have consecutive covered zone intersections. The clustering algorithm is applied independently to each group.
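The coverage conversion and device grouping steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Rect`, `coverage_square`, `intersects`, and `group_devices` are hypothetical, the degradation percentage is assumed to shrink the radius linearly (`r_eff = radius * (1 - dp)`), and "continuous intersection" is interpreted as connectivity in the graph of pairwise overlapping squares.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def coverage_square(x, y, radius, dp=0.0, inscribed=True):
    """Approximate a circular coverage zone by an axis-aligned square.

    Assumption: dp in [0, 1) shrinks the effective radius linearly.
    """
    r_eff = radius * (1.0 - dp)
    # Inscribed square has side r*sqrt(2); circumscribed square has side 2r.
    half = r_eff / math.sqrt(2) if inscribed else r_eff
    return Rect(x - half, y - half, x + half, y + half)


def intersects(a, b):
    """True when two rectangles share a region of non-zero area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def group_devices(zones):
    """Group zone indices into connected components (union-find)."""
    parent = list(range(len(zones)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(zones)):
        for j in range(i + 1, len(zones)):
            if intersects(zones[i], zones[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(zones)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical devices: (x, y, coverage radius).
devices = [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (20.0, 20.0, 2.0)]
zones = [coverage_square(x, y, r) for x, y, r in devices]
groups = group_devices(zones)
```

Here the first two devices overlap and end up in one group, while the third forms a group of its own. The pairwise check is quadratic in the number of devices; a spatial index would be a natural replacement for large deployments.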
5.2. Clustering Algorithm Execution
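- Sensor metadata normalization: all numerical values are normalized using the min–max normalization technique [33], $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the attribute being normalized.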
- Coverage zone clustering algorithm execution: the clustering algorithm is executed for each device group. Details about the execution steps will be presented in Section 6. The clustering process will result in distinct, non-overlapping covered zones, each with a designated cluster head. It is important to note that any grouping, division, or modification of a device’s coverage area will create a covered zone.
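The min–max normalization step mentioned above can be sketched as follows. The function name `min_max_normalize` and the sample power levels are hypothetical, and mapping a constant attribute to 0.0 is an assumption made here to avoid a division by zero.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical device power levels in Wh.
power_levels_wh = [2.0, 5.0, 8.0]
normalized = min_max_normalize(power_levels_wh)
```

Normalizing each attribute separately puts storage capacity (bytes), power (Wh), and surface on a common scale before they are combined.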
5.3. Post-Clustering
- Uncovered area calculation using ENV and CZ: in this step, we calculate the empty areas by subtracting the covered zones (CZ) from the entire connected environment (ENV). The result is the set of areas that are not covered by any sensing device.
- Uncovered zone division using internal vertices: internal vertices are those located within the boundaries of the connected environment but not on the edges of the environment’s Minimum Bounding Rectangle. For each internal vertex, we draw a horizontal line dividing the current uncovered area into two parts. After splitting at all internal vertices, the uncovered zones are rectangular. This step aims to reduce the space required in the device’s local storage, as only two points are needed to represent a rectangle in memory. More information about this part will be given in Section 7.
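The two post-clustering steps above can be approximated with a short Python sketch. It assumes a rectangular environment and rectangular, non-overlapping covered zones, each given as `(x_min, y_min, x_max, y_max)`; the function name `uncovered_rectangles` and the sample coordinates are hypothetical. As a simplification, every horizontal cut spans the full width of the environment, so this sketch may produce more rectangles than the per-vertex division described in the text.

```python
def uncovered_rectangles(env, covered):
    """Subtract covered rectangles from env; return uncovered rectangles."""
    ex0, ey0, ex1, ey1 = env
    # Keep only covered zones that actually overlap the environment.
    covered = [c for c in covered
               if c[0] < ex1 and c[2] > ex0 and c[1] < ey1 and c[3] > ey0]

    # Horizontal cut lines: environment borders plus covered-zone edges.
    cuts = {ey0, ey1}
    for _, y0, _, y1 in covered:
        cuts.update(y for y in (y0, y1) if ey0 < y < ey1)
    ys = sorted(cuts)

    result = []
    for y_lo, y_hi in zip(ys, ys[1:]):
        # Covered x-intervals that span this horizontal slab entirely.
        spans = sorted((max(x0, ex0), min(x1, ex1))
                       for x0, y0, x1, y1 in covered
                       if y0 <= y_lo and y1 >= y_hi)
        cursor = ex0
        for x0, x1 in spans:
            if x0 > cursor:
                # Gap between the previous covered span and this one.
                result.append((cursor, y_lo, x0, y_hi))
            cursor = max(cursor, x1)
        if cursor < ex1:
            result.append((cursor, y_lo, ex1, y_hi))
    return result


# Hypothetical environment and two adjacent covered zones.
env = (0, 0, 10, 10)
covered_zones = [(2, 2, 5, 6), (5, 4, 8, 8)]
uncovered = uncovered_rectangles(env, covered_zones)
total_area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in uncovered)
```

Each output rectangle is stored as two corner points, which matches the storage argument made above.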
6. RDSC Clustering Algorithm
6.1. Equations and Applications
6.1.1. Objective Function
6.1.2. Use Cases
Algorithm 1: runUseCases()
Input:
Output: // object containing use cases results
Local Variables:
// Calculating power values
1
2
3 =
4 =
// merged use case
5 = 'merged'
6 =
7 =
8 =
9 =
10
// dominant zone one use case
11 = 'dominantOne'
12 = −
13 =
14 = +
15
// dominant zone two use case
16 = 'dominantTwo'
17 = −
18 =
19 = +
20
// shrink zone one use case
21 = 'shrinkOne'
22 = −
23 = − * deletionR
24
// shrink zone two use case
25 = 'shrinkTwo'
26 = −
27 = − * deletionR
28
29 return useCasesObj
Algorithm 2: spatialClustering()
6.2. Main Algorithm Execution
7. Uncovered Zones Division
Algorithm 3: generateUncoveredZones()
Input:
Output: // the environment object
1 =
2 =
3
4
5 return env
8. Experiments
8.1. Performance Evaluation
8.1.1. Effect of the Surface Weight on the Clustering Results
8.1.2. Effect of the Power Weight on the Clustering Results
8.1.3. Merge Factor Impact
8.2. 1000 Device Execution
- Surface weight: 0.4;
- Power weight: 0.4;
- Vertex weight: 0.2;
- Merge factor: 3.5; and
- Deletion rate: 0.3.
8.3. Algorithms Comparison
8.4. Discussion
9. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Kashani, M.H.; Madanipour, M.; Nikravan, M.; Asghari, P.; Mahdipour, E. A systematic review of IoT in healthcare: Applications, techniques, and trends. J. Netw. Comput. Appl. 2021, 192, 103164.
- Miraftabzadeh, S.; Colombo, C.; Longo, M.; Foiadelli, F. K-means and alternative clustering methods in modern power systems. IEEE Access 2023, 11, 119596–119633.
- Li, W.; Chen, S.; Peng, X.; Xiao, M.; Gao, L.; Garg, A.; Bao, N. A comprehensive approach for the clustering of similar-performance cells for the design of a lithium-ion battery module for electric vehicles. Engineering 2019, 5, 795–802.
- Yu, X.; Ergan, S. Estimating power demand shaving capacity of buildings on an urban scale using extracted demand response profiles through machine learning models. Appl. Energy 2022, 310, 118579.
- Ding, Y.; Wang, X.; Zhang, D.; Wang, X.; Yang, L.; Pu, T. Research on key node identification scheme for power system considering malicious data attacks. Energy Rep. 2021, 7, 1289–1296.
- Krishnan, G.C.; Nishan, A.h.; Theerthagiri, P. K-means clustering based energy and trust management routing algorithm for mobile ad-hoc networks. Int. J. Commun. Syst. 2022, 35, e5138.
- Lai, C.S.; Jia, Y.; McCulloch, M.D.; Xu, Z. Daily clearness index profiles cluster analysis for photovoltaic system. IEEE Trans. Ind. Inform. 2017, 13, 2322–2332.
- Sun, M.; Konstantelos, I.; Strbac, G. C-vine copula mixture model for clustering of residential electrical load pattern data. IEEE Trans. Power Syst. 2016, 32, 2382–2393.
- Zhang, L.; Wan, L.; Xiao, Y.; Li, S.; Zhu, C. Anomaly Detection method of Smart Meters data based on GMM-LDA clustering feature Learning and PSO Support Vector Machine. In Proceedings of the 2019 IEEE Sustainable Power and Energy Conference (ISPEC), Beijing, China, 21–23 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2407–2412.
- Diez, M.; Burget, L. Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: Theory, implementation and analysis on standard tasks, technical report. arXiv 2024, arXiv:2012.14952.
- Wade, S.; Ghahramani, Z. Bayesian cluster analysis: Point estimation and credible balls (with discussion). Bayesian Anal. 2018, 13, 559–626.
- Wang, S.; Sun, X.; Lall, U. A hierarchical Bayesian regression model for predicting summer residential electricity demand across the USA. Energy 2017, 140, 601–611.
- Wang, X.; Zhou, C.; Yang, Y.; Yang, Y.; Ji, T.; Wang, J.; Chen, J.; Zheng, Y. Electricity market customer segmentation based on DBSCAN and k-Means:—A case on yunnan electricity market. In Proceedings of the 2020 Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 29–31 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 869–874.
- Kiran, D.; Abhyankar, A.; Panigrahi, B. Hierarchical clustering based zone formation in power networks. In Proceedings of the 2016 National Power Systems Conference (NPSC), Bhubaneswar, India, 19–21 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
- Barmpas, P.; Tasoulis, S.; Vrahatis, A.G.; Georgakopoulos, S.V.; Anagnostou, P.; Prina, M.; Ayuso-Mateos, J.L.; Bickenbach, J.; Bayes, I.; Bobak, M.; et al. A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies: The ATHLOS project. Health Inf. Sci. Syst. 2022, 10, 6.
- AlMahamid, F.; Grolinger, K. Agglomerative Hierarchical Clustering with Dynamic Time Warping for Household Load Curve Clustering. In Proceedings of the 2022 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Halifax, NS, Canada, 18–20 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 241–247.
- Li, K.; Ma, Z.; Robinson, D.; Ma, J. Identification of typical building daily electricity usage profiles using Gaussian mixture model-based clustering and hierarchical clustering. Appl. Energy 2018, 231, 331–342.
- Sheng, W.; Liu, K.Y.; Liu, Y.; Meng, X.; Li, Y. Optimal placement and sizing of distributed generation via an improved nondominated sorting genetic algorithm II. IEEE Trans. Power Deliv. 2014, 30, 569–578.
- Ruha, L.; Lähderanta, T.; Lovén, L.; Kuismin, M.; Leppänen, T.; Riekki, J.; Sillanpää, M.J. Capacitated spatial clustering with multiple constraints and attributes. arXiv 2020, arXiv:2010.06333.
- El-Sharkawi, M.E.; El-Zawawy, M.A. Algorithm for spatial clustering with obstacles. arXiv 2009, arXiv:0909.4412.
- Saif, A.; Dimyati, K.; Noordin, K.A.; Shah, N.S.M.; Alsamhi, S.; Abdullah, Q.; Farah, N. Distributed clustering for user devices under UAV coverage area during disaster recovery. In Proceedings of the 2021 IEEE International Conference in Power Engineering Application (ICPEA), Shah Alam, Malaysia, 8–9 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 143–148.
- Mukherjee, A.; Goswami, P.; Yang, L.; Yan, Z.; Daneshmand, M. Dynamic clustering method based on power demand and information volume for intelligent and green IoT. Comput. Commun. 2020, 152, 119–125.
- Lin, Y.; Zhang, R.; Yang, L.; Li, C.; Hanzo, L. User-centric clustering for designing ultradense networks: Architecture, objective functions, and design guidelines. IEEE Veh. Technol. Mag. 2019, 14, 107–114.
- Basavaraj, G.; Jaidhar, C. Intersecting Sensor Range Cluster-based Routing Algorithm for Enhancing Energy in WSN. Int. J. Adv. Netw. Appl. 2019, 10, 3938–3943.
- Seema, B.; Yao, N.; Carie, A.; Shah, S.B.H. Efficient data transfer in clustered IoT network with cooperative member nodes. Multimed. Tools Appl. 2020, 79, 34241–34251.
- Rehman, M.A.U.; Ullah, R.; Kim, B.S.; Nour, B.; Mastorakis, S. CCIC-WSN: An architecture for single-channel cluster-based information-centric wireless sensor networks. IEEE Internet Things J. 2020, 8, 7661–7675.
- Rehman, M.A.U.; Ullah, R.; Park, C.W.; Kim, D.H.; Kim, B.s. Improving resource-constrained IoT device lifetimes by mitigating redundant transmissions across heterogeneous wireless multimedia of things. Digit. Commun. Netw. 2022, 8, 778–790.
- Essalhi, S.E.; Raiss El Fenni, M.; Chafnaji, H. A new clustering-based optimised energy approach for fog-enabled IoT networks. IET Netw. 2023, 12, 155–166.
- Frigui, H.; Krishnapuram, R. Clustering by competitive agglomeration. Pattern Recognit. 1997, 30, 1109–1119.
- Achkouty, F.; Chbeir, R.; Gallon, L.; Mansour, E.; Corral, A. Resource Indexing and Querying in Large Connected Environments. Future Internet 2023, 16, 15.
- Elhabyan, R.; Shi, W.; St-Hilaire, M. Coverage protocols for wireless sensor networks: Review and future directions. J. Commun. Netw. 2019, 21, 45–60.
- Dargie, W.; Wen, J. A simple clustering strategy for wireless sensor networks. IEEE Sens. Lett. 2020, 4, 7500804.
- Codecademy. Normalization. 2024. Available online: https://www.codecademy.com/article/normalization (accessed on 1 July 2024).
Contribution | Coverage Range | Energy/Power | Device Capacity |
---|---|---|---|
Essalhi et al. [28] | - | ✓ | ✓ |
Ruha et al. [19] | - | - | ✓ |
Saif et al. [21] | - | ✓ | - |
Mukherjee et al. [22] | - | ✓ | - |
Basavaraj et al. [24] | ✓ | ✓ | - |
El-Sharkawi et al. [20] | ✓ | - | - |
Lin et al. [23] | - | ✓ | - |
Rehman et al. [26,27] | - | ✓ | ✓ |
Our approach | ✓ | ✓ | ✓ |

Parameter | Case 1 | Case 2 | Case 3 |
---|---|---|---|
Surface weight | 0.9 | 0.75 | 0.5 |
Power weight | 0 | 0 | 0 |
Vertex weight | 0.1 | 0.25 | 0.5 |
Merge factor | 3.5 | 3.5 | 3.5 |
Deletion rate | 0.3 | 0.3 | 0.3 |

Parameter | Case 1 | Case 2 | Case 3 |
---|---|---|---|
Surface weight | 0 | 0 | 0 |
Power weight | 0.9 | 0.75 | 0.5 |
Vertex weight | 0.1 | 0.25 | 0.5 |
Merge factor | 3.5 | 3.5 | 3.5 |
Deletion rate | 0.3 | 0.3 | 0.3 |

Parameter | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 | Case 6 | Case 7 |
---|---|---|---|---|---|---|---|
Surface weight | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 | 0.4 |
Power weight | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
Vertex weight | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
Merge factor | 3 | 5 | 10 | 11 | 13 | 15 | 20 |
Deletion rate | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

RDSC | DBSCAN | K-Means |
---|---|---|
Surface weight: 0.3 | EPS: 10 | K: 5 |
Power weight: 0.4 | Min samples: 3 | - |
Vertices weight: 0.3 | - | - |
Merge factor: 3.5 | - | - |
Deletion rate: 0.3 | - | - |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Achkouty, F.; Gallon, L.; Chbeir, R. RDSC: Range-Based Device Spatial Clustering for IoT Networks. Sensors 2024, 24, 5851. https://doi.org/10.3390/s24175851