1. Introduction
Microgrid hierarchical control aims to regulate the network frequency and voltage through the collaborative operation of distributed energy resources. This strategy has reshaped how distributed generation affects the grid; however, it also introduces several challenges for power control systems, especially those related to integrating power electronics, telecommunications, fault monitoring, and security [1]. The current trend is to use three hierarchical control levels to standardize the microgrid's operation and increase its resilience.
The primary level regulates the network's frequency and voltage, ensuring power sharing between the distributed generators (DGs) [2]. The most important approach is droop control [3], which is based on the droop characteristics of a conventional generator. Droop control establishes a stable relationship between active power and frequency (P-f) and between reactive power and voltage (Q-V). The secondary level compensates the voltage and frequency deviations caused by output impedance decoupling, failures in power sharing, or the presence of circulating currents [4]. At this level, it is essential to share information between the DGs in real time and to keep the most critical control variables within their nominal values.
The literature provides several examples of integrating the power system with communication architectures in networked microgrids (NMGs). For example, refs. [5,6,7,8] propose a two-level hierarchical control scheme capable of regulating active and reactive power sharing. However, despite using a low-bandwidth communication system, these topologies do not scale well as control algorithms become more complex. Previously reported research had to modify both the communication architecture and the control approach whenever new facility units or control algorithms were added [9]. Furthermore, conventional communication and routing protocols cannot handle multiple topology events and lack sufficient intelligence to make appropriate control decisions [10].
Software-Defined Networking (SDN) is a modern approach to networking that controls and manages network resources and traffic flows through software abstractions [11]. By decoupling the control and data planes, SDN provides greater flexibility and scalability and enables network administrators to configure and manage the topology's behavior. This leads to faster deployment times, improved network visibility, and easier troubleshooting [12]. Additionally, it creates new opportunities for network automation, real-time analytics, and increased security through granular policy enforcement. SDN also promotes an open and interoperable network environment, enabling organizations to combine offerings from multiple vendors and adopt new technology advancements.
The main limitation of SDN technology is its centralized control plane and the inconveniences associated with a single point of failure at this critical node. If not properly managed, the centralized control plane can lead to network downtime. In [13,14,15], the authors propose systems that aim to overcome the drawbacks of conventional SDN technology. However, a reliable communication strategy is still necessary to improve heterogeneous communication between different DGs under the restrictions defined by the architecture and control algorithms. Moreover, the complexity associated with multiple programming APIs within a monolithic controller requires a high level of technical expertise and represents a significant challenge for network operations. Despite these limitations, researchers and organizations continue to allocate resources to modernizing their power systems and communication networks.
One promising approach is to distribute the functions of the SDN controller into multiple microservices [16,17]. The goal is to implement these functions as distributed units with a replication factor for redundancy. By doing so, the workload and resources of the SDN controller are spread among different worker nodes. In the event of a failure, the data remains accessible through the microservice controller without requiring manual intervention.
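As a concrete illustration of this idea (not the deployment code used in this work), the following minimal sketch uses the official Kubernetes Python client to create a hypothetical controller microservice with a replication factor of three; the namespace, image name, labels, and port are placeholders.

```python
# Sketch: deploy a controller microservice with three replicas for redundancy.
# Namespace, image, labels, and port are illustrative placeholders.
from kubernetes import client, config

def deploy_controller_microservice(namespace: str = "sdn") -> None:
    # Load credentials from the local kubeconfig (e.g., the cluster admin config).
    config.load_kube_config()

    container = client.V1Container(
        name="onos-config",
        image="example/onos-config:latest",
        ports=[client.V1ContainerPort(container_port=5150)],
    )

    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="onos-config"),
        spec=client.V1DeploymentSpec(
            replicas=3,  # replication factor for redundancy
            selector=client.V1LabelSelector(match_labels={"app": "onos-config"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "onos-config"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )

    # The Kubernetes scheduler spreads the replicas across the available worker nodes.
    client.AppsV1Api().create_namespaced_deployment(namespace=namespace, body=deployment)

if __name__ == "__main__":
    deploy_controller_microservice()
```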
There is a wide variety of SDN controllers developed in different programming languages [18,19], among which Ryu [20], OpenDaylight [21], and ONOS [22] are the most significant due to their capabilities, ease of deployment, and reliability. A study in [17] explored the development of a microservices-based system for the Ryu controller within the OpenStack virtualized network infrastructure [23]. However, although the main functionalities of the Ryu controller were divided into Docker containers, these microservices cannot scale automatically in case of failure. If a worker node fails, there is no automatic mechanism to restore the functions assigned to that node, which impacts the communication system and the microgrid's performance. Although this approach offers a fresh perspective on deploying SDN controllers, Ryu was not originally designed to support microservices [24]. This drawback requires careful testing of the architecture, services, and communication interfaces before deployment in real-world environments.
μONOS (micro ONOS) is an SDN controller that applies microservices to large-scale communication networks [25]. The ONOS Project presents μONOS as a solution for disaggregating the functionalities of the SDN controller into microservices. The project aims to enhance the versatility of the SDN architecture by targeting cloud computing platforms, data centers, and bare-metal deployments. A bare-metal cluster removes the hypervisor overhead and places the Kubernetes installation directly on the host server's operating system.
An orchestrator that automates network management, load balancing, and the creation of new instances provides an intelligent topology that uses microgrid resources efficiently. Combining SDN with Kubernetes yields a system that analyzes traffic and application requirements in real time to adjust the network configuration and application deployment. Recent literature [26,27] suggests that, although this proposal differs from artificial intelligence (AI), the communication alternative can also be classified as intelligent because of SDN's programmability and the system's capacity to optimize network performance. Kubernetes manages the orchestration and deployment of containerized applications, while SDN manages the network infrastructure supporting those applications. Together, they can dynamically respond to traffic changes, adjust network policies, and optimize network performance. For instance, when traffic surges, SDN can automatically increase network resources by adding nodes or expanding bandwidth. Likewise, if an application needs to be migrated, SDN can adapt the network routing to reduce recovery time. Compared with traditional networking, which can be time-consuming and error-prone, our proposal offers automated configuration and management that is not restricted to specific conditions [28].
According to the literature [29,30], the restrictions imposed on communication systems by the growing penetration of distributed generation sources force the use of distributed, autonomous, and efficient communication strategies to control the power system. Microgrids need bare-metal Kubernetes to ensure high reliability, low latency, and optimal resource utilization [31], which are critical requirements for a microgrid's efficient and safe operation. Standby servers may provide lower reliability and performance than a Kubernetes-based infrastructure, especially under dynamic and changing load conditions. There are several reasons to propose bare-metal Kubernetes for microgrids:
Reliability [32]: Bare-metal Kubernetes provides a highly reliable infrastructure for microgrids by distributing workloads across multiple nodes and ensuring high availability of resources.
Computational cost [33]: Costs are lower because virtualization software is no longer necessary, and cluster automation and microservices deployment are straightforward without a hypervisor.
Low latency [34]: Microgrids require low-latency, high-speed communication between devices to ensure safe and efficient operation. Bare-metal Kubernetes provides low-latency network connectivity and efficient data communication.
Optimal resource utilization [33,35]: Microgrids require optimal resource utilization to ensure energy efficiency and reduce operational costs. Bare-metal Kubernetes provides efficient resource allocation and utilization, which helps optimize energy consumption and reduce costs.
Scalability [36]: Network configuration and troubleshooting are more straightforward on a bare-metal cluster. Microgrids require the ability to scale up or down depending on demand. Bare-metal Kubernetes provides automatic scaling and load balancing, which helps ensure optimal performance under varying load conditions.
Despite these benefits of Kubernetes, there is no evidence of SDN microservices being used to solve the disadvantages of microgrid hierarchical control. Decoupling the applications of the monolithic controller into a series of sub-functions enables the deployment of a highly flexible SDN architecture. The most critical impact of this research is the ability to coordinate different applications as microservices and to provide guidelines for programming APIs in microgrid hierarchical control.
This paper proposes a novel hierarchical control architecture based on microservices to address the limitations of SDN controllers in networked microgrids. Implementing a set of controllers as a distributed system on a bare-metal Kubernetes cluster increases the redundancy and resilience of the communication system. The load is distributed among various devices based on communication and power system restrictions. Rather than using a monolithic controller, the controller functions are divided into a group of microservices. The key contributions of this research are as follows.
A new architecture, based on microservices, as a solution to the centralized SDN controller problem regarding load balancing, scalability, and low latency. The proposed methods improve the global resilience of the system and allow the integration of SDN controllers as pod services in distributed Kubernetes platforms. The approach defines the deployment parameters of a bare-metal Kubernetes cluster and can be applied to multiple AC/DC microgrid configurations.
A new SDN communication architecture for hardware-in-the-loop platforms connected to Raspberry Pi boards, each serving as both a Kubernetes worker and an OpenFlow communication device. Furthermore, this paper analyzes the most significant drawbacks of the SDN control plane in networked microgrids.
A proof of concept for segregating and orchestrating services in a bare-metal Kubernetes cluster. The proposed method decreases the data flow traffic through the SDN infrastructure by setting the most appropriate route between the DGs. The distributed communication system is capable of managing real-time energy data.
This implementation can be replicated and modified through our project's GitHub repository. Furthermore, the proposal integrates a monitoring tool that allows the visualization of logs and metrics to carry out a complete analysis of the networked microgrid.
The rest of this paper is organized as follows. Section 2 reviews the main limitations of SDN controllers in terms of physical architecture, interfaces, reliability, and scalability. Section 3 describes the hierarchical control of microgrids and the control method applied. Section 4 presents the benefits of disaggregating the SDN controller functionalities into microservices and our methodology. The implementation of the SDN controller as a group of microservices is presented in detail in Section 5. The results and a discussion of the significance of our proposal, according to different metrics, are given in Section 6. Section 7 presents different communication failures and compares the performance of the communication systems. Section 8 concludes the work and outlines future research topics.
3. Hierarchical Control Approach
Hierarchical control is a practical approach to managing power sharing in a microgrid. The primary control level determines the amount of power to be generated by each DG based on factors such as power demand, energy availability, and system constraints, as Figure 1 shows. The secondary control level manages the power sharing between the microgrid and the utility grid, ensuring that the microgrid operates within acceptable limits. System optimization and long-term planning are carried out at the tertiary control level to ensure optimal power sharing among the energy sources, energy storage systems, and loads.
Power sharing between DGs is usually carried out by parallel inverters connected to a common AC bus with multiple loads [42]. The controller proposed in this paper considers the microgrid as three voltage source inverters (VSIs) feeding an RL load at the point of common coupling. An RL load was chosen to study the microgrid's active and reactive power sharing. Each DG represents a power source implemented in the Simulation Platform for Power Electronic Systems (PLECS) [43,44]. Droop control in parallel inverters is a widely used strategy to regulate MG power sharing. The main goal of primary droop control is to achieve proportional load sharing among DGs based on the well-known (P-Q) droop method [5]. Each inverter has an external droop control loop to improve performance and provide a decentralized control method.
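For reference, the conventional (P-Q) droop laws can be written in their generic textbook form as follows; the exact gains and set-points used in the PLECS model follow the hierarchical control details referenced in Appendix A and Appendix B:

$$\omega_i = \omega^{*} - m_{p,i}\,P_i, \qquad E_i = E^{*} - n_{q,i}\,Q_i$$

where $\omega^{*}$ and $E^{*}$ are the nominal angular frequency and voltage amplitude, $P_i$ and $Q_i$ are the measured active and reactive powers of the $i$-th inverter, and $m_{p,i}$ and $n_{q,i}$ are its droop coefficients.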
Figure 2 shows the strategy applied to regulate frequency and voltage for one VSI. The details of the hierarchical control can be found in Appendix A and Appendix B.
The control strategy is evaluated according to different events within the microgrid. The droop control is activated at one second and drives the system to a steady-state condition. As shown in Figure 3 and Figure 4, there is a deviation in voltage and frequency that must be compensated by the secondary control. For that reason, secondary control is necessary to restore the stability of the MG. The secondary control is activated after 10 s and remains active until the end of the simulation.
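A common formulation of this restoration action (a generic sketch; the specific controller structure and gains follow the details referenced in Appendix A and Appendix B) adds PI correction terms to the droop references:

$$\delta\omega = k_{p\omega}\,(\omega^{*}-\omega_{MG}) + k_{i\omega}\!\int (\omega^{*}-\omega_{MG})\,dt, \qquad \delta E = k_{pE}\,(E^{*}-E_{MG}) + k_{iE}\!\int (E^{*}-E_{MG})\,dt$$

where $\omega_{MG}$ and $E_{MG}$ are the measured microgrid frequency and voltage amplitude, and the correction terms $\delta\omega$ and $\delta E$ are broadcast to each DG over the communication network and added to its droop references.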
Research studies have used a hierarchical control approach with an SDN-based communication architecture to enhance overall system intelligence [13,37,45]. However, while this approach addresses some aspects of power sharing in networked microgrids, the communication system remains a critical factor that impacts the power system and hinders overall recovery. The monolithic architecture of the SDN controller, the integration of new control functionalities, and the excessive workload can be improved through a microservices-based architecture and automatic orchestration.
5. Implementation of μONOS SDN Controller
Microgrids are evolving into more complex and extensive networks in which applications, service virtualization, and edge computing are closely related to the control strategies [29,52]. Micro ONOS (μONOS) is a new version of the SDN controller, developed by the Open Networking Foundation (ONF) [53], which uses microservices to deploy a scalable infrastructure with high throughput and low latency. It is based on Docker containers deployed on a cloud-based infrastructure or in local data centers. Unlike monolithic controllers that integrate multiple APIs, it comprises a few interfaces such as Google Remote Procedure Call (gRPC), the gRPC Network Management Interface (gNMI), and P4Runtime [25].
The μONOS infrastructure, network functions, and monitoring services only support Kubernetes deployments through Helm charts [54]. To install, manage, and delete device configurations and monitor their performance, μONOS uses gNMI as an open-source data management protocol for network devices. The functionalities provided by gNMI can be modeled using YANG (Yet Another Next Generation data modeling language) [25]. The network manager interacts with the SDN controller through the gRPC interface, accessing both the onos-cli and onos-gui services for network management, applying network modifications, and reverting changes. According to the ONF [53], the onos-config service offers a gNMI endpoint for various functions, including reading states, configuring settings, and subscribing to specific features. This interface can also prevent invalid values and expose the operational state.
The μONOS identification component facilitates the controller's management from external applications, as depicted in the top layer of Figure 7. A proposed middleware enables information exchange between external applications (power-sharing information from the networked MG) and the SDN control plane. As shown in Figure 7, the core of the SDN controller is separated from the event handler within the middleware, allowing communication between the REST API and the controller microservices. This service, named onos-config, is responsible for linking the module's functionality to the specific microservices that are executed in a distributed manner.
The essential microservice in Figure 7 is the onos-config block, as it manages the configuration of devices through gNMI interfaces and registers all events to send them to the Atomix driver. The Atomix driver implements an API to scale the μONOS Kubernetes resources. This scaling capability adds greater redundancy to the controller; the driver provides a system with additional distributed resources and is responsible for maintaining and managing the μONOS services. Details of the Atomix controller can be found in the GitHub repository of the Open Networking Foundation [55].
5.1. Functionalities of onos-config Module
The onos-config module handles network and device alterations by invoking the NetworkChange and DeviceChange services, respectively. These services keep records of all change logs and pass them to the Atomix drivers via the gRPC interface.
For connecting devices through the southbound interface, onos-config only supports gNMI. Furthermore, YANG models define the configuration and topology architecture within the onos-config service. The deployment of this module requires a Kubernetes cluster capable of running Helm charts, as detailed on its deployment page [56]. The interaction between onos-config and onos-cli allows the user to define rules or route paths through the gNMI interface. To execute them, access the onos-cli pod and run the plugins from there (the list of available plugins can be viewed by running onos-config get plugins).
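As an illustration of how an external application could interact with a gNMI endpoint such as the one exposed by onos-config, the following minimal Python sketch uses the third-party pygnmi library (our own assumption, not part of the μONOS tooling); the service address, credentials, and paths are placeholders.

```python
# Minimal gNMI interaction sketch using the third-party pygnmi library.
# The endpoint, credentials, and paths below are illustrative placeholders.
from pygnmi.client import gNMIclient

GNMI_TARGET = ("onos-config.sdn.svc.cluster.local", 5150)  # assumed service address

with gNMIclient(target=GNMI_TARGET, username="onos", password="rocks",
                insecure=True) as gc:
    # Read the current state of a device path.
    state = gc.get(path=["/system/config"])
    print(state)

    # Push a configuration update (path/value pair is an example only).
    gc.set(update=[("/system/config", {"hostname": "dg-switch-1"})])
```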
Figure 8 shows the deployment of the pods during the execution of the μONOS services in the topology. Five pods of the onos-config service are deployed, and six Atomix pods increase the availability and distribute the services across the nodes. All the steps to deploy this proposal are available in the GitHub repository [57].
5.2. Network Interface Cluster Implementation
This paper implements two network interfaces for each Raspberry Pi, as shown in Figure 9. The eth0 interface is the default Ethernet interface of the Raspberry Pi 4. It serves as the Linux bridge and is used for configuring the Kubernetes cluster, exchanging pod control information, accessing Rancher and the load balancer, and connecting to the Internet gateway. In other words, eth0 acts as an overlay interface for external communications. On the other hand, eth1, connected via a USB-to-Ethernet adapter, is used by the OpenFlow protocol to communicate between the SDN service pods and enables the substitution of the pods' veth interfaces with kbr-int-ex. Each pod in the cluster uses the kbr interface instead of its veth to avoid routing and forwarding issues.
Calico, a Kubernetes Container Network Interface (CNI) plugin, deploys a DaemonSet on each node, ensuring communication between gRPC and the controller. In other words, this plugin provides networking for the containers and pods within the Kubernetes cluster. The cluster formed by the Raspberry Pi nodes can operate as conventional Ethernet devices (via TCP and Unix domain sockets) or as an OpenFlow switch to handle SDN traffic.
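The dual-interface wiring can be verified on each node before it joins the cluster; the short sketch below is our own illustrative check (not part of the deployment scripts) and lists the addresses bound to eth0 and eth1 using the psutil package.

```python
# Quick sanity check of the dual-interface node wiring (illustrative only).
import psutil

addrs = psutil.net_if_addrs()          # interface name -> list of address records
for iface in ("eth0", "eth1"):
    if iface not in addrs:
        print(f"{iface}: missing, check the USB-to-Ethernet adapter or cabling")
        continue
    for record in addrs[iface]:
        print(f"{iface}: family={record.family.name} address={record.address}")
```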
5.3. Creating the Kubernetes Cluster on Raspberry Pi
The K3s distribution [49] allows the deployment of a bare-metal cluster and is an excellent choice for IoT devices, particularly the Raspberry Pi. Following Rancher's official documentation [58], a high-availability cluster with an embedded database is implemented. However, using a single load balancer, as in the default configuration, brings back the drawbacks of a centralized system. Integrating the K3s cluster with Keepalived and HAProxy, as described in [59], solves these disadvantages in a distributed way. To set up an HA Kubernetes cluster with Keepalived and HAProxy, both packages must be installed and configured on each node in the cluster. Keepalived manages the virtual IP address that clients use to access the Kubernetes API, whereas HAProxy load-balances incoming traffic across the Kubernetes API server nodes. This alternative is superior to an external-database implementation because the storage is distributed across the etcd services of each node [58], increasing the system's availability and removing the single point of failure through distributed storage.
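As a simple illustration of the resulting access path (our own example; the virtual IP, port, and anonymous health endpoint are assumptions), any client can reach the API server through the Keepalived-managed address regardless of which node currently holds it.

```python
# Probe the Kubernetes API server through the Keepalived virtual IP (illustrative).
import requests

VIRTUAL_IP = "192.168.1.100"   # placeholder VIP managed by Keepalived
API_PORT = 6443                # Kubernetes API server port fronted by HAProxy

resp = requests.get(
    f"https://{VIRTUAL_IP}:{API_PORT}/healthz",
    verify=False,              # the API serves a cluster-internal certificate
    timeout=2,
)
# Expected: HTTP 200 when a healthy control-plane node holds the VIP
# (assuming the default anonymous health endpoints are enabled).
print(resp.status_code, resp.text)
```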
Ansible automates node creation, storage configuration, network interfaces, services, and deployment processes. This tool allows the cluster to be deployed in a simple way through the execution of a series of Ansible scripts. All the manifests and Ansible inventory files are available in the shared GitHub repository.
5.4. Connection to PLECS RT Box
The possible communication alternatives between the Raspberry Pi and PLECS are the SPI, I2C, and CAN protocols. However, the PLECS RT Box has only two SPI communication modules (SPI1 and SPI2), which would leave one of our worker nodes unconnected. In [60], the average data rates of the three technologies are compared. According to the PLECS manual, SPI offers the best data rate, followed by the CAN bus and I2C. For these reasons, we decided to connect the third node of the cluster to the same SPI port as node 2 on the PLECS server. Although this is not the best communication alternative, because the frequency and voltage values on each Raspberry Pi are averaged, it does not affect the overall performance of the MG. For more complex microgrid implementations, it is recommended to use the SFP transceiver modules available for PLECS, which achieve speeds of up to 10 Gbps.
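For reference, a minimal sketch of how a worker node could exchange secondary-control values over SPI using the Python spidev package is shown below; the bus/device numbers, word framing, and scaling are placeholders, since the actual exchange format is defined by the PLECS model and the LaunchPad firmware.

```python
# Illustrative SPI exchange between a Raspberry Pi worker and the PLECS RT Box
# (bus/device numbers, framing, and scaling are placeholders).
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                 # SPI bus 0, chip-select 0 on the Raspberry Pi header
spi.max_speed_hz = 1_000_000   # 1 MHz clock; adjust to match the RT Box configuration
spi.mode = 0

def exchange_setpoint(raw_word: int) -> int:
    """Send a 16-bit correction word and return the 16-bit measurement word."""
    tx = [(raw_word >> 8) & 0xFF, raw_word & 0xFF]
    rx = spi.xfer2(tx)          # full-duplex transfer
    return (rx[0] << 8) | rx[1]

measurement = exchange_setpoint(0x0123)
print(f"received raw word: {measurement:#06x}")
spi.close()
```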
Each Raspberry Pi serves as an OpenFlow communication device that ensures the exchange of secondary control information. Figure 10 shows the practical implementation of this proposal. Three Raspberry Pi boards are connected through Ethernet and USB: the first connection allows external access, and the second provides the SDN functionalities. Additionally, the PLECS platform is wired to the cluster through the SPI bus of the LaunchPad F28069M.
Regarding the cost of this proposal, Table 2 shows that it is possible to create the bare-metal cluster for less than 1300 USD. The Cisco router can be replaced by cheaper alternatives, such as the ZodiacFX [61].
5.5. Monitoring Platform
Different factors, such as hardware resources, load, or the communication architecture, limit the number of pods executed on each device. However, with Rancher, these pods can be scaled automatically without compromising the cluster's overall structure. To configure automatic scaling in Rancher, it is necessary to select the deployment and, from the "Scaling" tab, specify the minimum, maximum, and desired number of replicas, as well as the scaling policy based on CPU or memory usage. This proposal shows the number of replicas for each service in Figure 8, with the boundary for scaling new pods set at 80% of CPU usage. It is important to note that automatic scaling requires a monitoring and metrics system to track the deployment's resource utilization. Rancher integrates with Prometheus and Grafana to trigger alerts according to the configured threshold values. Finally, the Grafana graphical user interface allows monitoring of the deployment's status and the number of scaled replicas.
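Equivalently, the same 80% CPU boundary can be expressed as a standard HorizontalPodAutoscaler object. The sketch below uses the Kubernetes Python client with placeholder names and replica limits; it is only one possible way to encode the policy described above.

```python
# HorizontalPodAutoscaler equivalent to the 80% CPU scaling boundary (placeholder names).
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    api_version="autoscaling/v1",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="onos-config-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="onos-config",
        ),
        min_replicas=2,
        max_replicas=5,
        target_cpu_utilization_percentage=80,  # scale out above 80% average CPU usage
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="sdn", body=hpa,
)
```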
The solution for monitoring the status of the communication devices, and the interaction between the tools involved, is presented in Figure 11. Our proposal uses a Rancher Helm chart to deploy a Java application (a Prometheus exporter) that collects network metrics. The exporter sends the data to Prometheus, which is responsible for monitoring events and triggering alerts according to the configured conditions.
Prometheus stores the collected information, including the operating-system metrics, in its local storage. We use this interface to obtain node information and network metrics. Different notifications can be triggered through the Prometheus configuration file to enable fast reactions without downtime.
Finally, Grafana allows importing a series of dashboards with information on the DG flows. This tool provides insight into the packet flows that pass through the USB Ethernet adapter to the SDN controller. Figure 12 demonstrates the correct integration of Prometheus and Grafana within the Kubernetes cluster. Using Rancher in this proposal simplifies the deployment and configuration of these services.
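The exporter used in this work is a Java application, but the same idea can be sketched in a few lines of Python with the prometheus_client package (an illustrative stand-in, not the code deployed in the testbed); Prometheus then scrapes the exposed port and Grafana visualizes the resulting series.

```python
# Minimal Prometheus exporter sketch (illustrative stand-in for the Java exporter).
import time
import psutil
from prometheus_client import Gauge, start_http_server

# Bytes forwarded through the SDN-facing USB Ethernet adapter (eth1).
ETH1_TX_BYTES = Gauge("microgrid_eth1_tx_bytes", "Bytes sent on eth1")
ETH1_RX_BYTES = Gauge("microgrid_eth1_rx_bytes", "Bytes received on eth1")

if __name__ == "__main__":
    start_http_server(9200)            # Prometheus scrapes http://<node>:9200/metrics
    while True:
        counters = psutil.net_io_counters(pernic=True).get("eth1")
        if counters is not None:
            ETH1_TX_BYTES.set(counters.bytes_sent)
            ETH1_RX_BYTES.set(counters.bytes_recv)
        time.sleep(5)
```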
An essential element to consider when implementing a high-availability Kubernetes cluster on Raspberry Pi is the performance of the computational resources. As the cluster scales, the computational resources become more limited. The number of pods in Kubernetes can be increased while minimizing the impact on service performance as follows.
Ensure that the nodes in the cluster have enough resources (CPU, memory, and storage) to support the increased number of pods. The notification system and the alerts configured in Grafana allow monitoring of resource usage and global capacity.
Use horizontal pod autoscaling (HPA) to automatically adjust the number of pods based on resource usage and demand. HPA can be configured based on CPU usage, memory usage, or custom metrics.
Use pod anti-affinity rules to prevent resource contention and performance issues by ensuring that replicas of the same service are not scheduled on the same node.
Optimize pod resource requests and limits so that each service has the resources it needs to function correctly.
Use pod disruption budgets (PDBs) to ensure that a minimum number of pods remains available during node maintenance or failures. By setting a PDB, the service is guaranteed not to be affected when pods are evicted from the cluster (see the sketch after this list).
Following the previous points, it is crucial to monitor the service's performance and adjust the settings as necessary to optimize resource utilization and performance.
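As an example of the PDB item above, a PodDisruptionBudget for the controller microservice could look like the following sketch, again using the Kubernetes Python client (assuming a client version that exposes the policy/v1 API) with placeholder names and thresholds.

```python
# PodDisruptionBudget sketch: keep at least two controller replicas during disruptions
# (names and thresholds are placeholders).
from kubernetes import client, config

config.load_kube_config()

pdb = client.V1PodDisruptionBudget(
    api_version="policy/v1",
    kind="PodDisruptionBudget",
    metadata=client.V1ObjectMeta(name="onos-config-pdb"),
    spec=client.V1PodDisruptionBudgetSpec(
        min_available=2,  # never evict below two running replicas
        selector=client.V1LabelSelector(match_labels={"app": "onos-config"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="sdn", body=pdb)
```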
7. Communication Failure and Recovery Test
Combining Kubernetes with a set of SDN microservices can improve application recovery time by eliminating the single point of failure in hierarchical control. In a traditional network architecture, the control plane and the data plane are tightly coupled, which means that a failure in the control plane can lead to significant disruptions in the network’s operation. This hierarchical control model has a single point of failure, which can be a bottleneck for recovery time.
However, with Kubernetes and SDN microservices, the control plane is decoupled from the data plane, and the control functions are distributed across the network. If a failure occurs in one part of the network, the rest can continue operating normally, and recovery time can be significantly reduced. Moreover, this combination offers a highly automated and customizable network environment that enables quick and flexible topology adjustments in response to changes in the infrastructure or the applications. This further reduces the network reconfiguration time, and hence the recovery period, which would otherwise require manual intervention, helping to ensure that applications and services remain available and perform optimally even during network failures.
In this scenario, a communication system failure (close to the distributed generation sources) is simulated. The objective is to verify which strategy performs better from the communications point of view without degrading the power sharing between the local controllers of the MG.
The failure is injected into the instance containing the monolithic SDN controller, and the resulting losses are compared with those produced in one of the instances running the microservices. The orchestrator is expected to instantiate a new subsystem instance without degrading the performance of the MG. Kubernetes has been configured to scale automatically according to the approach shown in [62]. Furthermore, Figure 16 shows the packet loss percentage, confirming that the microservices perform well because the services are distributed across the nodes.
Figure 17 and Figure 18 show the results obtained during power sharing with and without hierarchical control, respectively. At one second, the droop control is started to distribute the active and reactive power. At 10 s, the hierarchical control is enabled to regulate the voltage and frequency deviations. Since there is no message loss between controllers (only minor glitches in the average message delay), the system's robustness is demonstrated. However, the secondary control with a monolithic controller is highly susceptible to small latencies, CPU burden, and propagation delay. In this scenario, one of the OpenFlow switches was removed to determine the effects on the monolithic control strategy. This experiment should be understood as a way to highlight the robustness achieved by the microservices-based deployment of the SDN system.
Our proposal uses a proactive approach to latency testing and a reactive approach to manage topology changes. If a new distributed generation source is added, the architecture reacts immediately, allowing proper power sharing, as Figure 3 shows. On the other hand, the orchestrator can add intelligence to the topology through appropriate programming. For example, the monitoring architecture can detect latency increases or node congestion and take the necessary actions to reduce their impact and consequences, such as scaling a larger number of microservice instances to serve more simultaneous communication requests.
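As an example of such a proactive latency check (our own illustrative probe, not the measurement tool used for the results above, and assuming a simple UDP echo service runs on the peer node), a node can periodically time a round trip to a peer controller and feed the value to the monitoring stack.

```python
# Illustrative UDP round-trip latency probe between two controller nodes
# (peer address, port, and threshold are placeholders).
import socket
import time
from typing import Optional

PEER = ("10.0.0.2", 9999)   # placeholder address of an echo service on the peer node
THRESHOLD_MS = 50.0         # placeholder alert threshold

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

def probe_once() -> Optional[float]:
    """Return the round-trip time in milliseconds, or None on timeout/loss."""
    start = time.perf_counter()
    sock.sendto(b"ping", PEER)
    try:
        sock.recvfrom(64)
    except socket.timeout:
        return None
    return (time.perf_counter() - start) * 1000.0

rtt = probe_once()
if rtt is None or rtt > THRESHOLD_MS:
    print("latency degraded or packet lost: trigger scaling or rerouting action")
else:
    print(f"round-trip time: {rtt:.2f} ms")
```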
8. Conclusions
Monolithic controllers have several drawbacks concerning scalability and reliability. In most cases, these architectures do not meet the fault-tolerance and rapid-adaptability requirements imposed by networked microgrids. The most significant contribution of this proposal is the ability to dynamically reconfigure control flows based on a microservices architecture and the automatic deployment of microservice instances.
Using a bare-metal Kubernetes cluster on Raspberry Pi boards and deploying distributed microservices allowed us to improve the reliability of a distributed energy control system. The microservices testbed demonstrated the SDN controllers' rapid deployment, portability, high availability, and resilience to application failures.
The Kubernetes orchestrator provided good scalability for the communication system and improved its fault tolerance and replication capacity, since it manages and distributes the load among the microservices. From a comparative perspective, this proposal significantly improves the failure recovery time and the resilience of the communication devices.
The REST API microservice topology allowed the SDN controller's core functionalities to be split into small, well-defined functions. Reliability showed excellent behavior in all test-case scenarios. Furthermore, the portability of all the nodes in the topology is possible thanks to the Docker containers. The ability to exchange control information between DGs over an SDN network allows them to regulate the system's response and reach a steady state more quickly. The results show that increasing the resilience of the network requires more sophisticated control strategies and highly available programmable communication networks. Finally, the monitoring architecture allows logs to be exported in real time and failures to be detected through the notification software.