Controlling the deployment of virtual machines on clusters and clouds for scientific computing in CBRAIN

T Glatard, ME Rousseau, P Rioux… - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org
T Glatard, ME Rousseau, P Rioux, R Adalat, AC Evans
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and …, 2014 - ieeexplore.ieee.org
The emergence of hardware virtualization, notably exploited by cloud infrastructures, led to a paradigm shift in distributed computing by enabling complete software customization and elastic scaling of resources. However, new software architectures and deployment algorithms are still required to fully exploit virtualization in web platforms used for scientific computing, commonly called science gateways. We propose a software architecture and an algorithm to enable and optimize the deployment of virtual machines (VMs) on clusters and clouds in science gateways. Our architecture is based on three design principles: (i) separation between resource provisioning and task scheduling; (ii) encapsulation of VMs in regular computing tasks; (iii) association of a virtual computing site with each disk image. Our algorithm submits and removes VMs on clusters and clouds based on the current system workload, the number of available job slots in active VMs, the cost and current performance of clouds and clusters, and a parameter quantifying the performance-cost trade-off. To cope with variable queuing and booting times, it replicates VMs on independent computing sites, selected by minimizing a linear combination of makespan and cost over the Pareto set of non-dominated solutions. Makespan and cost are estimated from the last measured queuing, booting, and task execution times, using an exponential model of the gain yielded by VM replication. We implement this algorithm in CBRAIN, a science gateway widely used for neuroimaging, and we evaluate it on an infrastructure of 2 clusters and 1 cloud. Results show that it is able to reach some points of the performance-cost trade-off associated with VM deployment.
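The site-selection step described in the abstract (keep the non-dominated solutions, then minimize a linear combination of makespan and cost) can be illustrated with a minimal Python sketch. This is not the CBRAIN implementation. The names `Candidate`, `pareto_set`, `select` and `alpha`, the site names, and all numeric values are hypothetical, and the normalization of makespan and cost by their maxima is an assumption made here so that the two terms are comparable. The exponential model used to estimate makespan under replication is not reproduced, since the abstract does not give its form; the estimates are simply taken as inputs.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One hypothetical deployment option: the sites a VM is replicated on."""
    sites: Tuple[str, ...]
    makespan: float  # estimated makespan, e.g. in seconds (assumed given)
    cost: float      # estimated cost, arbitrary units (assumed given)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both criteria and better on one."""
    return (a.makespan <= b.makespan and a.cost <= b.cost
            and (a.makespan < b.makespan or a.cost < b.cost))


def pareto_set(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


def select(candidates: List[Candidate], alpha: float) -> Candidate:
    """Minimize alpha * makespan + (1 - alpha) * cost over the Pareto set.

    alpha is a hypothetical trade-off parameter in [0, 1]; both terms are
    normalized by their maximum over the Pareto set (an assumption).
    """
    front = pareto_set(candidates)
    max_m = max(c.makespan for c in front) or 1.0
    max_c = max(c.cost for c in front) or 1.0
    return min(front, key=lambda c: alpha * c.makespan / max_m
               + (1 - alpha) * c.cost / max_c)


# Illustrative values only, not measurements from the paper.
candidates = [
    Candidate(("cluster_a",), 100.0, 0.0),
    Candidate(("cloud_x",), 40.0, 10.0),
    Candidate(("cluster_a", "cloud_x"), 30.0, 10.0),
    Candidate(("cluster_b",), 120.0, 0.0),
]

fastest = select(candidates, alpha=1.0)   # performance only
cheapest = select(candidates, alpha=0.0)  # cost only
```

With these illustrative numbers, setting `alpha` to 1 favors the replicated deployment on both sites, while setting it to 0 favors the single free cluster; intermediate values move the choice along the Pareto set.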