Efficient Energy Management in Virtualized Datacenter using Machine learning Techniques

DOI : 10.17577/IJERTV3IS061363


Anasuya N. Jadaerimath, Sathya Priya A, Dr. Anirban Basu

Dept. of Computer Science and Engineering, East Point College of Engineering and Technology, Bangalore.

Abstract: With the rapid development of cloud computing, datacenters are growing in scale and consuming more energy, so there is an urgent need for efficient energy-saving methods that reduce the large energy consumption of virtualized datacenters. In this paper we achieve energy savings with N:1 mapping virtualization technology, which integrates many physical machines into a virtual resource pool so that resources can be controlled centrally. We apply reinforcement learning to resource management and decision making for a datacenter with an uncertain task flow, and we propose an energy-aware resource control algorithm for better energy saving. The algorithm is implemented on the CloudSim platform. The experimental results show that it reduces energy consumption by around 42% relative to a non power aware datacenter and by around 3% relative to the greedy scheduling algorithm in a virtualized datacenter.

Keywords: cloud computing, virtualized datacenter, reinforcement learning, utilization, CloudSim, Q-learning, energy management.

I. INTRODUCTION

Cloud computing is the concept of dynamically provisioning processing time and storage space from a ubiquitous cloud of computational resources. It allows users to acquire and release resources on demand and provides access to data while hiding the physical location and exact parameters of the resources. From the user's perspective, cloud computing means scalability on demand to meet business changes, together with ease of use and management.

A datacenter is a centralized repository, either physical or virtual, for the storage and distribution of data. In a datacenter the information is organized around a particular body of knowledge or business. Today, users store a variety of data in cloud datacenters. Because data flow is increasing exponentially, there is a large demand for virtualized datacenters. As datacenters grow, energy consumption rises, which increases electricity cost and carbon dioxide emissions and harms the ecosystem.

Despite improvements in the energy efficiency of hardware, overall energy consumption continues to grow because of the increasing demand for computing resources. For example, in 2010 datacenters consumed 0.5% of the world's total electricity, and if demand continues to grow this figure is projected to quadruple by 2020 [8].

Fig 1: Traditional Datacenter

In a traditional datacenter (Fig 1), each host of the cluster is viewed as a single node that provides service for a specific task. Because the resources of a single node are fixed, it is difficult to satisfy the varied demands of tasks. Competition for resources between loads and hosts causes considerable delay. As a result, a traditional datacenter consumes more energy.

Fig 2: Virtualized Datacenter

In a virtualized datacenter (VDC), virtual machines (VMs) are created according to task requirements, and all VMs are managed by the virtualization resource pool unit. In this way many VMs run on a single host, which increases the host's resource utilization, reduces energy consumption, and improves the overall performance of the datacenter.

For energy conservation, building a reasonable mapping from virtual machines to physical machines is especially important. A good mapping of VMs to physical machines not only improves resource utilization but also effectively reduces virtual machine migration and the additional energy it consumes. However, it is difficult to map each virtual machine to a suitable physical machine and ensure energy saving at the same time, because the loads vary with time: the probability distribution of the VMs and their demands change in real time. If a VM is allocated to a host that is only locally optimal, energy consumption will be higher because the host's resource utilization is not at a reasonable level.

Our target is to allocate the globally optimal host to each VM while ensuring both energy saving and performance. From the point of view of energy saving, this paper uses a resource scheduling algorithm based on machine learning techniques: it uses reinforcement learning strategies to map VMs to hosts according to the real-time demands of the load.

  1. RELATED WORK

    Energy saving is one of the core research areas in cloud computing, and many researchers focus on reducing the energy consumption of datacenter infrastructure. One of the first works in which power management was applied at the datacenter level was done by Pinheiro et al. [13]. The authors proposed a technique for minimizing power consumption in a heterogeneous cluster of computing nodes serving multiple web applications. The main technique is to concentrate the workload on the minimum number of physical nodes and switch idle nodes off [16]. This approach requires dealing with the power/performance trade-off, as application performance can be degraded by workload consolidation. Requirements on the throughput and execution time of applications are defined in SLAs to ensure reliable QoS. The algorithm periodically monitors the load of resources (CPU, disk storage and network interface) and decides whether to switch nodes on or off so as to minimize overall power consumption while providing the expected performance. The actual load balancing is not handled by the system and has to be managed by the applications. The algorithm runs on a master node, which creates a single point of failure and may become a performance bottleneck in large systems. In addition, the authors point out that reconfiguration operations are time consuming and that the algorithm adds or removes only one node at a time, which may also cause slow reaction in large-scale environments. This approach can be applied to multi-application mixed-workload environments with fixed SLAs.

    Chao Li, Amer Qouneh and Tao Li characterized and analyzed renewable-energy-driven data centers, proposing the innovative idea of using renewable energy such as solar power in the datacenter [4-5].

    Younge A.J. and von Laszewski G. proposed efficient resource management for cloud computing environments. Their work uses virtualization technology, such as energy-aware virtual machine scheduling and virtual machine migration [6-7].

    Hai Zhong, Kun Tao and Xuejie Zhang proposed an optimized resource scheduling algorithm for open-source cloud systems, adopting an optimized genetic algorithm to improve the performance of resource scheduling on physical machines [8]. Other research on the relationship between the energy consumption and performance of data centers and their resource utilization treats resource allocation as a knapsack problem with special attributes such as disk utilization and CPU utilization [9].

    Shekhar et al. proposed reducing energy by reducing the number of physical machines through virtual machine scheduling. Their data [10] show a direct relation between server performance and disk utilization: as disk utilization increases, performance decreases.

    To improve the performance and service quality of cloud data centers, Xavier Grehant [11] used reinforcement learning in resource allocation and designed a resource supply system that is mainly used to schedule virtual machines and workloads in the data center. Josep Ll. Berral [12] provided a method that uses machine learning for adaptive scheduling in power-aware managed data centers.

  2. ENERGY AWARE RESOURCE ALLOCATION OF DATACENTER USING MACHINE LEARNING TECHNIQUES

    The survey in [15] states that a datacenter achieves its best energy saving when the CPU utilization of a host is at 70% and its disk utilization reaches 50%. In addition, the performance of a single host is inversely related to its disk utilization. We use the reinforcement learning process of machine learning, divided into two phases: selection of the local optimal host and selection of the global optimal host.

    1. Selection of local optimal Host

      In the selection of the local optimal host, the virtualized datacenter (VDC) is described by a five-factor group $\{V, H, R, VR, ST\}$. $V$ (virtual machine) is the set of virtual machine types, $V = \{vm_1, vm_2, \ldots\}$. $H$ is the set of host nodes in the data center. $R$ (resource) is the set of resources of the virtual resource pool and has two parts, $CR$ (CPU resource) and $DR$ (disk resource), so that $R = \{(C, D) \mid C \in CR,\ D \in DR\}$. $VR$, the relationship between virtual machines and resources, is $VR = \{(vm, r) \mid vm \in V,\ r \in R\}$.

        The first step is to determine the CPU and disk utilization of each host before creating virtual machines on it, and to record the utilization in a list. Next, the Euclidean distance is calculated from the CPU and disk utilization:

        $d = \sqrt{(CPU - 70\%)^2 + (Disk - 50\%)^2}$

        That is, the Euclidean distance $d$ is the square root of the sum of two squared differences: the current CPU utilization minus 70%, and the current disk utilization minus 50%. We use this distance to find the local optimal host: the waiting virtual machine is allocated to the host with the least Euclidean distance. The smaller the distance, the closer the host is to the optimum utilization.

        Euclidean Distance Algorithm

        1. Get current CPU utilization
        2. Get current disk utilization
        3. for #host 1 to hosts
        4.   d = sqrt((CPU - 70%)^2 + (Disk - 50%)^2)
        5. Allocate VM to the host with the least d
        6. Update
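The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' CloudSim implementation: the function names, the dictionary of hosts, and the utilization readings are hypothetical, and utilizations are assumed to be fractions between 0 and 1.

```python
import math

# Target utilization point reported in the text: 70% CPU, 50% disk.
TARGET_CPU = 0.70
TARGET_DISK = 0.50


def distance_to_target(cpu, disk):
    """Euclidean distance from (cpu, disk) utilization to the target point."""
    return math.sqrt((cpu - TARGET_CPU) ** 2 + (disk - TARGET_DISK) ** 2)


def select_local_optimal_host(hosts):
    """Return the name of the host closest to the target utilization.

    `hosts` maps a host name to a (cpu_utilization, disk_utilization)
    pair, both expressed as fractions in [0, 1].
    """
    return min(hosts, key=lambda name: distance_to_target(*hosts[name]))


# Hypothetical utilization readings for three hosts.
hosts = {
    "host1": (0.20, 0.10),
    "host2": (0.65, 0.45),
    "host3": (0.95, 0.80),
}
best = select_local_optimal_host(hosts)
```

With these readings, `host2` is selected because its utilization lies closest to the 70%/50% point.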

    2. Selection of Global optimal host

    In the selection of the global optimal host, $ST$ denotes the state transfer function. The tasks ($T$) of the data center are ordered in time, uncertain, and follow a probability distribution; we call them task flows ($TF$), expressed as $TF = \{T_i \mid i \in t\}$, where $T_i$ is the task at time $i$. A task executes in a virtual machine, a virtual machine must be created on some host, and each host has its own resource utilization, so the problem we need to handle is how to reach the optimal ratio of 70% CPU and 50% disk. There is a complex nonlinear mapping between virtual machines and hosts, $A = \{\langle V_m, H_n \rangle \mid V_m \in V,\ H_n \in H\}$, where the action set $A$ is the set of all possible mappings in a given state.

    Fig 3: Conceptual design

    Figure 3 shows the conceptual design of the reinforcement process. VM1 and VM2 are allocated on Host1 within the threshold, where CPU utilization is less than 50% and disk utilization is less than 70%. If VM3 were also allocated on Host1, utilization would exceed the threshold, which would increase energy consumption; hence the Q-learning process selects the host machine for the VMs efficiently.

    According to the Q-learning process, the state changes after one mapping action is chosen from the set A and executed in the prior state, and we obtain a reward in the new state. For energy saving we need to reduce the additional cost, which is directly proportional to the energy consumption. Our algorithm reduces this additional cost by reducing the amount of virtual machine migration, because its mapping of virtual machines to hosts is globally optimal.

    Q-Learning Algorithm

    1. Initialize State:
    2.   Host Amount ← hosts
    3.   VM Amount ← vms
    4.   AllocatedVMs ← 1 /* one VM allocated */
    5.   MapBetweenVMAndHost ← List<Map<#vm01, #host>>
    6. End State:
    7.   MapBetweenVMAndHost ← List<Map<#vm, #host>>
    8.   AllocatedVMs ← vms /* all VMs allocated */
    9. Input:
    10.   List<VM>, List<Host> /* VM list and Host list */
    11. Output:
    12.   Map<vm, host> /* mapping table of VM and Host */
        /* process of learning */
    13. Initialize Q(S, a) arbitrarily, a to the policy to be evaluated
    14. for #vm 2 to vms
    15. {
    16.   Update s
          /* choose a from s using the policy derived from Q */
    17.   for #host 1 to hosts
    18.     If (allocated vm to available #host)
    19.       Return host's consumed power after allocation
    20.     Else
    21.       Return host's current consumed power
    22.   a ← choose the host with the minimum consumed power that can be allocated to the vm
    23.   }
    24.   Take action a
    25.   Observe r, s'
    26.   Q(S, a) ← (1 - α) Q(S, a) + α [r + γV], with V = max_b Q(S', b)
    27.   s ← s'
    28.   until s is End State
          /* deduce the current optimal action according to the Q table */
    29.   Choose a from s using the policy derived from the table Q(S, a)
    30.   Update map<#vm, #host> /* update the mapping table of VM and Host */
    31. }
    32. Return map<#vm, #host>

    The algorithm uses Q-learning to get the optimal mapping action at one state, which considers the global optimal Q value.

  3. IMPLEMENTATION ENVIRONMENT

      The CloudSim toolkit was chosen as the simulation platform because it is a modern simulation framework aimed at cloud computing environments. We simulated on a single machine with a static number of hosts and VMs. It needs a CPU with performance equivalent to 1860 or 2660 MIPS, 4GB of RAM, 40GB of disk and a 1040 resolution monitor. Each VM requires one CPU core with 500, 1000, 2000 or 2500 MIPS, 128MB of RAM and 1GB of storage.

      A. Experiment Process

      We use the Euclidean distance to judge which host is more suitable for the waiting virtual machine, and then use the Q table to estimate whether that choice is the overall optimum.

      First, we determine the CPU and disk utilization of each host before creating virtual machines on it and record the utilization in a list. Next, we calculate the Euclidean distance from the CPU and disk utilization.

      The calculated Euclidean distance is divided into 7 states, and state transfer takes place between these states. Finally, we use the Q table to estimate the local optimal mapping scheme. After calculating the Q value, we get the global optimal action at the current state, and after a cycle we get the overall Q table for every state. We then use this Q table to choose the most energy-saving mapping between waiting virtual machines and hosts.
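The division of the distance into 7 states could look like the following Python sketch. The text does not give the state boundaries, so the equal-width bins, the upper bound `MAX_DISTANCE`, and the function name `distance_to_state` are all assumptions made for illustration.

```python
import math

NUM_STATES = 7  # the text divides the distance into 7 states

# Largest distance from (0.70, 0.50) inside the unit square, reached at
# the corners (0, 0) and (0, 1). Assumed upper bound for the bins.
MAX_DISTANCE = math.sqrt(0.70 ** 2 + 0.50 ** 2)


def distance_to_state(distance):
    """Map a distance to a state index between 0 and NUM_STATES - 1.

    Equal-width bins are an assumption; the text gives no boundaries.
    Distances beyond MAX_DISTANCE fall into the last state.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    width = MAX_DISTANCE / NUM_STATES
    return min(int(distance / width), NUM_STATES - 1)
```

State 0 then corresponds to hosts nearest the 70%/50% point, and the resulting index can serve as the row key of a Q table.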

  4. PERFORMANCE ANALYSIS

The performance of the proposed method is measured and presented using the CloudSim tool.

    1. Comparison Analysis

      A local optimal mapping may obtain the maximum reward at one state, but it may not be the maximum in the global optimal mapping and can instead harm the performance of later states. We should therefore consider the impact of subsequent mappings on the current state, and it is appropriate to calculate the data center reward with the following formula.

      $Q_k(S,a) \leftarrow (1-\alpha_k)\,Q_{k-1}(S,a) + \alpha_k\,[\,r_k + \gamma V_{k-1}\,], \qquad V_{k-1} = \max_b Q_{k-1}(S',b)$

      Here $\gamma$ is the discount factor and $\alpha$ is the learning factor. $Q_{k-1}$ is the Q value of the older state $k-1$; it is updated to the new Q value after the action $a$ is executed. $r_k$ is the reward obtained on executing the current state. $V_{k-1}$, weighted by $\gamma$, is the discounted Q value of the next state. $V_{k-1}$ and $r_k$ are both used to estimate the improvement of the state reward, and $\alpha$ expresses the confidence level of the improvement. Our algorithm uses this Q-learning update.
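As a minimal sketch, the update can be written as a standard tabular Q-learning step in Python. The function name `q_update`, the dictionary-based Q table, the example states and hosts, and the values of `alpha` and `gamma` are assumptions for illustration, not taken from the paper's CloudSim implementation. The sketch also assumes that V is the maximum Q value over the actions available in the next state.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q            : dict mapping (state, action) -> Q value
    alpha, gamma : learning factor and discount factor (values assumed)
    """
    # V is the best Q value reachable from the next state (0 if none).
    v_next = max((q[(next_state, b)] for b in next_actions), default=0.0)
    q[(state, action)] = ((1 - alpha) * q[(state, action)]
                          + alpha * (reward + gamma * v_next))
    return q[(state, action)]


# Hypothetical example: states are integers, actions are host names.
q = defaultdict(float)
q[(1, "host2")] = 2.0
new_value = q_update(q, state=0, action="host1", reward=1.0,
                     next_state=1, next_actions=["host1", "host2"])
```

Here the old value of the pair (0, "host1") is 0 and the best next-state value is 2.0, so the update gives 0.5 × 0 + 0.5 × (1.0 + 0.9 × 2.0) = 1.4.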

      For the experimental results we used three different policies. The Non Power Aware Policy (NPAP) applies no power-aware optimization and implies that all hosts run at 100% CPU utilization and consume maximum power all the time. The CPU Disk Policy (CDP) uses CPU and disk utilization as the standard for choosing the ideal physical machine. The Q-Learning Policy (QP) obtains the overall optimal physical host in the current state according to the Q value.
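To illustrate why a power-aware policy can consume less than NPAP, the following Python sketch compares the two under a simple linear power model. Everything numeric here is an assumption for illustration: the peak power `P_MAX`, the idle share `IDLE_FRACTION`, the example loads, and the choice to treat hosts with zero utilization as switched off. The paper does not state the power model used in its simulation.

```python
P_MAX = 250.0        # assumed peak power of one host, in watts
IDLE_FRACTION = 0.7  # assumed share of peak power drawn when idle


def host_power(utilization):
    """Linear power model: idle power plus a part proportional to CPU load."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    idle = IDLE_FRACTION * P_MAX
    return idle + (P_MAX - idle) * utilization


def datacenter_power(utilizations, power_aware=True):
    """Total power for a list of per-host CPU utilizations.

    With power_aware=False every host is charged at 100% utilization,
    as the text describes for the non power aware policy. With
    power_aware=True hosts at zero utilization are assumed switched off.
    """
    if not power_aware:
        return P_MAX * len(utilizations)
    return sum(host_power(u) for u in utilizations if u > 0)


loads = [0.7, 0.7, 0.0, 0.0]  # hypothetical: two loaded hosts, two idle
npap = datacenter_power(loads, power_aware=False)
aware = datacenter_power(loads, power_aware=True)
```

Under these assumed numbers the non power aware total is 1000 W and the power-aware total is 455 W. The figures only illustrate the mechanism and are not the paper's measured results.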

    2. Graph Analysis

The graph analysis compares CDP, QP and NPAP in graphical form.

Fig 4: Comparison of QP & NPAP (axes: VM, Energy; series: NPAP, QP)

Figure 4 shows the simulation result comparing the Q-learning policy and the Non Power Aware Policy with 20 hosts and 20 VMs. The average energy consumption of the datacenter increases with the number of virtual machines to be mapped, mainly because of the additional energy consumed by the increased virtual machine load.

Fig 5: Comparison of QP & NPAP (axes: VM, Energy; series: NPAP, QP)

Figure 5 shows the simulation result comparing the Q-learning policy with the Non Power Aware Policy for 20 to 40 hosts and 20 to 40 VMs.

Fig 6: Comparison of CDP & NPAP (axes: VM, Energy)

Figure 6 shows the simulation result comparing the CPU Disk Policy and the Non Power Aware Policy for 40 to 60 hosts and 40 to 60 VMs.

Fig 7: Comparison of CDP & NPAP (axes: VM, Energy; series: NPAP, CDP)

Figure 7 shows the simulation result comparing the CPU Disk Policy and the Non Power Aware Policy for 60 to 80 hosts and 60 to 80 VMs.

Fig 8: Comparison of CDP, QP and NPAP (axes: VM, Energy; series: NPAP, CDP, QP)

Figure 8 shows the simulation result comparing the CPU Disk Policy and the Q-learning policy with the Non Power Aware Policy for 0 to 80 hosts and 0 to 80 VMs.

CONCLUSION

From the point of view of energy saving, this paper addresses the scheduling of virtual machines onto physical machines with a strategy based on reinforcement learning, using N:1 mapping virtualization technology and Q-learning to explore the globally energy-saving mapping. The results were simulated on the CloudSim platform. They show that our algorithm achieves an energy reduction of up to 42% compared with non power aware algorithms. Hence, as resource utilization increases, the overall energy consumption of the data centre is reduced.

REFERENCES

  1. Deng Li, Wang Song, Jin Hai. Virtualization: An Effective Way to Greening Datacenter [J]. Communication of China Computer Federation (CCCF), 2010, 6(3), pp.20-23.

  2. Report to Congress on Server and Data Center Energy Efficiency, Public Law 109-431 [R]. USEPA07b. U.S. Environmental Protection Agency. July 2007.

  3. W. Forrest. How to cut data centre carbon emissions [EB/OL]. http://www.computerweekly.com/Articles/2008/12/05/233748/How-to-cut-data-centre-carbon-emissions.htm. December 2008.

  4. Chao Li, Amer Qouneh, Tao Li. Characterizing and Analyzing Renewable Energy Driven Data Centers [A]. International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) [C], June 2011 (Short Paper).

  5. Chao Li, Wangyuan Zhang, Chang-Burm Cho, Tao Li. Solar Core: Solar energy driven multi-core architecture power management [A]. High Performance Computer Architecture (HPCA) [C], 2011, pp.205-216.

  6. Younge A.J., von Laszewski G., Lizhe Wang, etc. Efficient resource management for Cloud computing environments [A]. In: Green Computing Conference [C], 2010, pp.257-364.

  7. Chung-hsing Hsu, Wu-chun Feng. A Power-Aware Run-Time System for High-Performance Computing. Supercomputing [A]. In: Proceedings of the ACM/IEEE SC 2005 Conference [C], 2005, pp.1.

  8. Hai Zhong, Kun Tao, Xuejie Zhang. An Approach to Optimized Resource Scheduling Algorithm for Open-Source Cloud Systems [A]. In: China Grid Conference (ChinaGrid'10) [C], 2010, pp.124-129.

  9. Bo Li, Jianxin Li, Jinpeng Huai, Tianyu Wo, etc. EnaCloud: Energy-Saving Application Live Placement Approach for Cloud Computing Environments [A]. In: Cloud Computing (CLOUD '09) [C], 2009, pp.17-24.

  10. Shekhar Srikantaiah, Aman Kansal, Feng Zhao. Energy Aware Consolidation for Cloud Computing [EB/OL]. http://labs.chinamo-bile.com/report/view_16785.

  11. Xavier Grehant, Isabelle Demeure. Symmetric Mapping: An Architectural Pattern for Resource Supply in Grids and Clouds [A]. In: Parallel & Distributed Processing (IPDPS) [C], 2009, pp.1-8.

  12. Josep Ll. Berral, Ricard Gavalda, Jordi Torres. Adaptive Scheduling on Power-Aware Managed Data-Centers using Machine Learning [R]. Research Report number: UPC-LSI-11-7-R, July 2011.

  13. Eduardo Pinheiro, Ricardo Bianchini, Enrique V. Carrera. Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems.

  14. K. Li. Performance Analysis of Power-Aware Task Scheduling Algorithms on Multiprocessor Computers with Dynamic Voltage and Speed. IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 11, pp. 1484-1497, 2008.

  15. Jingling Yuan. Energy Aware Resources Scheduling Algorithm for Datacenter using Reinforcement Learning. 2012, Fifth International Conference.

  16. Kamalijit Kaur. Eco Efficient Approaches to Cloud Computing: A Review. IJARCSSE, March 2013.

  17. Rajkumar Buyya. Energy Aware Resource Allocation Heuristic for Efficient Management of Datacenter for Cloud Computing. Elsevier, volume 28, May 2012.
