QoS-Aware Machine Learning-based Multiple Resources Scheduling for Microservices in Cloud Environment: Paper and Code


Dec 02, 2019
Lei Liu

Figures 1 to 4 for QoS-Aware Machine Learning-based Multiple Resources Scheduling for Microservices in Cloud Environment


Microservices have come to dominate the modern cloud environment. To improve cost efficiency, multiple microservices are normally co-located on a single server, so run-time resource scheduling becomes pivotal for QoS control. However, the scheduling exploration space grows rapidly with increasing server resources (cores, cache, bandwidth, etc.) and with the diversity of microservices. Consequently, existing schedulers may not keep pace with rapid changes in service demand. We also observe resource cliffs in the scheduling space, that is, regions where a small change in allocated resources causes a sharp change in QoS. These cliffs not only reduce exploration efficiency, making it difficult to converge to the optimal scheduling solution, but also cause severe QoS fluctuation. To overcome these problems, we propose a novel machine learning-based scheduling mechanism called OSML. It takes resource and runtime states as input and employs two MLP models and a reinforcement learning model to explore the scheduling space. As a result, OSML can reach an optimal solution much faster than traditional approaches. More importantly, it can automatically detect resource cliffs and avoid them during exploration. To verify the effectiveness of OSML and obtain a well-generalized model, we collect a dataset containing over 2 billion samples from 11 typical microservices running on real servers over 9 months. Under the same QoS constraint, experimental results show that OSML outperforms the state-of-the-art work and achieves around 5 times the scheduling speed.
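The abstract does not say how OSML detects a resource cliff, so the following is only a rough illustration of the concept, not OSML's method. It assumes a hypothetical, hand-measured latency profile for one microservice (tail latency in ms versus allocated cores) and flags a cliff edge wherever dropping to the next smaller allocation multiplies latency by at least a threshold ratio (2.0 here, an arbitrary assumption). It then picks the smallest allocation that meets a QoS target without sitting directly on a cliff edge. This is a simple threshold heuristic standing in for OSML's MLP and reinforcement learning models; the function names `find_cliffs` and `pick_allocation`, the profile values, and the threshold are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical profile: allocated cores -> tail latency (ms). Toy numbers only.
PROFILE: Dict[int, float] = {1: 40.0, 2: 35.0, 3: 9.0, 4: 8.0, 5: 7.5, 6: 7.4}


def find_cliffs(latency: Dict[int, float], ratio: float = 2.0) -> List[int]:
    """Return cliff edges: allocations `a` where dropping to the next smaller
    profiled allocation multiplies latency by at least `ratio` (assumed rule)."""
    allocs = sorted(latency)
    cliffs = []
    for smaller, larger in zip(allocs, allocs[1:]):
        if latency[smaller] >= ratio * latency[larger]:
            cliffs.append(larger)
    return cliffs


def pick_allocation(
    latency: Dict[int, float],
    qos_target: float,
    ratio: float = 2.0,
    margin: int = 1,
) -> Optional[int]:
    """Smallest allocation meeting `qos_target` that is not within `margin`
    units at or above a cliff edge; None if no allocation qualifies."""
    cliffs = find_cliffs(latency, ratio)
    for a in sorted(latency):
        if latency[a] > qos_target:
            continue  # violates the QoS target
        if any(0 <= a - c < margin for c in cliffs):
            continue  # too close to a cliff edge: one step down risks a QoS spike
        return a
    return None


if __name__ == "__main__":
    print("cliff edges:", find_cliffs(PROFILE))
    print("naive choice:", pick_allocation(PROFILE, 10.0, margin=0))
    print("cliff-aware choice:", pick_allocation(PROFILE, 10.0))
```

With this toy profile and a 10 ms target, a naive scheduler (`margin=0`) stops at 3 cores, right on the cliff edge where losing one core pushes latency from 9 ms to 35 ms; the cliff-aware choice takes 4 cores to keep some headroom. A real scheduler would face a multi-dimensional space (cores, cache ways, bandwidth), which is where the learned models described in the abstract come in.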
