High energy consumption has become a challenging problem for supercomputer systems. Targeting GPU clusters, a typical supercomputer architecture, we propose an energy-efficient task scheduling model and algorithm that can significantly cut energy consumption over the long term. The task scheduling model for GPU clusters is built on a system model, a task model, and an energy model. We provide a metric, OEAD (Optimal Energy Approciate Degree), for comparing the results of different scheduling algorithms on different supercomputers. A waterfall model guides the development of energy-saving policies that can be employed in our scheduling algorithm. Based on the current workload, the historical workload, and the predicted workload of a given time window, we develop a method for calculating the power budget, which can be taken as the approximately optimal power for executing a long period of workload. Our scheduling objective is therefore to make the dynamic system power curve as flat as possible and as close to the power budget as possible. We propose a hierarchical task mapping and scheduling method that maps tasks onto physical computing units level by level and step by step. At the same time, we employ different energy-saving policies to keep the dynamic power as close to the power budget as possible. We also provide a method for evaluating our algorithm in simulation environments, on prototype systems, and on some typical existing GPU clusters. The evaluation results will help us further improve the algorithm so that it adapts to more systems and achieves better results. Our algorithm is intended to significantly reduce energy consumption in the long run without losing quality of service.
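The scheduling objective in the abstract above (a power curve that is flat and close to the power budget) can be illustrated with a minimal sketch. This is not the project's method: the function names, the weighted-average budget formula, and the default weights are all hypothetical, chosen only to show how a budget derived from historical, current, and predicted workload could be compared against a measured power curve.

```python
from statistics import mean


def power_budget(history, current, predicted, weights=(0.3, 0.3, 0.4)):
    """Hypothetical budget: weighted mean of the average power (watts)
    observed in the historical, current, and predicted time windows.
    The weights are illustrative assumptions, not values from the source."""
    w_hist, w_cur, w_pred = weights
    return w_hist * mean(history) + w_cur * mean(current) + w_pred * mean(predicted)


def deviation_from_budget(power_curve, budget):
    """Root-mean-square distance between a power curve and the budget.
    A value of 0 means the curve is perfectly flat and sits on the budget."""
    squared = [(p - budget) ** 2 for p in power_curve]
    return (sum(squared) / len(squared)) ** 0.5


if __name__ == "__main__":
    # Made-up power samples in watts for the three windows.
    budget = power_budget([100, 100], [200, 200], [300, 300])
    print(budget)
    print(deviation_from_budget([200, 220], budget))
```

Under this sketch, a scheduler would prefer the task placement whose resulting power curve has the smaller deviation from the budget.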
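High energy consumption has become a challenging problem that must be solved in the development and application of supercomputers. Targeting GPU clusters, a typical supercomputer architecture, this research aims to solve the fundamental core problems of energy-efficient task scheduling on GPU clusters and to design scheduling models and algorithms that can substantially reduce supercomputer energy consumption over the long term. The research analyzes and abstracts the system model, task model, energy model, and scheduling model of typical GPU clusters; proposes a metric for measuring and comparing the energy-efficiency level of supercomputers; designs a waterfall model to guide the development of multi-level, multi-granularity energy-saving policies; gives principles and methods for setting the power budget with full consideration of the global energy-saving effect; and on that basis proposes a design method for hierarchical, energy-efficient task scheduling algorithms guided by the power budget. Using simulation environments, prototype systems, and real systems, it designs methods for comprehensively validating and further optimizing the proposed scheduling algorithm and related policies. The results of this fundamental research can guide the development of future energy-saving supercomputers, and can also be applied to existing supercomputing systems to substantially reduce their energy costs.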
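High energy consumption has become a challenging problem that must be solved in the development and application of supercomputers, and it has also become a primary concern for cloud centers. Targeting GPU clusters, a typical supercomputer architecture, this research addresses the fundamental problems of energy-efficient task scheduling on GPU clusters and designs scheduling models and algorithms that can substantially reduce supercomputer energy consumption over the long term while optimizing execution performance. The research analyzes and abstracts the system model, task model, and energy model of typical GPU clusters and derives the corresponding scheduling model from them. Considering both GPU supercomputing systems and cloud centers, it designs multi-level, multi-granularity energy-saving policies and, with full consideration of the global energy-saving effect and application execution performance, proposes a design method for hierarchical, energy-efficient task scheduling algorithms guided by the power budget. Using simulation environments, prototype systems, and large-scale real GPU cluster systems, together with scientific and engineering applications such as gravitational-wave data processing, binary black hole simulation, real-time space weather forecasting, and network simulation, the proposed algorithms and related policies were implemented, validated, and improved, with significant application results.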
Data updated: 2023-05-31
Regulatory Asymmetry, Choice of Earnings Management Mode, and the Enforcement Efficiency of the CSRC?
MSGD: A Novel Matrix Factorization Approach for Large-Scale Collaborative Filtering Recommender Systems on GPUs
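A Task Scheduling Method for Cloud Workflow Security
Biochemical and Molecular Mechanisms of Animal Responses to the Hypomagnetic Field
Experimental Verification of the Transient Wave Displacement Field Calculation Method in Phased-Array Acoustic Field Simulation
Research on Mixed-Granularity Cooperative Task Scheduling and Dynamic Balancing Mechanisms for Heterogeneous GPU Clusters
Budget-Constrained Task Scheduling Strategies Dynamically Integrating Reliability and Energy Consumption for Supercomputing Systems
Research on Energy-Saving Optimization Models and Algorithms Integrating Data Placement and Task Scheduling
Research on an FFT Mathematical Library for Heterogeneous GPU Clusters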