General-purpose GPU computing has greatly advanced the field of high-performance computing, while also posing new challenges to it. GPUs are now widely used in bioinformatics, computational finance, machine learning, defense, medical imaging, and other areas, and GPU-based heterogeneous computing is one of the major directions of future high-performance computing. However, as the gap between GPU computational power and peak memory bandwidth keeps widening, a GPU program's performance is increasingly determined by how efficiently data moves across the GPU's memory hierarchy. To develop efficient GPU programs, programmers must address two main problems. First, on a single GPU, the performance optimization space is enormous, which makes it very difficult to locate the performance bottleneck. Second, in GPU-based massively parallel computing, program scalability is poor because of inefficient communication methods. This project studies key techniques for GPU performance modeling and communication optimization targeted at memory-bound applications. Our purpose is to establish a GPU performance analysis model for memory-bound applications from the perspective of data traffic; this model-driven approach simplifies the process of GPU program optimization. In addition, by combining multi-stream and multi-threaded parallel mechanisms with hybrid programming techniques, we aim to overlap computation with communication and thereby improve the scalability of large-scale GPU computation.
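To make the proposed overlap concrete, the following is a minimal CUDA sketch of the multi-stream technique the abstract refers to: the input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued on their own stream, so the transfers of one chunk hide behind the computation of another. The kernel, chunk size, and buffer names are illustrative assumptions, not the project's actual code.

#include <cuda_runtime.h>

// Stand-in for the application's compute phase.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int nChunks = 4;
    const int chunk = 1 << 20;  // elements per chunk (assumed size)
    float *hBuf, *dBuf;
    // Pinned host memory is required for truly asynchronous copies.
    cudaMallocHost((void **)&hBuf, (size_t)nChunks * chunk * sizeof(float));
    cudaMalloc((void **)&dBuf, (size_t)nChunks * chunk * sizeof(float));

    cudaStream_t streams[nChunks];
    for (int s = 0; s < nChunks; ++s) cudaStreamCreate(&streams[s]);

    // Pipeline: copy-in, compute, and copy-out of different chunks run
    // concurrently on different streams, so PCIe traffic overlaps kernels.
    for (int k = 0; k < nChunks; ++k) {
        size_t off = (size_t)k * chunk;
        cudaMemcpyAsync(dBuf + off, hBuf + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[k]);
        process<<<(chunk + 255) / 256, 256, 0, streams[k]>>>(dBuf + off, chunk);
        cudaMemcpyAsync(hBuf + off, dBuf + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[k]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < nChunks; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(hBuf);
    cudaFree(dBuf);
    return 0;
}

With the default single stream, the same work would serialize as copy, compute, copy; multiple streams let the GPU's copy engines and compute units work simultaneously.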
GPU general-purpose computing has brought unprecedented opportunities and challenges to high-performance computing. As the gap between GPU computational power and peak bandwidth keeps widening, program performance is increasingly determined by data movement across the GPU's memory hierarchy, and efficient GPU programming is hindered above all by the enormous, labor-intensive single-GPU optimization space and by the poor scalability of GPU-based large-scale parallel computing. Starting from the perspective of data access, this project studied GPU performance prediction models for memory-bound applications: it proposed a data-traffic model for the multi-level memory hierarchy at different processing granularities and, on that basis, a data-traffic-based performance prediction model, which was evaluated on several memory-bound applications and reached a prediction accuracy of 95%. Taking single-particle cryo-electron microscopy as the driving application, the project studied data-transfer optimization across multiple GPUs and proposed and implemented a GPU-based multi-level parallel model. Taking cardiac electrophysiology simulation as the driving application, it proposed a large-scale parallel implementation based on OpenMP+MPI+CUDA that greatly improved program scalability. Finally, it studied graph optimization methods based on sparse-matrix operations and proposed an efficient graph computing framework for CPU-GPU architectures whose performance exceeds that of the best existing graph frameworks.
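The report states the prediction model's accuracy but not its closed form. A common data-traffic formulation consistent with the description bounds a memory-bound kernel's runtime by the memory level with the largest ratio of traffic to sustained bandwidth. The sketch below is such an illustration under assumed inputs, where Q[l] is the predicted byte volume moved at level l (global memory, L2, shared memory, and so on) and B[l] is that level's measured sustained bandwidth; the numbers in main are invented for the example.

#include <stdio.h>

// Hypothetical data-traffic model: a memory-bound kernel can run no faster
// than the slowest memory level allows, so predicted time is the maximum of
// per-level traffic divided by per-level sustained bandwidth.
static double predict_time(const double *Q, const double *B, int levels) {
    double t = 0.0;
    for (int l = 0; l < levels; ++l) {
        double tl = Q[l] / B[l];
        if (tl > t) t = tl;
    }
    return t;
}

int main(void) {
    // Assumed example: 64 GB of DRAM traffic at 700 GB/s, 160 GB of L2
    // traffic at 2000 GB/s, 320 GB of shared-memory traffic at 8000 GB/s.
    double Q[] = {64e9, 160e9, 320e9};
    double B[] = {700e9, 2000e9, 8000e9};
    printf("predicted kernel time: %.3f s\n", predict_time(Q, B, 3));
    return 0;
}

In this example the DRAM term dominates, which matches the intuition that memory-bound kernels are limited by off-chip traffic.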
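The conclusion names OpenMP+MPI+CUDA but not the program structure. One common arrangement, given here as an assumption-laden skeleton rather than the project's code, binds one MPI rank to each GPU and uses OpenMP host threads to overlap inter-node halo exchange with kernel execution:

#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Bind this rank to a GPU (assumes ranks are placed round-robin per node).
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    cudaSetDevice(rank % nDev);

    // Each rank owns one partition of the simulation domain. Per time step,
    // one host thread drives interior kernels while another exchanges halos,
    // so inter-node communication hides behind GPU computation.
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // launch CUDA kernels on the interior of the local partition
        }
        #pragma omp section
        {
            // MPI_Isend/MPI_Irecv boundary data with neighbor ranks,
            // then MPI_Waitall before the boundary kernels run
        }
    }

    MPI_Finalize();
    return 0;
}

Building this requires nvcc together with an MPI compiler wrapper and OpenMP flags; the domain decomposition and the kernels themselves are application-specific.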