Performance modeling for the Hadoop Distributed File System (HDFS) has attracted significant attention. Extensive experiments show that HDFS performance follows a distinctive probability distribution that carries useful information. However, probabilistic modeling of HDFS performance has not yet been studied. Moreover, existing HDFS performance modeling relies mainly on a single modeling method, such as experimental modeling or analytical modeling, each of which has its own weaknesses, and transferring performance models between different platforms remains a challenge. This project therefore proposes a probabilistic modeling method for HDFS performance that combines experimental modeling, analytical modeling, and transfer learning. First, for file sizes in the range (0, BS], where BS denotes the block size, experimental modeling is used: a "probability density estimation" method is proposed that works in two steps, (1) predicting characteristic index values and (2) reconstructing the probability density function from them. Second, for file sizes in the range (BS, +∞), analytical modeling is used, and a performance modeling method based on "probability density superposition" is proposed. Finally, for probabilistic modeling of HDFS performance on a new platform, transfer learning is used, and an instance-based transfer learning method is proposed to build approximate probabilistic models. The proposed methods can build probabilistic models of HDFS performance for different platforms, reduce modeling cost, and improve modeling efficiency; they can also serve as a reference for other data-intensive file systems.
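The following Python sketch illustrates, under simplifying assumptions, the two in-domain ideas above: the "characteristic index values" are taken to be a few latency quantiles predicted from file size by a linear fit, the density is reconstructed by differentiating a CDF interpolated through those quantiles, and the latency of a file larger than BS is modeled as the sum of independent per-block latencies whose densities are superposed by convolution. All names and values here (BS, the quantile levels, the linear model, the synthetic data) are illustrative assumptions, not the project's actual algorithms or measurements.

```python
# Minimal sketch (assumptions only, not the proposed method):
#  (0, BS]  : predict latency quantiles from file size, then reconstruct a PDF.
#  (BS, +inf): superpose (convolve) per-block latency densities.
import numpy as np

BS = 128 * 2**20                                  # assumed block size: 128 MiB
QUANTS = np.array([0.1, 0.25, 0.5, 0.75, 0.9])    # assumed characteristic indices

def fit_quantile_models(sizes, latency_samples):
    """Fit one linear model per quantile level: q_p(size) ~ a_p * size + b_p."""
    models = []
    for p in QUANTS:
        q = np.array([np.quantile(s, p) for s in latency_samples])
        models.append(tuple(np.polyfit(sizes, q, deg=1)))
    return models

def predict_quantiles(models, size):
    """Predicted characteristic index values (quantiles) for one file size."""
    return np.array([a * size + b for a, b in models])

def density_from_quantiles(qvals, grid):
    """Reconstruct a rough PDF by interpolating the CDF through the predicted
    quantiles and differentiating it (assumes the quantiles do not cross)."""
    cdf = np.interp(grid, qvals, QUANTS, left=0.0, right=1.0)
    pdf = np.gradient(cdf, grid)
    return pdf / np.trapz(pdf, grid)              # renormalise

def superpose(block_pdf, grid, n_blocks):
    """Density of the sum of n_blocks i.i.d. per-block latencies, obtained by
    repeated convolution of the per-block density (uniform grid assumed)."""
    dx = grid[1] - grid[0]
    pdf = block_pdf
    for _ in range(n_blocks - 1):
        pdf = np.convolve(pdf, block_pdf) * dx
    out_grid = n_blocks * grid[0] + dx * np.arange(pdf.size)
    return out_grid, pdf

# Tiny synthetic demo: latency grows with file size plus noise.
rng = np.random.default_rng(0)
sizes = np.linspace(8 * 2**20, BS, 8)
lat = [0.2 + 3e-9 * s + 0.05 * rng.standard_normal(200) for s in sizes]
models = fit_quantile_models(sizes, lat)
grid = np.linspace(0.0, 2.0, 400)
pdf_one_block = density_from_quantiles(predict_quantiles(models, 64 * 2**20), grid)
t_grid, pdf_three_blocks = superpose(pdf_one_block, grid, n_blocks=3)
```

Quantiles are used here only as one plausible choice of characteristic index; moments or fitted distribution parameters would slot into the same two-step skeleton.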
HDFS performance modeling is one of the research hotspots in cloud computing. HDFS performance exhibits distinctive probability-distribution characteristics that carry useful information. However, probabilistic modeling of HDFS performance has not yet been undertaken; current HDFS performance modeling mainly uses a single modeling method, and research on performance model transfer is still at an early stage. This project therefore studies a probabilistic modeling method for HDFS read and write performance that combines experimental modeling, analytical modeling, and transfer learning. First, based on the working mechanism of HDFS, for file sizes in the range (0, BS] (where BS denotes the block size), experimental modeling is adopted and a probability density estimation method based on "characteristic index value prediction and probability density reconstruction" is proposed; for file sizes in (BS, +∞), analytical modeling is adopted and a performance modeling method based on probability density superposition is proposed. Second, for probabilistic modeling of HDFS performance on a new platform, an instance-based transfer learning method is proposed. These methods can build probability models of HDFS read and write performance over the file-size domain on different platforms, reduce modeling cost, and improve modeling efficiency, and they offer a reference for performance modeling of other data-intensive file systems.
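For the transfer-learning step, a minimal instance-reweighting sketch is given below; the kernel-density-ratio weights, the pooling heuristic, and all names are assumptions chosen for illustration, not the instance-based transfer method the project proposes.

```python
# Sketch of instance-based transfer (an assumption, not the project's algorithm):
# reuse measurements from an old platform, but reweight each source sample by how
# similar its file size is to the sizes observed on the new platform, then pool
# them with the (scarce) target-platform measurements.
import numpy as np

def density_ratio_weights(src_sizes, tgt_sizes, bandwidth=16 * 2**20):
    """Crude Gaussian-kernel estimate of p_target(size) / p_source(size)."""
    def kde(points, x):
        d = (x[:, None] - points[None, :]) / bandwidth
        return np.exp(-0.5 * d ** 2).mean(axis=1)
    src = np.asarray(src_sizes, dtype=float)
    tgt = np.asarray(tgt_sizes, dtype=float)
    w = kde(tgt, src) / (kde(src, src) + 1e-12)
    return w / (w.mean() + 1e-12)

def fit_transferred_quantile_model(src_sizes, src_lat, tgt_sizes, tgt_lat, p=0.5):
    """Fit one quantile's linear model q_p(size) ~ a*size + b on reweighted
    source measurements pooled with the target measurements."""
    w_src = density_ratio_weights(src_sizes, tgt_sizes)
    x = np.concatenate([src_sizes, tgt_sizes]).astype(float)
    y = np.array([np.quantile(l, p) for l in list(src_lat) + list(tgt_lat)])
    # Target samples get the largest weight: a heuristic, not a tuned choice.
    w = np.concatenate([w_src, np.full(len(tgt_sizes), w_src.max() + 1.0)])
    return np.polyfit(x, y, deg=1, w=w)
```

Here the ratio of kernel density estimates merely plays the role of instance weights; other instance-weighting schemes (for example, boosting-based reweighting) would fit the same skeleton.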
Performance modeling is one of the research hotspots in cloud computing and big data. With big data analytics as the application background, this project studied key modeling techniques for data-intensive file systems and computing. The main research topics include: performance modeling of the Hadoop Distributed File System, system-identification-based performance modeling of Hadoop MapReduce, Spark performance modeling and optimal resource configuration in public cloud environments, and performance modeling of Virtual Machine Consolidation with deadlock-avoidance strategies. The results include 14 published papers and 2 granted national invention patents. They have been deployed in cross-province application systems of the State Taxation Administration's "Golden Tax Phase III" project, covering decision-support risk management, taxpayer credit management, and personal tax administration, realizing a "headquarters + provincial bureaus" architecture of multiple Hadoop data centers. As part of the main results, the work won the Second Prize of the 2017 National Science and Technology Progress Award, with the project leader as the sixth contributor.
Rice root system modeling methods based on fractal L-systems
CFRP strengthening of rib-to-deck fatigue cracking in orthotropic steel bridge decks
A review of research on the operating performance of Tesla turbines
Theme park evaluation based on public sentiment: a case study of Volga Manor in Harbin
Fatigue performance and improvement methods of bolted U-rib steel box girders considering butt-joint misalignment
Design and performance of optically written, electrically read memory devices based on rare-earth complexes with electrically driven ion transport
Artificial antiferromagnets with electrical writing and reading via spin-orbit torque switching
Transfer modeling with feature generation for the rubber mixing process and probabilistic batch discharge
View-invariant human action modeling and recognition based on probabilistic graphical models