Research on Key Technologies for Automatic Tuning of "System Configuration" in Big Data Analysis Engines

Basic information

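Approval number: 61802384

Project category: Young Scientists Fund

Funding amount: 27.00 (10,000 CNY)

Principal investigator: 贝振东

Discipline classification:

Host institution: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进技术研究院)

Year approved: 2018

Year concluded: 2021

Project period: 2019-01-01 to 2021-12-31

Project status: Concluded

Project participants: 刘琪骁, 高希彤, 曾经纬, 周榕, 赵宝新, 林灵锋, 朱亮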

Keywords: global search; big data analysis engine; performance modeling; automatic tuning; data configuration

Final report abstract

Big data analysis engines such as Spark and Flink are widely regarded as important tools for analyzing big data efficiently. In real deployments, however, an improper system configuration often makes jobs run very inefficiently or even fail. Our previous research identified four major challenges in optimizing the system configuration of big data analysis engines: (1) there are many configuration parameters, with complex non-linear dependencies among them; (2) the data being processed affects the optimal configuration; (3) the characteristics of different big data programs affect the optimal configuration; and (4) different big data analysis engines require different modeling methods for configuration optimization. To address these four challenges, this project studies key technologies for automatically tuning the "system configuration" of big data analysis engines. The main research topics are: a feature-learning-based method for classifying big data programs; performance modeling techniques for high-dimensional configurations with small samples; accurate analysis of parameter importance and parameter interactions with small samples; efficient search algorithms over high-dimensional configuration parameters; and a unified configuration framework for big data analysis engines. The results are expected to substantially improve the performance of big data analysis engines, address the problem of modeling complex systems with high-dimensional configurations from few samples, and offer new ideas and methodological guidance for performance analysis, modeling, and optimization of big data analysis engines.
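The abstract names performance modeling from small samples and configuration search as research topics but does not describe the methods used. As an illustration of the general idea only, the sketch below measures a small number of sampled configurations, fits a simple surrogate model to them, and uses the surrogate to choose one more configuration to measure. Everything in it is an assumption made for illustration: the parameter names and ranges, the made-up `synthetic_runtime` cost function (a stand-in for actually running a job), the k-nearest-neighbour surrogate, and the random candidate search are not taken from the project.

```python
import random

# Hypothetical configuration space; names and ranges are illustrative only.
SPACE = {
    "executor_memory_gb": (1, 16),
    "parallelism": (8, 256),
    "shuffle_compress": (0, 1),
}


def sample_config(rng):
    """Draw one configuration uniformly at random from SPACE."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}


def synthetic_runtime(cfg):
    """Made-up non-linear cost in seconds, standing in for a real job run."""
    spill = 200.0 / cfg["executor_memory_gb"]          # low memory is costly
    sched = 0.002 * (cfg["parallelism"] - 96) ** 2     # too few or too many tasks
    shuffle = 25.0 if cfg["shuffle_compress"] else 40.0
    return 20.0 + spill + sched + shuffle


def normalize(cfg):
    """Scale every parameter to [0, 1] so distances are comparable."""
    return [(cfg[n] - lo) / (hi - lo) for n, (lo, hi) in SPACE.items()]


def predict(cfg, history, k=3):
    """Surrogate model: mean runtime of the k nearest measured configurations."""
    target = normalize(cfg)

    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(normalize(item[0]), target))

    nearest = sorted(history, key=distance)[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def tune(budget=30, candidates=2000, seed=0):
    """Measure `budget` random configs, then one config picked by the surrogate."""
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        cfg = sample_config(rng)
        history.append((cfg, synthetic_runtime(cfg)))

    # Search many unmeasured candidates cheaply with the surrogate.
    pool = [sample_config(rng) for _ in range(candidates)]
    guess = min(pool, key=lambda c: predict(c, history))
    history.append((guess, synthetic_runtime(guess)))

    # Report the best configuration actually measured.
    return min(history, key=lambda item: item[1]), history


if __name__ == "__main__":
    (best_cfg, best_runtime), _ = tune()
    print(best_cfg, round(best_runtime, 2))
```

Because the function returns the best configuration that was actually measured, the result is never worse than the best random sample; whether the surrogate's recommendation improves on it depends on the sample budget and on how well the surrogate fits.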


Project outputs

No outputs on record.

Data last updated: 2023-05-31



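Similar NSFC grants

Research on online tuning methods for configuration parameters of big data analysis systems

Approval number: 61902440

Year approved: 2019

Principal investigator: 窦晖

Discipline classification: F0202

Funding amount: 26.00 (10,000 CNY)

Project category: Young Scientists Fund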

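Research on tensor-product-based automatic generation and tuning of vectorized code

Approval number: 61572025

Year approved: 2015

Principal investigator: 刘仲

Discipline classification: F0204

Funding amount: 48.00 (10,000 CNY)

Project category: General Program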

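Research on construction methods and key technologies of a knowledge graph engine for business big data

Approval number: 91846204

Year approved: 2018

Principal investigator: 陈华钧

Discipline classification: F0607

Funding amount: 240.00 (10,000 CNY)

Project category: Major Research Plan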

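Research on key technologies for online credit rating based on a big data analysis framework

Approval number: 71671155

Year approved: 2016

Principal investigator: 刘耀强

Discipline classification: G0112

Funding amount: 47.90 (10,000 CNY)

Project category: General Program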
