Traditional business intelligence, emerging smart-traffic and smart-grid applications, online advertising, and similar workloads place heavy demands on real-time data analysis. Neither traditional database systems nor disk-based massively parallel processing systems can meet these demands. However, upgrades in computer hardware and the evolution of computer architecture provide a new hardware foundation, and highly reliable, highly available clusters built from commodity servers suggest a new design approach. This project focuses on high-performance data ingestion and analytics for clustered in-memory computing. It aims to resolve the new system bottleneck, namely communication, that arises from the shift in data storage medium, and to make real-time data analysis practical. The major research topics are as follows: 1) multi-level data partitioning and compound indexing strategies over the clustered in-memory computing system, for high-performance data access; 2) pipelined parallel query processing with runtime elastic adjustment of parallelism, to improve scalability and reduce the communication bottleneck; 3) transactional processing of file metadata, to achieve non-blocking data ingestion with high throughput and low latency; 4) high availability of the resulting systems, including fault-tolerant data sets based on log shipping and service-oriented transaction processing for fault tolerance. The planned research matches current application needs and the direction of the related technologies, and is of broad interest to both academia and industry. The applicants have substantial technical experience in the related areas and have carried out preliminary exploration of the proposed research plan, which supports the feasibility of the project.
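Demand for real-time data processing is growing at an unprecedented rate across many applications, including traditional business intelligence, emerging smart transportation and smart grids, and online Internet advertising. Traditional database systems and large-scale disk-based parallel processing systems can no longer meet this demand; meanwhile, upgrades in computer hardware and the evolution of computer architecture provide a new hardware foundation, and highly reliable, highly available clusters built from inexpensive servers provide a new design philosophy. This project studies data ingestion and analysis techniques for in-memory cluster computing environments, seeking to resolve the communication bottleneck caused by the change in data storage medium and to achieve real-time data processing (real-time ingestion and real-time analysis). The research focuses on: 1) multi-level data partitioning and compound index access mechanisms on in-memory clusters, to address data organization; 2) pipeline-based parallel processing that adjusts the degree of parallelism dynamically at runtime, to ease the communication bottleneck; 3) transactional metadata management, to achieve non-blocking data ingestion and improve ingestion throughput and timeliness; 4) data fault tolerance based on log shipping and transactional fault tolerance for services, to address the reliability and availability of in-memory systems. The research matches real application needs and technology trends, and has broad application prospects and academic value. The applicants have ample prior work in the area, and the research plan is feasible.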
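Demand for real-time data processing is growing at an unprecedented rate across many applications, including traditional business intelligence, emerging smart transportation and smart grids, and online Internet advertising. Traditional database systems and large-scale disk-based parallel processing systems can no longer meet this demand; meanwhile, upgrades in computer hardware and the evolution of computer architecture provide a new hardware foundation, and highly reliable, highly available clusters built from inexpensive servers provide a new design philosophy. The project focused on: 1) load balancing for distributed data streams under highly dynamic conditions; 2) data density estimation in large-scale networks; 3) adaptive stream joins in distributed data stream environments; 4) query-aware database data generation, among others. The project published 9 papers in total, including 5 CCF class A papers, 2 CCF class B papers, and 2 CCF class C papers; trained 1 PhD graduate, 2 doctoral students (graduating in 2021), and 2 master's students; and open-sourced the CLAIMS database system. The research meets the requirements for project completion.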
Data last updated: 2023-05-31
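High-Performance In-Memory Data Management and Analysis in Cluster Environments
Research on In-Memory Big Data Analysis Techniques in Cluster Environments
Research on In-Memory Spatial Database Management and Query Techniques in Cluster Environments
Research on Fault-Tolerance Mechanisms and Data Organization Strategies for Clustered Memory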