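Theory and Methods of Bayesian Reinforcement Learning Based on Partially Observable Models (基于部分感知模型的贝叶斯强化学习理论及方法)

Basic information

Grant No.: 61772355
Project category: General Program
Funding amount: 65.00
Principal investigator: 刘全
Discipline classification:
Host institution: 苏州大学 (Soochow University)
Year approved: 2017
Year concluded: 2021
Project period: 2018-01-01 - 2021-12-31
Project status: Concluded
Project participants: 朱斐, 傅启明, 钟珊, 王浩, 钱伟晟, 翟建伟, 章鹏, 徐进, 梁斌
Keywords: model learning; partially observable model; value learning; Bayesian reinforcement learning; policy learning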

Final report abstract

Building on fast model learning, this project proposes a Bayesian reinforcement learning method for partially observable Markov decision processes (POMDPs). The method targets settings in which the environment is only partially observable and its model is unknown. The main research topics are as follows. (i) For discrete state spaces, we propose a Bayesian dynamic programming method based on intelligent model learning, to reduce the effect that noise in partially observable models has on the convergence speed and accuracy of value-function computation. (ii) In partially observable models, unknown states are hard to predict, so the learned policy is often suboptimal rather than optimal. To address this, we construct a Bayesian dynamic decision network model over the discrete state space. (iii) Computing optimal value functions relies on a model of the environment, but that model is only partially observable at the outset. We therefore present a method that optimizes the environment model by cross entropy. (iv) We propose an adaptive Bayesian programming method based on Gaussian processes, to mitigate the 'curse of dimensionality' and the 'curse of history' that arise in continuous state spaces under partially observable models. (v) Extending POMDP problems from discrete to continuous state spaces raises difficulties such as high computational complexity and poor convergence. We propose a method that works directly in the continuous state space without discretization. (vi) We design a system that implements the above theory and optimized algorithms, and apply it to robot navigation. Research on Bayesian reinforcement learning based on partially observable models therefore has both theoretical value and broad application prospects.

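For settings in which the environment is partially observable and the environment model is unknown, this project proposes a Bayesian reinforcement learning method based on fast model learning. The main topics are: 1. To address the noise that partial observability of the model introduces into value-function computation, a Bayesian dynamic programming method based on intelligent model learning is proposed. 2. To address the perturbations that arise when solving for the optimal policy because unknown states in partially observable models are hard to predict, a Bayesian model is proposed that constructs a dynamic decision network over the discrete state space. 3. Because computing the optimal value function depends on the environment model, a method is proposed that optimizes the environment model by cross entropy. 4. To address the "curse of dimensionality" and the "curse of history" that arise in continuous-state reinforcement learning under partially observable models, an adaptive Bayesian programming method based on Gaussian processes is proposed. 5. To address the computational complexity that arises when partially observable Markov problems with discrete states are extended to continuous state spaces, a method is proposed that solves the problem in the continuous state space without discretization. 6. The theory is applied to problems such as intelligent robot navigation. Research on Bayesian reinforcement learning based on partially observable models therefore has both theoretical value and broad application prospects.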
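As background for the partially observable setting the abstract describes, the sketch below shows the standard Bayes belief update for a discrete POMDP: the agent cannot see its state, so it maintains a probability distribution over states and revises it after each action and observation. This is a generic textbook illustration, not the project's own algorithm. The function name `belief_update`, the table layouts `T` and `O`, and the two-state "robot near a wall" numbers are all assumptions made up for the example.

```python
def belief_update(belief, action, observation, T, O):
    """One step of the discrete POMDP Bayes filter (illustrative sketch).

    belief[s]      : prior probability of being in state s
    T[a][s][s2]    : assumed transition model, P(s2 | s, a)
    O[a][s2][o]    : assumed observation model, P(o | s2, a)
    Returns the posterior belief over states after taking `action`
    and receiving `observation`.
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction step: probability of landing in s2 under the prior belief.
        predicted = sum(belief[s] * T[action][s][s2] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[action][s2][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state example: state 0 = near a wall, state 1 = far from it.
T = [[[0.9, 0.1], [0.2, 0.8]]]  # a single action, index 0
O = [[[0.8, 0.2], [0.3, 0.7]]]  # observation 0 = "wall seen", 1 = "nothing seen"

belief = [0.5, 0.5]
new_belief = belief_update(belief, action=0, observation=0, T=T, O=O)
print(new_belief)
```

When the models `T` and `O` are themselves unknown, as in the setting the abstract describes, a Bayesian reinforcement learning method must additionally maintain uncertainty over those models. The sketch deliberately leaves that part out.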


Project outputs

No outputs recorded for this item.

Data last updated: 2023-05-31

Other related literature

DOI:
Published:

DOI: 10.16285/j.rsm.2019.1280
Published: 2019

DOI: 10.11999/JEIT150995
Published: 2016

DOI: 10.12062/cpre.20181019
Published: 2019

DOI:
Published:

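Other grants by 刘全

Grant No.: 31372430
Year approved: 2013
Funding amount: 85.00
Project category: General Program

Grant No.: 31672542
Year approved: 2016
Funding amount: 61.00
Project category: General Program

Grant No.: 51379164
Year approved: 2013
Funding amount: 80.00
Project category: General Program

Grant No.: 10902078
Year approved: 2009
Funding amount: 21.00
Project category: Young Scientists Fund

Grant No.: 30972178
Year approved: 2009
Funding amount: 30.00
Project category: General Program

Grant No.: 60907017
Year approved: 2009
Funding amount: 22.00
Project category: Young Scientists Fund

Grant No.: 60873116
Year approved: 2008
Funding amount: 35.00
Project category: General Program

Grant No.: 41505129
Year approved: 2015
Funding amount: 21.00
Project category: Young Scientists Fund

Grant No.: 61272005
Year approved: 2012
Funding amount: 61.00
Project category: General Program

Grant No.: 61070223
Year approved: 2010
Funding amount: 35.00
Project category: General Program

Grant No.: 81600783
Year approved: 2016
Funding amount: 18.00
Project category: Young Scientists Fund

Grant No.: 61472262
Year approved: 2014
Funding amount: 82.00
Project category: General Program

Grant No.: 81670343
Year approved: 2016
Funding amount: 62.00
Project category: General Program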

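Similar NSFC grants

Electromagnetic inverse scattering methods based on Bayesian compressive sensing
Grant No.: 61771008
Year approved: 2017
Principal investigator: 张清河
Discipline classification: F0119
Funding amount: 50.00
Project category: General Program

Fuzzy-logic reinforcement learning models based on Bayesian inference
Grant No.: 61272005
Year approved: 2012
Principal investigator: 刘全
Discipline classification: F0201
Funding amount: 61.00
Project category: General Program

Bayesian nonparametric compressed-sensing MRI reconstruction based on subspace decomposition of partial k-space data
Grant No.: 61571382
Year approved: 2015
Principal investigator: 丁兴号
Discipline classification: F0125
Funding amount: 57.00
Project category: General Program

Sparse Bayesian wideband spectrum sensing based on hierarchical prior knowledge and reinforcement learning
Grant No.: 61703328
Year approved: 2017
Principal investigator: 刘帅
Discipline classification: F0603
Funding amount: 22.00
Project category: Young Scientists Fund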


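Other related literature

Genome-wide association analysis of leaf orientation value in maize

Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil

High-resolution three-dimensional imaging for wideband MIMO radar based on Kronecker compressed sensing

Analysis of the environmental effects of China's participation in global value chains

Combined transcriptomic and metabolomic analysis of the mechanism of anthocyanin variation in red maple (Acer rubrum) leaves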

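Other grants by 刘全

Expression characteristics and functional analysis of MAPK, a novel virulence factor of Toxoplasma gondii

Molecular basis of the regulation of parasite virulence by mitogen-activated protein kinase in Toxoplasma gondii

Evolution mechanism and regulation theory of multi-objective diversion risk in the redevelopment of hydraulic engineering projects

Breach mechanism of overflow earth-rock cofferdams and its simulation methods

Molecular mechanism of piggyBac transposon-mediated tachyzoite-bradyzoite conversion in Toxoplasma gondii

Fabrication and study of three-dimensional stacked-structure photonic crystals in the near-infrared band

Tableau-based automated theorem proving through classicalization of non-classical logics

Airborne mass spectrometer study of the vertical distribution of aerosol chemical composition over Beijing

Fuzzy-logic reinforcement learning models based on Bayesian inference

Theory and methods of logic reinforcement learning for tableau models

Regulation of BAFF expression in CRSwNP tissue by the LL37-DNA complex and its mechanism

Theory and methods of large-scale reinforcement learning based on fuzzy logic

Combined use of sulforaphane and zinc to prevent diabetic cardiomyopathy through the synergistic mechanism of Nrf2 and MT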
