Based on fast model learning, this project proposes a Bayesian reinforcement learning method for partially observable Markov decision processes (POMDPs). The method addresses the setting in which the environment is only partially observable and the model is unknown. The main contents of the study are as follows:
i. In the discrete state space, we intend to propose a Bayesian dynamic programming method based on intelligent model learning. It addresses the problems that noise in partially observable models causes for the computation of value functions, such as slowed convergence and reduced accuracy (see the Bayesian model-learning sketch after this summary).
ii. In partially observable models it is difficult to predict unknown states, so the policy obtained may be suboptimal rather than optimal. To address this, we intend to construct a Bayesian model of a dynamic decision network over the discrete state space.
iii. The computation of optimal value functions relies on a model of the environment, yet that model is only partially known at the outset. To address this, we intend to present a method that optimizes the environment model by cross-entropy (see the cross-entropy sketch below).
iv. We intend to propose an adaptive Bayesian planning method based on Gaussian processes. It addresses the 'curse of dimensionality' and the 'curse of history' that arise in continuous state spaces under partially observable models (see the Gaussian-process sketch below).
v. Extending POMDP problems from the discrete state space to the continuous one raises further difficulties, such as computational complexity and convergence performance; we intend to propose a solution method that avoids discretization.
vi. We intend to design a system that realizes the above theory and the optimized algorithms, and to apply it to robot navigation problems.
Therefore, the study of Bayesian reinforcement learning based on partially observable models has both theoretical value and broad application prospects.
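The Bayesian dynamic programming item (i) rests on two standard ingredients: a learned Dirichlet posterior over the transition and observation models, and the POMDP belief update. The following is a minimal numpy sketch of those two steps under assumed toy dimensions and uniform priors; it is an illustration of the general technique, not the project's actual design.

```python
import numpy as np

# Illustrative sizes for a small discrete POMDP (assumed, not from the project).
N_STATES, N_ACTIONS, N_OBS = 4, 2, 3

# Dirichlet pseudo-counts for transitions P(s'|s,a) and observations P(o|s',a),
# initialized to a uniform prior.
trans_counts = np.ones((N_STATES, N_ACTIONS, N_STATES))
obs_counts = np.ones((N_STATES, N_ACTIONS, N_OBS))

def expected_models():
    """Posterior-mean transition and observation models."""
    T = trans_counts / trans_counts.sum(axis=2, keepdims=True)
    Z = obs_counts / obs_counts.sum(axis=2, keepdims=True)
    return T, Z

def update_model(s, a, s_next, o):
    """Bayesian model learning: bump the Dirichlet counts for one observed step."""
    trans_counts[s, a, s_next] += 1.0
    obs_counts[s_next, a, o] += 1.0

def belief_update(belief, a, o):
    """Standard POMDP belief update: b'(s') proportional to Z(o|s',a) * sum_s T(s'|s,a) b(s)."""
    T, Z = expected_models()
    predicted = belief @ T[:, a, :]
    new_belief = Z[:, a, o] * predicted
    return new_belief / new_belief.sum()

# Usage: start from a uniform belief and fold in one (action, observation) pair.
belief = np.full(N_STATES, 1.0 / N_STATES)
update_model(s=0, a=1, s_next=2, o=1)
belief = belief_update(belief, a=1, o=1)
print(belief)
```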
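For item iii, one common reading of "optimizing the environment model by cross-entropy" is the cross-entropy method: sample candidate model parameters, score them, and refit the sampling distribution to the elite candidates. The sketch below assumes a generic score function and a quadratic toy objective purely for illustration; in practice the score would be something like the likelihood of observed data under the candidate model.

```python
import numpy as np

def cross_entropy_optimize(score_fn, dim, n_iters=30, pop_size=100, elite_frac=0.2, seed=0):
    """Cross-entropy method: refit a diagonal Gaussian to the elite samples each iteration."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop_size * elite_frac)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([score_fn(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # highest-scoring candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy objective (assumed): recover a hypothetical "true" parameter vector by
# maximizing the negated squared error.
true_theta = np.array([0.3, -1.2, 0.8])
score = lambda theta: -np.sum((theta - true_theta) ** 2)
print(cross_entropy_optimize(score, dim=3))
```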
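Item iv builds on Gaussian processes over a continuous state space. As a hedged illustration of how a GP can represent a value function without discretizing the states, here is a small numpy sketch of GP regression (posterior mean only); the squared-exponential kernel, its hyperparameters, and the toy training data are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D states."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(train_x, train_y, test_x, noise_var=1e-2):
    """GP regression: posterior mean of the value function at continuous test states."""
    K = rbf_kernel(train_x, train_x) + noise_var * np.eye(len(train_x))
    alpha = np.linalg.solve(K, train_y)
    return rbf_kernel(test_x, train_x) @ alpha

# Toy data: noisy value estimates at a handful of continuous states.
train_x = np.array([-1.0, -0.3, 0.2, 0.9])
train_y = np.sin(3 * train_x) + 0.05 * np.random.default_rng(0).normal(size=4)
test_x = np.linspace(-1.2, 1.2, 5)
print(gp_posterior_mean(train_x, train_y, test_x))
```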