Approximate reinforcement learning methods offer strong generalization and low computational cost, making them especially suitable for problems with continuous state and action spaces. However, their low sample efficiency and slow convergence hinder further application in practice. Model-based approximate reinforcement learning can accelerate convergence through model learning and planning, thereby substantially improving sample efficiency and convergence rate; it has therefore become a research hotspot in the field of reinforcement learning. This project focuses on the main problems of existing model-based methods: low planning efficiency, slow policy convergence, and poor real-time performance. To address them, we propose a reinforcement learning method for continuous spaces based on an approximate multi-step model and a new policy update rule. The primary innovations are: 1) to improve planning efficiency, an approximate multi-step model is constructed and used for planning; meanwhile, the value-function error introduced by planning with the approximate model is derived and analyzed to guide parameter setting and improve stability; 2) an improved policy update rule is designed based on the advantage function so that the policy converges rapidly; 3) an approximate reinforcement learning algorithm based on the approximate multi-step model and the improved policy update rule is proposed, and its convergence is analyzed theoretically; 4) combining the proposed algorithm, a parallel approximate reinforcement learning framework is constructed and applied to a practical building energy-saving problem.
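The abstract mentions deriving the value-function error introduced by planning with an approximate model. A standard bound of this type (a simulation-lemma-style result, not necessarily the project's exact formula) follows from the two Bellman equations, assuming reward error \(|\hat r - r| \le \epsilon_r\), transition error \(\|\hat P(\cdot|s,a) - P(\cdot|s,a)\|_1 \le \epsilon_P\), and \(\|\hat V^\pi\|_\infty \le V_{\max}\):

```latex
\begin{align*}
\hat V^\pi - V^\pi
  &= (\hat r - r) + \gamma(\hat P - P)\hat V^\pi
     + \gamma P(\hat V^\pi - V^\pi) \\
\Rightarrow\quad
\|\hat V^\pi - V^\pi\|_\infty
  &\le \frac{\epsilon_r + \gamma\,\epsilon_P\,V_{\max}}{1-\gamma}.
\end{align*}
```

Bounds of this shape show why model accuracy and the discount factor jointly govern planning stability, which is the kind of analysis the abstract says guides parameter setting.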
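The abstract names a policy update rule driven by the advantage function. Below is a minimal sketch of such a rule for a Gaussian policy with a linear-in-features mean; the one-step advantage estimate and all names (`theta`, `phi_s`, `alpha`) are illustrative assumptions, not the project's actual formulation.

```python
import numpy as np

# Hedged sketch: an advantage-driven policy update for a Gaussian
# policy whose mean is linear in state features. Illustrative only;
# not the project's exact update rule.

def one_step_advantage(r, v_s, v_s_next, gamma=0.99):
    """A(s, a) ~= r + gamma * V(s') - V(s)."""
    return r + gamma * v_s_next - v_s

def policy_update(theta, phi_s, action, sigma, adv, alpha=0.01):
    """Shift the policy mean toward actions with positive advantage.

    mean(s) = theta @ phi_s; the gradient of log N(a; mean, sigma^2)
    w.r.t. theta is (a - mean) / sigma^2 * phi_s.
    """
    mean = theta @ phi_s
    grad_log_pi = (action - mean) / sigma**2 * phi_s
    return theta + alpha * adv * grad_log_pi
```

With a positive advantage the mean moves toward the sampled action, and with a negative advantage it moves away, which is the fast-convergence behavior the abstract attributes to the advantage-based rule.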
Model-based approximate reinforcement learning methods can make full use of sample data to speed up the computation of optimal policies, and are especially suitable for continuous spaces; however, model accuracy is hard to guarantee, and planning with the model may fail to reach the optimal solution. To address these problems, this project proposes a series of continuous-space reinforcement learning methods that approximate one-step and multi-step models and use model planning to accelerate convergence. The main innovations are: 1) approximate multi-step models are built from individual samples and sample trajectories, and joint planning with the one-step and multi-step models improves planning efficiency; an approximate reinforcement learning algorithm based on the approximate multi-step model and a policy update rule is constructed, and its convergence is analyzed theoretically; 2) a policy learning mechanism based on model acceleration and experience replay is established, and a policy update rule based on the advantage function is designed to achieve fast policy convergence; 3) by partitioning the state and action spaces, a two-layer piecewise model is built that characterizes continuous state and action spaces more precisely, yielding a more accurate model; 4) to better capture model uncertainty, a model based on Gaussian functions is established and a method for solving its parameters is given, so that the uncertainty in the model can be characterized; 5) to further improve sample utilization, within the Dyna framework the temporal-difference algorithm is replaced by a least-squares algorithm to solve for the value function, policy, and model parameters, and eligibility traces are added to speed up the whole algorithm; 6) an end-to-end deep network model for autonomous driving is designed, combining historical decision data with current perception images to map perception data to driving decisions; 7) combining the proposed algorithms, a parallel approximate reinforcement learning framework is constructed and applied to cleaning robots, autonomous driving, the inverted pendulum and cart-pole problems, as well as a practical building energy-saving problem.
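The summary above describes planning in the Dyna framework, where a learned model generates simulated updates between real transitions. The loop can be sketched in minimal tabular Dyna-Q form; this is a generic illustration with a deterministic one-step model, whereas the project itself uses least-squares solvers and approximate multi-step models.

```python
import random

def dyna_q_update(Q, model, s, a, r, s_next, n_actions,
                  alpha=0.1, gamma=0.95, planning_steps=5):
    """One real Q-learning update plus several simulated planning updates.

    Q: dict mapping (state, action) -> value
    model: dict mapping (state, action) -> (reward, next_state)
    """
    def td_update(state, action, reward, nxt):
        best_next = max(Q.get((nxt, b), 0.0) for b in range(n_actions))
        q = Q.get((state, action), 0.0)
        Q[(state, action)] = q + alpha * (reward + gamma * best_next - q)

    # direct RL: learn from the real transition
    td_update(s, a, r, s_next)
    # model learning: remember the transition (deterministic model)
    model[(s, a)] = (r, s_next)
    # planning: replay simulated transitions drawn from the model
    for _ in range(planning_steps):
        (ps, pa), (pr, pn) = random.choice(list(model.items()))
        td_update(ps, pa, pr, pn)
```

Because each real sample drives many simulated value updates, planning raises sample efficiency, which is the mechanism the summary credits for faster convergence.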
Data last updated: 2023-05-31
Eddy covariance technique and its applications in flux studies of terrestrial ecosystems
Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soils
The impact of environmental NIMBY facilities on housing prices in Beijing: a case study of large waste treatment facilities
Analysis of the environmental effects of China's participation in global value chains
Theme park evaluation based on public sentiment orientation: a case study of the Volga Manor in Harbin
Online reinforcement learning methods for continuous state spaces with unknown models
Research on rough set models based on double-quantitative approximation spaces
Study of substrate binding modes in single-enzyme-catalyzed multi-step cascade reactions
Structure space and learning methods for high-dimensional graphical models