With the rapid development of control technology, controlled variables have become continuous and complex. Reinforcement learning can effectively solve optimal control problems, which has made it an active research topic in machine learning and control. This project takes as its background the optimal control of continuous-state systems whose model data are unknown. It targets the shortcomings of current reinforcement learning methods, such as suboptimal learning results, undirected exploration, and low learning efficiency. Drawing on recent results in probably approximately correct (PAC) theory, statistical learning, online learning, and tensor theory, we will study online reinforcement learning methods from three angles: basic theory, key technologies, and verification on typical examples. First, we study an algorithmic framework that satisfies PAC theory, to provide theoretical guidance for solving control problems in continuous-state systems. Second, for continuous and discrete action spaces, we construct online reinforcement learning methods based on least squares, kernel support tensor machines, and kd-trees, and we study adaptive learning of model parameters to achieve model inference and generation. Finally, we verify the proposed key technologies on the flexible control task for robot components at the Anhui Provincial Key Laboratory of Special Heavy-Load Robots. By exploring the points where reinforcement learning and online learning meet, this research is of significant value for enriching and supplementing the theoretical system of reinforcement learning.
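Optimal control of continuous-state systems is an active research topic in reinforcement learning. To address the shortcomings of current reinforcement learning methods, such as suboptimal learning results, undirected exploration, and low learning efficiency, this project applied statistical methods, online learning, and tensor techniques to the following problems: analysis of reinforcement learning methods for continuous-state, continuous-action and continuous-state, discrete-action settings; design and implementation of online learning algorithms; construction of an online least-squares policy iteration algorithm based on "exploration-exploitation" and sparse learning; refinement of discrete actions based on kd-tree partitioning of the continuous state space; and construction of ensemble Q-learning networks to counter overestimation. The project explored the points where online learning and reinforcement learning interact, and is of some significance for enriching and supplementing the body of reinforcement learning techniques.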
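As an illustration of kd-tree partitioning of a continuous state space, the following minimal Python sketch splits sampled states at the median along alternating axes, maps any continuous state to a discrete cell index, and applies a standard tabular Q-learning update over those cells. It is a sketch under stated assumptions, not the project's algorithm: the names `build_kd_partition`, `cell_index`, and `q_learning_update`, the leaf-size stopping rule, and the step-size and discount values are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    axis: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: int = -1  # >= 0 only for leaf cells


def build_kd_partition(points: Sequence[Sequence[float]],
                       max_leaf_size: int = 1) -> Tuple[Node, int]:
    """Split sampled states at the median, cycling through the axes.

    Returns the tree root and the number of leaf cells (assumed stopping
    rule: a cell is a leaf once it holds at most max_leaf_size samples).
    """
    counter = [0]

    def make_leaf() -> Node:
        node = Node(leaf_id=counter[0])
        counter[0] += 1
        return node

    def build(pts: List[Sequence[float]], depth: int) -> Node:
        if len(pts) <= max_leaf_size:
            return make_leaf()
        axis = depth % len(pts[0])
        pts = sorted(pts, key=lambda p: p[axis])
        threshold = pts[len(pts) // 2][axis]
        left_pts = [p for p in pts if p[axis] < threshold]
        right_pts = [p for p in pts if p[axis] >= threshold]
        if not left_pts:
            # Duplicate coordinates: no useful split exists, so stop here.
            return make_leaf()
        left = build(left_pts, depth + 1)
        right = build(right_pts, depth + 1)
        return Node(axis=axis, threshold=threshold, left=left, right=right)

    root = build(list(points), 0)
    return root, counter[0]


def cell_index(root: Node, state: Sequence[float]) -> int:
    """Map a continuous state to the index of the leaf cell containing it."""
    node = root
    while node.leaf_id < 0:
        node = node.left if state[node.axis] < node.threshold else node.right
    return node.leaf_id


def q_learning_update(q: List[List[float]], cell: int, action: int,
                      reward: float, next_cell: int,
                      alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One tabular Q-learning step over discretized cells."""
    target = reward + gamma * max(q[next_cell])
    q[cell][action] += alpha * (target - q[cell][action])


# Usage: four sampled 2-D states yield four cells; two actions per cell.
samples = [(0.1, 0.2), (0.4, 0.9), (0.6, 0.1), (0.9, 0.8)]
root, n_cells = build_kd_partition(samples, max_leaf_size=1)
q_table = [[0.0, 0.0] for _ in range(n_cells)]
s, s_next = cell_index(root, (0.2, 0.3)), cell_index(root, (0.7, 0.85))
q_learning_update(q_table, s, action=1, reward=1.0, next_cell=s_next)
```

Splitting at the median keeps the cells balanced in the number of samples they hold, so regions the system visits often end up with a finer partition than regions it rarely reaches.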
Data updated: 2023-05-31
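Eddy covariance technique and its application to flux research in terrestrial ecosystems
Impact of environmental NIMBY facilities on residential prices in Beijing: the case of large-scale waste treatment facilities
SSVEP-based direct brain control of robot direction and speed
Theme park evaluation based on public sentiment orientation: the case of Volga Manor in Harbin
ESO-based mismatched disturbance suppression for the DGVSCMG dual-gimbal servo system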
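Research on continuous-space reinforcement learning methods based on approximate multi-step models
Online sensing and incremental learning of unknown radar states in small-sample spaces
Online learning of visual object models for weakly labeled web images
Continuous tracking of multiple targets with unknown and time-varying number in complex environments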