Solving multi-step problems is one of the main research topics in reinforcement learning, with important and wide-ranging applications in robot navigation in unknown environments, computer game AI, control, and related areas. As a genetics-based machine learning technique, learning classifier systems (LCSs) have shown promise on multi-step problems, but they have difficulty with large ones. This project analyzes the reasons behind these difficulties and develops mechanisms that let LCSs solve large multi-step problems. Specifically, it will: study the performance limitations that discounted-reward reinforcement learning algorithms impose on LCSs, and replace them with average-reward reinforcement learning methods to support long action chains in large multi-step problems; develop an effective memory mechanism that lets LCSs cope with non-Markov problems, improving their effectiveness and robustness on such problems; and build LCSs that can address multi-step problems with continuous state and action spaces, using typical function approximation methods and a Generalized Classifier System based on the special structural features and generalization ability of LCSs. The results can provide a theoretical and technical basis for applying LCSs in related fields.
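Solving multi-step learning problems is one of the central problems in reinforcement learning research, with important and wide-ranging applications in robot path planning in unknown environments, computer game intelligence, control and scheduling, and other areas. Learning classifier systems (LCSs) have demonstrated practical value in solving multi-step learning problems, but they struggle with large-scale instances. This project therefore studies the main reasons why large-scale learning problems are hard to solve, in order to build solving mechanisms for LCSs on such problems. Specifically: it studies how the discounted-reward reinforcement learning algorithms currently used in LCSs limit and impede their performance, and replaces them with several average-reward reinforcement learning algorithms to improve LCSs' support for long action chains; it builds an effective memory mechanism for LCSs to cope with the non-Markov properties of large-scale learning problems; and it addresses multi-step learning problems with continuous state and action spaces from two directions, namely typical function approximation methods and a generalized classifier system developed from the structural characteristics and generalization strengths of LCSs. The research can provide a theoretical and technical foundation for related applications.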
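Robot path planning in unknown environments is a multi-step learning problem. Learning classifier systems (LCSs), a machine learning technique, are often used to solve such problems, but they struggle with large-scale multi-step learning problems. To address this, the project studied how the discounted-reward reinforcement learning algorithms currently used in LCSs limit and impede their performance, and replaced them with several average-reward reinforcement learning algorithms to improve LCSs' support for long action chains; designed an effective memory mechanism for LCSs to cope with partial observability in learning problems; and used neural networks as function approximators to help LCSs handle problems with continuous state spaces, with the aim of building solving mechanisms for LCSs on large-scale learning problems. This work offers new understanding and new approaches for solving large-scale multi-step learning problems, and lays a theoretical and technical foundation for applying LCSs in areas such as robot navigation, computer game intelligence, and control and scheduling.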
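The contrast between discounted and average reward in long action chains can be illustrated with a minimal sketch. The discount factor 0.71 and reward 1000 below are assumed illustrative values, not figures from the project, and the tabular R-learning step stands in for "an average-reward method" in general; the abstract does not say which average-reward algorithms were used, and the function names are hypothetical.

```python
def discounted_payoffs(reward, chain_length, gamma):
    """Payoff predicted n steps before a single final reward: gamma**n * reward.

    Index 0 is the action that receives the reward directly.
    """
    return [reward * gamma ** n for n in range(chain_length)]


def r_learning_update(q, rho, s, a, r, s_next, beta=0.1, alpha=0.01):
    """One tabular R-learning step (an average-reward method, assumed here).

    q maps each state to a list of action values and is updated in place.
    rho is the running estimate of the average reward per step.
    Returns the new value of rho.
    """
    # Check greediness before q is modified.
    was_greedy = q[s][a] == max(q[s])
    # Rewards are measured relative to the average reward, not discounted.
    target = r - rho + max(q[s_next])
    new_rho = rho
    if was_greedy:
        # rho is only adjusted after greedy actions.
        new_rho = rho + alpha * (target - max(q[s]))
    q[s][a] += beta * (target - q[s][a])
    return new_rho


if __name__ == "__main__":
    payoffs = discounted_payoffs(reward=1000.0, chain_length=31, gamma=0.71)
    # Far from the goal, neighbouring steps become almost indistinguishable.
    print("payoff 30 steps out:", payoffs[30])
    print("gap between steps 29 and 30:", payoffs[29] - payoffs[30])

    q = {"s0": [0.0, 0.0], "s1": [0.0, 0.0]}
    rho = r_learning_update(q, 0.0, "s0", 0, 1.0, "s1")
    print("rho:", rho, "q[s0]:", q["s0"])
```

Under discounting, the payoff signal shrinks geometrically with distance from the reward, so classifiers early in a long chain see differences too small to rank actions reliably. The average-reward update has no such factor, which is the property the project relies on to support long action chains.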
Data last updated: 2023-05-31