Generalized Reinforcement Learning: Experience Particles, Action
Operator, Reinforcement Field, Memory Association, and Decision Concepts
- URL: http://arxiv.org/abs/2208.04822v1
- Date: Tue, 9 Aug 2022 15:05:15 GMT
- Title: Generalized Reinforcement Learning: Experience Particles, Action
Operator, Reinforcement Field, Memory Association, and Decision Concepts
- Authors: Po-Hsiang Chiu and Manfred Huber
- Abstract summary: This paper proposes a Bayesian-flavored generalized reinforcement learning framework.
We first establish the notion of a parametric action model to better cope with uncertainty and fluid action behaviors.
We then introduce the notion of reinforcement field as a physics-inspired construct established through "polarized experience particles" maintained in the learning agent's working memory.
- Score: 2.398608007786179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a control policy that involves time-varying and evolving system
dynamics often poses a great challenge to mainstream reinforcement learning
algorithms. In most standard methods, actions are often assumed to be a rigid,
fixed set of choices that are sequentially applied to the state space in a
predefined manner. Consequently, without resorting to substantial re-learning
processes, the learned policy lacks the ability to adapt to variations in
the action set and the action's "behavioral" outcomes. In addition, the
standard action representation and the action-induced state transition
mechanism inherently limit how reinforcement learning can be applied in
complex, real-world applications primarily due to the intractability of the
resulting large state space and the lack of facility to generalize the learned
policy to the unknown part of the state space. This paper proposes a
Bayesian-flavored generalized reinforcement learning framework by first
establishing the notion of a parametric action model to better cope with
uncertainty and fluid action behaviors, followed by introducing the notion of
reinforcement field as a physics-inspired construct established through
"polarized experience particles" maintained in the learning agent's working
memory. These particles effectively encode the dynamic learning experience that
evolves over time in a self-organizing way. On top of the reinforcement field,
we will further generalize the policy learning process to incorporate
high-level decision concepts by treating the past memory as an implicit graph
structure, in which past memory instances (or particles) are interconnected
through a similarity measure defined over decisions; the "associative memory"
principle can thereby be applied to augment the learning agent's world model.
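To make the reinforcement-field idea concrete, the sketch below models "polarized experience particles" as remembered states carrying a signed influence, with the field at a query state computed as a kernel-weighted sum. The class name `ExperienceParticle`, the `polarity` attribute, and the Gaussian kernel are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

class ExperienceParticle:
    """One remembered state, polarized by its signed reward signal.

    Hypothetical sketch: the paper's particles also encode action
    information; here only the state and a scalar polarity are kept.
    """
    def __init__(self, state, polarity):
        self.state = np.asarray(state, dtype=float)
        self.polarity = float(polarity)  # > 0 attracts, < 0 repels

def reinforcement_field(query_state, particles, bandwidth=1.0):
    """Kernel-weighted sum of particle polarities at a query state.

    Assumes a Gaussian kernel; nearby particles dominate, so the field
    generalizes experience smoothly to unvisited states.
    """
    q = np.asarray(query_state, dtype=float)
    value = 0.0
    for p in particles:
        dist2 = float(np.sum((q - p.state) ** 2))
        value += p.polarity * np.exp(-dist2 / (2.0 * bandwidth ** 2))
    return value

# A positive particle near the query dominates a distant negative one.
memory = [ExperienceParticle([0.0, 0.0], +1.0),
          ExperienceParticle([5.0, 5.0], -1.0)]
print(reinforcement_field([0.1, 0.0], memory))
```

Under this reading, the field acts as a self-organizing value surrogate: adding or removing particles reshapes it without any explicit re-learning pass.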
Related papers
- Learning in Markov Decision Processes with Exogenous Dynamics [39.6376520918509]
We study a structured class of MDPs characterized by state components whose transitions are independent of the agent's actions. We show that exploiting this structure yields significantly improved learning guarantees. We empirically validate our approach across classical toy settings and real-world-inspired environments.
arXiv Detail & Related papers (2026-03-03T11:10:45Z)
- InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions [58.329946838699044]
Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills. We introduce InterPrior, a framework that learns a unified generative controller through large-scale imitation pretraining and post-training by reinforcement learning.
arXiv Detail & Related papers (2026-02-05T18:59:27Z)
- Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science [1.4688849984602808]
We present Nemori, a novel self-organizing memory architecture inspired by human cognitive principles. Nemori's core innovation is a principled, top-down method for autonomously organizing the raw conversational stream into semantically coherent episodes. Nemori significantly outperforms prior state-of-the-art systems, with its advantage being particularly pronounced in longer contexts.
arXiv Detail & Related papers (2025-08-05T11:41:13Z)
- Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why [50.191655141020505]
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced.
arXiv Detail & Related papers (2025-07-08T11:45:51Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Towards Task-Prioritized Policy Composition [10.477909792349823]
We propose a novel task-prioritized composition framework for Reinforcement Learning.
Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents.
Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction.
arXiv Detail & Related papers (2022-09-20T08:08:04Z)
- Inference of Affordances and Active Motor Control in Simulated Agents [0.5161531917413706]
We introduce an output-probabilistic, temporally predictive, modular artificial neural network architecture.
We show that our architecture develops latent states that can be interpreted as affordance maps.
In combination with active inference, we show that flexible, goal-directed behavior can be invoked.
arXiv Detail & Related papers (2022-02-23T14:13:04Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning [8.342863878589332]
We propose a new approach for learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning.
We show that our approach significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces.
arXiv Detail & Related papers (2020-10-09T09:09:31Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.