CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous
Control
- URL: http://arxiv.org/abs/2211.15205v2
- Date: Thu, 18 May 2023 06:55:33 GMT
- Title: CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous
Control
- Authors: Xiang Zheng, Xingjun Ma, Cong Wang
- Abstract summary: Intrinsic motivation is a promising technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards.
There exist two technical challenges in implementing intrinsic motivation.
We propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective.
We empirically show that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods.
- Score: 25.786085434943338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, the intrinsic objectives are
all designed in a task-agnostic manner and combined with the extrinsic
objective via simple addition (or used by itself for reward-free pre-training).
In this work, we show that these designs would fail in typical sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performances.
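The Lagrangian balancing described in the abstract can be sketched, under assumptions, as a dual update on a multiplier: the combined objective weights the intrinsic term by a Lagrange multiplier that grows whenever an intrinsic constraint is violated and shrinks otherwise. All names, the threshold, and the learning rate below are illustrative, not the paper's exact formulation.

```python
def lagrangian_reward(r_extrinsic, r_intrinsic, lam, threshold):
    """Combine rewards as extrinsic + lam * (intrinsic - threshold)."""
    return r_extrinsic + lam * (r_intrinsic - threshold)

def update_multiplier(lam, r_intrinsic, threshold, lr=0.01):
    """Dual gradient step: grow lam while the intrinsic reward is below
    the constraint threshold, shrink it once the constraint is satisfied."""
    lam = lam + lr * (threshold - r_intrinsic)
    return max(0.0, lam)  # multiplier must stay non-negative

# Toy rollout: intrinsic reward rises as exploration improves, so the
# multiplier first grows, then decays once the constraint is met.
lam = 1.0
for r_int in [0.2, 0.3, 0.5, 0.8, 1.0]:
    lam = update_multiplier(lam, r_int, threshold=0.5)
```

In a simultaneous-maximization setup, the policy would be updated on `lagrangian_reward` while `update_multiplier` runs in parallel, so the trade-off between the two objectives adapts automatically rather than being fixed by hand.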
Related papers
- Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics [7.115267332079192]
We propose a two-stage reward curriculum where we decouple task-specific objectives from behavioral terms. In our method, we first train the agent on a simplified task-only reward function to ensure effective exploration. We validate our approach on the DeepMind Control Suite, ManiSkill3, and a mobile robot environment, modified to include auxiliary behavioral objectives.
arXiv Detail & Related papers (2026-03-05T12:34:27Z) - Enabling Option Learning in Sparse Rewards with Hindsight Experience Replay [4.687493080285017]
We propose MOC-HER, which integrates the Hindsight Experience Replay mechanism into the Option-Critic framework. By relabeling goals from achieved outcomes, MOC-HER can solve sparse reward environments that are intractable for the original MOC. We show that MOC-2HER achieves success rates of up to 90%, compared to less than 11% for both MOC and MOC-HER.
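The hindsight relabeling mechanism this snippet refers to can be sketched as follows; the transition fields and the exact-match sparse reward are illustrative assumptions, not MOC-HER's actual data structures.

```python
def relabel_with_hindsight(episode):
    """Relabel each transition's goal with the state actually achieved at
    the end of the episode, turning a failed trajectory into useful sparse
    reward signal (the core HER idea)."""
    achieved = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        new_t = dict(t)
        new_t["goal"] = achieved
        # Sparse reward: 1 only if the transition reached the relabeled goal.
        new_t["reward"] = 1.0 if t["achieved_goal"] == achieved else 0.0
        relabeled.append(new_t)
    return relabeled

# A trajectory that failed to reach (5, 5) but ended at (2, 3):
episode = [
    {"achieved_goal": (0, 0), "goal": (5, 5), "reward": 0.0},
    {"achieved_goal": (1, 2), "goal": (5, 5), "reward": 0.0},
    {"achieved_goal": (2, 3), "goal": (5, 5), "reward": 0.0},
]
new_episode = relabel_with_hindsight(episode)
```

After relabeling, the final transition carries a positive reward, which is what makes otherwise-intractable sparse-reward environments learnable.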
arXiv Detail & Related papers (2026-02-14T19:55:11Z) - Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function. By prompting large language models (LLMs) with the prioritization of tasks, our framework generates reward functions that can be dynamically adjusted online.
arXiv Detail & Related papers (2025-07-22T09:26:00Z) - A Simple Approach to Constraint-Aware Imitation Learning with Application to Autonomous Racing [4.755527819500743]
We present a simple approach to incorporating safety into imitation learning (IL).
We empirically validate our approach on an autonomous racing task with both full-state and image feedback.
arXiv Detail & Related papers (2025-03-10T18:00:16Z) - Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to improve the quality of fusion results and enable downstream task adaptability.
Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM.
Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z) - Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions [0.0]
Reinforcement learning has become an essential algorithm for generating complex robotic behaviors.
To learn such behaviors, it is necessary to design a reward function that describes the task.
In this paper, we propose the concept of Constraints as Rewards (CaR).
arXiv Detail & Related papers (2025-01-08T01:59:47Z) - Constrained Intrinsic Motivation for Reinforcement Learning [28.6289921495116]
Intrinsic Motivation (IM) is used for reinforcement learning in Reward-Free Pre-Training tasks and Exploration with Intrinsic Motivation (EIM) tasks.
Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks.
We propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks.
arXiv Detail & Related papers (2024-07-12T13:20:52Z) - Enhancing Robotic Navigation: An Evaluation of Single and
Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
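The vector-reward idea above can be sketched as a weighted-sum scalarization: each objective contributes its own reward component, and a weight vector collapses them into a single scalar for the policy update. The objective names and weights below are hypothetical examples, not the study's actual configuration.

```python
def scalarize(reward_vec, weights):
    """Weighted-sum scalarization of a multi-objective reward vector."""
    return sum(w * r for w, r in zip(weights, reward_vec))

# Hypothetical navigation objectives: goal progress, collision penalty,
# energy cost, weighted by their relative priority.
reward_vec = [1.0, -0.2, -0.05]
weights = [0.7, 0.2, 0.1]
r = scalarize(reward_vec, weights)
```

Weighted sums are the simplest scalarization; changing the weight vector traces out different trade-offs between the objectives without retraining the reward components themselves.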
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally improve the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Regularized Soft Actor-Critic for Behavior Transfer Learning [10.519534498340482]
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior.
We propose a method called Regularized Soft Actor-Critic which jointly formulates the main task and the imitation task.
We evaluate our method on continuous control tasks relevant to video games applications.
arXiv Detail & Related papers (2022-09-27T07:52:04Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.