CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous
Control
- URL: http://arxiv.org/abs/2211.15205v2
- Date: Thu, 18 May 2023 06:55:33 GMT
- Title: CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous
Control
- Authors: Xiang Zheng, Xingjun Ma, Cong Wang
- Abstract summary: Intrinsic motivation is a promising technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards.
There exist two technical challenges in implementing intrinsic motivation.
We propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective.
We empirically show that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods.
- Score: 25.786085434943338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, the intrinsic objectives are
all designed in a task-agnostic manner and combined with the extrinsic
objective via simple addition (or used by itself for reward-free pre-training).
In this work, we show that these designs would fail in typical sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performances.
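The Lagrangian balancing described in the abstract can be sketched, under assumptions, as a dual update on a multiplier: the combined objective weights the intrinsic term by a Lagrange multiplier that grows whenever an intrinsic constraint is violated and shrinks otherwise. All names, the threshold, and the learning rate below are illustrative, not the paper's exact formulation.

```python
def lagrangian_reward(r_extrinsic, r_intrinsic, lam, threshold):
    """Combine rewards as extrinsic + lam * (intrinsic - threshold)."""
    return r_extrinsic + lam * (r_intrinsic - threshold)

def update_multiplier(lam, r_intrinsic, threshold, lr=0.01):
    """Dual gradient step: grow lam while the intrinsic reward is below
    the constraint threshold, shrink it once the constraint is satisfied."""
    lam = lam + lr * (threshold - r_intrinsic)
    return max(0.0, lam)  # multiplier must stay non-negative

# Toy rollout: intrinsic reward rises as exploration improves, so the
# multiplier first grows, then decays once the constraint is met.
lam = 1.0
for r_int in [0.2, 0.3, 0.5, 0.8, 1.0]:
    lam = update_multiplier(lam, r_int, threshold=0.5)
```

In a simultaneous-maximization setup, the policy would be updated on `lagrangian_reward` while `update_multiplier` runs in parallel, so the trade-off between the two objectives adapts automatically rather than being fixed by hand.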
Related papers
- Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics [7.115267332079192]
We propose a two-stage reward curriculum where we decouple task-specific objectives from behavioral terms. In our method, we first train the agent on a simplified task-only reward function to ensure effective exploration. We validate our approach on the DeepMind Control Suite, ManiSkill3, and a mobile robot environment, modified to include auxiliary behavioral objectives.
arXiv Detail & Related papers (2026-03-05T12:34:27Z) - Enabling Option Learning in Sparse Rewards with Hindsight Experience Replay [4.687493080285017]
We propose MOC-HER, which integrates the Hindsight Experience Replay mechanism into the Option-Critic framework. By relabeling goals from achieved outcomes, MOC-HER can solve sparse reward environments that are intractable for the original MOC. We show that MOC-2HER achieves success rates of up to 90%, compared to less than 11% for both MOC and MOC-HER.
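The hindsight relabeling mechanism this snippet refers to can be sketched as follows; the transition fields and the exact-match sparse reward are illustrative assumptions, not MOC-HER's actual data structures.

```python
def relabel_with_hindsight(episode):
    """Relabel each transition's goal with the state actually achieved at
    the end of the episode, turning a failed trajectory into useful sparse
    reward signal (the core HER idea)."""
    achieved = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        new_t = dict(t)
        new_t["goal"] = achieved
        # Sparse reward: 1 only if the transition reached the relabeled goal.
        new_t["reward"] = 1.0 if t["achieved_goal"] == achieved else 0.0
        relabeled.append(new_t)
    return relabeled

# A trajectory that failed to reach (5, 5) but ended at (2, 3):
episode = [
    {"achieved_goal": (0, 0), "goal": (5, 5), "reward": 0.0},
    {"achieved_goal": (1, 2), "goal": (5, 5), "reward": 0.0},
    {"achieved_goal": (2, 3), "goal": (5, 5), "reward": 0.0},
]
new_episode = relabel_with_hindsight(episode)
```

After relabeling, the final transition carries a positive reward, which is what makes otherwise-intractable sparse-reward environments learnable.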
arXiv Detail & Related papers (2026-02-14T19:55:11Z) - Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function. By prompting large language models (LLMs) with the prioritization of tasks, our framework generates reward functions that can be dynamically adjusted online.
arXiv Detail & Related papers (2025-07-22T09:26:00Z) - A Simple Approach to Constraint-Aware Imitation Learning with Application to Autonomous Racing [4.755527819500743]
We present a simple approach to incorporating safety into imitation learning (IL).
We empirically validate our approach on an autonomous racing task with both full-state and image feedback.
arXiv Detail & Related papers (2025-03-10T18:00:16Z) - Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond [52.486290612938895]
We propose a novel method that leverages the semantic knowledge from the Segment Anything Model (SAM) to improve the quality of fusion results and enable downstream task adaptability.
Specifically, we design a Semantic Persistent Attention (SPA) Module that efficiently maintains source information via the persistent repository while extracting high-level semantic priors from SAM.
Our method achieves a balance between high-quality visual results and downstream task adaptability while maintaining practical deployment efficiency.
arXiv Detail & Related papers (2025-03-03T06:16:31Z) - Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions [0.0]
Reinforcement learning has become an essential algorithm for generating complex robotic behaviors.
To learn such behaviors, it is necessary to design a reward function that describes the task.
In this paper, we propose the concept of Constraints as Rewards (CaR).
arXiv Detail & Related papers (2025-01-08T01:59:47Z) - Constrained Intrinsic Motivation for Reinforcement Learning [28.6289921495116]
Intrinsic Motivation (IM) is used for reinforcement learning in Reward-Free Pre-Training tasks and Exploration with Intrinsic Motivation (EIM) tasks.
Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks.
We propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks.
arXiv Detail & Related papers (2024-07-12T13:20:52Z) - Enhancing Robotic Navigation: An Evaluation of Single and
Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
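The vector-reward idea above can be sketched as a weighted-sum scalarization: each objective contributes its own reward component, and a weight vector collapses them into a single scalar for the policy update. The objective names and weights below are hypothetical examples, not the study's actual configuration.

```python
def scalarize(reward_vec, weights):
    """Weighted-sum scalarization of a multi-objective reward vector."""
    return sum(w * r for w, r in zip(weights, reward_vec))

# Hypothetical navigation objectives: goal progress, collision penalty,
# energy cost, weighted by their relative priority.
reward_vec = [1.0, -0.2, -0.05]
weights = [0.7, 0.2, 0.1]
r = scalarize(reward_vec, weights)
```

Weighted sums are the simplest scalarization; changing the weight vector traces out different trade-offs between the objectives without retraining the reward components themselves.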
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally improve the expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Regularized Soft Actor-Critic for Behavior Transfer Learning [10.519534498340482]
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior.
We propose a method called Regularized Soft Actor-Critic which jointly formulates the main task and the imitation task.
We evaluate our method on continuous control tasks relevant to video games applications.
arXiv Detail & Related papers (2022-09-27T07:52:04Z) - Online reinforcement learning with sparse rewards through an active
inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with a very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Learning with AMIGo: Adversarially Motivated Intrinsic Goals [63.680207855344875]
AMIGo is a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals.
We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks.
arXiv Detail & Related papers (2020-06-22T10:22:08Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.