Constrained Intrinsic Motivation for Reinforcement Learning
- URL: http://arxiv.org/abs/2407.09247v1
- Date: Fri, 12 Jul 2024 13:20:52 GMT
- Title: Constrained Intrinsic Motivation for Reinforcement Learning
- Authors: Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang
- Abstract summary: Intrinsic Motivation (IM) is used for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks.
Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks.
We propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks, respectively.
- Score: 28.6289921495116
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks. To tackle these problems, we propose Constrained Intrinsic Motivation (CIM) for RFPT and EIM tasks, respectively: 1) CIM for RFPT maximizes the lower bound of the conditional state entropy subject to an alignment constraint on the state encoder network for efficient dynamic and diverse skill discovery and state coverage maximization; 2) CIM for EIM leverages constrained policy optimization to adaptively adjust the coefficient of the intrinsic objective to mitigate the distraction from the intrinsic objective. In various MuJoCo robotics environments, we empirically show that CIM for RFPT greatly surpasses fifteen IM methods for unsupervised skill discovery in terms of skill diversity, state coverage, and fine-tuning performance. Additionally, we showcase the effectiveness of CIM for EIM in redeeming intrinsic rewards when task rewards are exposed from the beginning. Our code is available at https://github.com/x-zheng16/CIM.
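The second mechanism lends itself to a compact illustration. Below is a minimal sketch of the adaptive-coefficient idea in CIM for EIM, treating the intrinsic coefficient as a Lagrange multiplier updated by dual ascent; the class name, thresholds, and hyperparameters are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

class AdaptiveIntrinsicCoefficient:
    """Hypothetical sketch of the constrained-policy-optimization idea in
    CIM for EIM: adapt the intrinsic-reward coefficient so exploration
    stops distracting the agent once the task constraint is met."""

    def __init__(self, init_coef=1.0, lr=0.01, task_return_target=0.0):
        self.coef = init_coef              # multiplier on the intrinsic reward
        self.lr = lr                       # dual-ascent step size
        self.target = task_return_target   # constraint level on task return

    def combined_reward(self, task_reward, intrinsic_reward):
        # Reward actually optimized by the policy.
        return task_reward + self.coef * intrinsic_reward

    def update(self, mean_task_return):
        # Dual update: shrink the coefficient when the task constraint is
        # satisfied, grow it while exploration is still needed.
        violation = self.target - mean_task_return
        self.coef = float(np.clip(self.coef + self.lr * violation, 0.0, 10.0))
```

The paper casts this as a formal constrained policy optimization problem; the dual-ascent step above only illustrates the general mechanism.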
Related papers
- Decoupling Task and Behavior: A Two-Stage Reward Curriculum in Reinforcement Learning for Robotics [7.115267332079192]
We propose a two-stage reward curriculum where we decouple task-specific objectives from behavioral terms.
In our method, we first train the agent on a simplified task-only reward function to ensure effective exploration.
We validate our approach on the DeepMind Control Suite, ManiSkill3, and a mobile robot environment, modified to include auxiliary behavioral objectives.
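As a rough illustration of the two-stage switch (the step threshold and names are assumptions for illustration, not the paper's exact recipe):

```python
def curriculum_reward(task_reward: float, behavior_penalty: float,
                      step: int, switch_step: int = 100_000) -> float:
    # Stage 1: optimize the simplified task-only reward so the agent
    # explores effectively. Stage 2: reintroduce behavioral terms once
    # the task is being solved. A fixed step threshold is assumed here;
    # a real curriculum could switch on a success-rate criterion instead.
    if step < switch_step:
        return task_reward
    return task_reward - behavior_penalty
```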
arXiv Detail & Related papers (2026-03-05T12:34:27Z)
- Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions [74.35421055079655]
Large language models (LLMs) have enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities.
Mobile Edge General Intelligence (MEGI) brings real-time, privacy-preserving reasoning to the network edge.
We propose a joint optimization framework for efficient LLM reasoning deployment in MEGI.
arXiv Detail & Related papers (2025-09-27T10:53:48Z)
- Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents.
In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function.
By providing large language models (LLMs) with the prioritization of tasks, our framework generates reward functions that can be dynamically adjusted online.
arXiv Detail & Related papers (2025-07-22T09:26:00Z)
- Token-Level Uncertainty-Aware Objective for Language Model Post-Training [2.5671111123644894]
We connect token-level uncertainty in causal language modeling to two types of training objectives: 1) masked maximum likelihood (MLE) and 2) self-distillation.
We show that masked MLE is effective in reducing epistemic uncertainty and serves as an effective token-level automatic curriculum learning technique.
However, masked MLE is prone to overfitting and requires self-distillation regularization to improve or maintain performance on out-of-distribution tasks.
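A minimal PyTorch-style sketch of a token-level masked MLE loss, assuming the mask comes from some uncertainty-based curriculum; the masking criterion and all names are illustrative, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def masked_mle_loss(logits: torch.Tensor, targets: torch.Tensor,
                    keep_mask: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); targets and keep_mask: (batch, seq).
    # keep_mask selects which tokens contribute to the loss; how it is
    # computed (e.g., from token-level uncertainty) is the curriculum
    # part and is assumed to be given here.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    mask = keep_mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```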
arXiv Detail & Related papers (2025-03-15T00:32:14Z)
- A Simple Approach to Constraint-Aware Imitation Learning with Application to Autonomous Racing [3.324196481791132]
We present a simple approach to incorporating safety into imitation learning (IL).
We empirically validate our approach on an autonomous racing task with both full-state and image feedback.
arXiv Detail & Related papers (2025-03-10T18:00:16Z)
- Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning [13.545981051703682]
Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation.
We identify several lines of work in different communities in AI, including LLM benchmarking, ToM add-ons, ToM probing, and formal models for ToM.
We conclude with suggestions for improved evaluation of ToM capabilities inspired by dynamic environments used in cognitive tasks.
arXiv Detail & Related papers (2024-12-18T09:06:48Z)
- MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More [71.0473038084673]
We propose MC-MoE, a training-free Mixture-Compressor for Mixture-of-Experts large language models (MoE-LLMs).
MC-MoE leverages the significance of both experts and tokens to achieve extreme compression.
For instance, at 2.54 bits, MC-MoE compresses 76.6% of the model, with only a 3.8% average accuracy loss.
arXiv Detail & Related papers (2024-10-08T18:09:38Z)
- Neural Machine Unranking [3.2340528215722553]
We introduce a novel task termed Neural Machine UnRanking (NuMuR).
Existing task- or model-agnostic unlearning approaches are suboptimal for NuMuR due to two core challenges.
CoCoL comprises (1) a contrastive loss that reduces relevance scores on forget sets while maintaining performance on entangled samples, and (2) a consistent loss that preserves accuracy on the retain set.
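Read literally, the two terms could be sketched as follows; this is a loose interpretation with assumed names, margin, and weighting, not the authors' code:

```python
import torch.nn.functional as F

def cocol_style_loss(forget_scores, entangled_scores, entangled_ref,
                     retain_scores, retain_ref,
                     margin: float = 1.0, alpha: float = 1.0):
    # (1) Contrastive term: push relevance scores on forget samples below
    # those of entangled samples by a margin, while anchoring entangled
    # samples to their pre-unlearning reference scores.
    contrastive = (F.relu(margin + forget_scores - entangled_scores).mean()
                   + F.mse_loss(entangled_scores, entangled_ref))
    # (2) Consistency term: keep retain-set scores close to the original
    # model's so ranking accuracy is preserved.
    consistent = F.mse_loss(retain_scores, retain_ref)
    return contrastive + alpha * consistent
```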
arXiv Detail & Related papers (2024-08-09T20:36:40Z)
- Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective [125.00228936051657]
We introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features.
By fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
arXiv Detail & Related papers (2024-07-24T09:30:04Z)
- SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts [49.01990048827639]
We introduce SEER-MoE, a framework for reducing both the memory footprint and compute requirements of pre-trained MoE models.
The first stage involves pruning the total number of experts using heavy-hitters counting guidance, while the second stage employs a regularization-based fine-tuning strategy to recover accuracy loss.
Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoE model optimized for inference efficiency with minimal accuracy trade-offs.
arXiv Detail & Related papers (2024-04-07T22:13:43Z)
- Many-Objective Evolutionary Influence Maximization: Balancing Spread, Budget, Fairness, and Time [3.195234044113248]
The Influence Maximization (IM) problem seeks to discover the set of nodes in a graph that can maximize the spread of information.
This problem is known to be NP-hard, and it is usually studied by maximizing the influence (spread) and, alternatively, optimizing a second objective.
In this work, we propose a first case study where several IM-specific objective functions, namely budget, fairness, communities, and time, are optimized on top of the maximization of influence and the minimization of the seed set size.
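For context on the single-objective baseline such works extend, here is the textbook greedy algorithm for IM under the independent cascade model (a standard sketch, not taken from the paper; the graph representation and parameters are assumptions):

```python
import random

def estimate_spread(graph, seeds, trials=100, p=0.1):
    # Monte Carlo estimate of expected spread under independent cascade:
    # each newly activated node infects each neighbour with probability p.
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nb in graph.get(node, []):
                if nb not in active and random.random() < p:
                    active.add(nb)
                    frontier.append(nb)
        total += len(active)
    return total / trials

def greedy_im(graph, k):
    # Classic greedy baseline: repeatedly add the node with the largest
    # marginal gain in estimated spread until k seeds are chosen.
    seeds = set()
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: estimate_spread(graph, seeds | {n}))
        seeds.add(best)
    return seeds
```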
arXiv Detail & Related papers (2024-03-27T16:54:45Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Learning Reward for Physical Skills using Large Language Model [5.795405764196473]
Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
arXiv Detail & Related papers (2023-10-21T19:10:06Z)
- CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control [25.786085434943338]
Intrinsic motivation is a promising technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards.
There exist two technical challenges in implementing intrinsic motivation.
We propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective.
We empirically show that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2022-11-28T10:23:56Z)
- Skill-Based Reinforcement Learning with Intrinsic Reward Matching [77.34726150561087]
We present Intrinsic Reward Matching (IRM), which unifies task-agnostic skill pretraining and task-aware finetuning.
IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods.
arXiv Detail & Related papers (2022-10-14T00:04:49Z)
- Meta-learning with an Adaptive Task Scheduler [93.63502984214918]
Existing meta-learning algorithms randomly sample meta-training tasks with a uniform probability.
Given a limited number of meta-training tasks, some sampled tasks are likely to be noisy or imbalanced and thus detrimental to training.
We propose an adaptive task scheduler (ATS) for the meta-training process.
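The departure from uniform sampling can be sketched as score-weighted task sampling; the per-task scores are what ATS learns and are assumed as given here (names and the softmax choice are illustrative):

```python
import numpy as np

def sample_tasks(task_scores, batch_size, rng=None):
    # Draw meta-training tasks in proportion to scheduler scores rather
    # than uniformly, so noisy or unhelpful tasks are sampled less often.
    rng = rng or np.random.default_rng()
    scores = np.asarray(task_scores, dtype=float)
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(scores), size=batch_size, p=probs)
```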
arXiv Detail & Related papers (2021-10-26T22:16:35Z)
- Contingency-Aware Influence Maximization: A Reinforcement Learning Approach [52.109536198330126]
The Influence Maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximizes the spread of influence.
In this study, we focus on a sub-class of IM problems, called contingency-aware IM, in which it is uncertain whether an invited node is willing to act as a seed.
Despite the initial success, a major practical obstacle in promoting the solutions to more communities is the tremendous runtime of the greedy algorithms.
arXiv Detail & Related papers (2021-06-13T16:42:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.