Diverse Priors for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2310.14864v1
- Date: Mon, 23 Oct 2023 12:33:59 GMT
- Title: Diverse Priors for Deep Reinforcement Learning
- Authors: Chenfan Weng, Zhongguo Li
- Abstract summary: In Reinforcement Learning (RL), agents aim at maximizing cumulative rewards in a given environment.
We introduce an innovative approach with delicately designed prior NNs, which can incorporate maximal diversity in the initial value functions of RL.
Our method has demonstrated superior performance compared with the random prior approaches in solving classic control problems and general exploration tasks.
- Score: 2.8554857235549753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Reinforcement Learning (RL), agents aim at maximizing cumulative rewards
in a given environment. During the learning process, RL agents face the dilemma
of exploitation and exploration: leveraging existing knowledge to acquire
rewards or seeking potentially higher ones. Using uncertainty as a guiding
principle provides an active and effective approach to solving this dilemma and
ensemble-based methods are one of the prominent avenues for quantifying
uncertainty. Nevertheless, conventional ensemble-based uncertainty estimation
lacks an explicit prior, deviating from Bayesian principles. Besides, this
method requires diversity among members to generate less biased uncertainty
estimation results. To address the above problems, previous research has
incorporated random functions as priors. Building upon these foundational
efforts, our work introduces an innovative approach with delicately designed
prior NNs, which can incorporate maximal diversity in the initial value
functions of RL. Our method has demonstrated superior performance compared with
the random prior approaches in solving classic control problems and general
exploration tasks, significantly improving sample efficiency.
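The random-prior ensemble baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the common additive form in which each ensemble member outputs a trainable network plus a fixed, randomly initialized prior network, and it takes the standard deviation across members as the uncertainty estimate. The helper names (`init_mlp`, `mlp`, `ensemble_q_values`, `epistemic_uncertainty`), the `prior_scale` parameter, and the network sizes are all hypothetical. The paper's diversity-maximizing prior design is not reproduced here.

```python
import numpy as np


def init_mlp(rng, in_dim, hidden, out_dim):
    """Randomly initialize a one-hidden-layer MLP (hypothetical helper)."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def ensemble_q_values(trainables, priors, states, prior_scale=1.0):
    """Q-values of each member: trainable net plus a scaled, fixed prior net.

    Returns an array of shape (n_members, n_states, n_actions).
    """
    return np.stack(
        [mlp(f, states) + prior_scale * mlp(p, states)
         for f, p in zip(trainables, priors)]
    )


def epistemic_uncertainty(q_values):
    """Disagreement across ensemble members, per state and action."""
    return q_values.std(axis=0)


# Assumed toy sizes: 5 members, 4-dimensional states, 2 discrete actions.
rng = np.random.default_rng(0)
n_members, state_dim, n_actions = 5, 4, 2
trainables = [init_mlp(rng, state_dim, 16, n_actions) for _ in range(n_members)]
priors = [init_mlp(rng, state_dim, 16, n_actions) for _ in range(n_members)]

states = rng.normal(size=(3, state_dim))
q = ensemble_q_values(trainables, priors, states)
sigma = epistemic_uncertainty(q)
```

In this sketch only the `trainables` would be updated during learning, while the `priors` stay fixed, so member disagreement can persist in regions with little data. How the priors are chosen is exactly what the paper varies, and the random initialization above stands in for the baseline only.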
Related papers
- Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs).
We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference.
We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z) - Value Function Initialization for Knowledge Transfer and Jump-start in Deep Reinforcement Learning [0.0]
We introduce DQInit, a method that adapts value function initialization to deep reinforcement learning.
DQInit reuses compact Q-values extracted from previously solved tasks as a transferable knowledge base.
It employs a knownness-based mechanism to softly integrate these transferred values into underexplored regions and gradually shift toward the agent's learned estimates.
arXiv Detail & Related papers (2025-08-12T18:32:08Z) - Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration [44.601019677298005]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.
We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performances.
arXiv Detail & Related papers (2025-02-12T12:51:36Z) - Value Function Decomposition in Markov Recommendation Process [19.082512423102855]
We propose an online reinforcement learning framework to improve recommender performance.
We show that these two factors can be separately approximated by decomposing the original temporal difference loss.
The disentangled learning framework can achieve a more accurate estimation with faster learning and improved robustness against action exploration.
arXiv Detail & Related papers (2025-01-29T04:22:29Z) - Efficient Reinforcement Learning with Large Language Model Priors [18.72288751305885]
Large language models (LLMs) have recently emerged as powerful general-purpose tools.
We propose treating LLMs as prior action distributions and integrating them into RL frameworks.
We show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity.
arXiv Detail & Related papers (2024-10-10T13:54:11Z) - A Comprehensive Survey on Evidential Deep Learning and Its Applications [64.83473301188138]
Evidential Deep Learning (EDL) provides reliable uncertainty estimation with minimal additional computation in a single forward pass.
We first delve into the theoretical foundation of EDL, the subjective logic theory, and discuss its distinctions from other uncertainty estimation frameworks.
We elaborate on its extensive applications across various machine learning paradigms and downstream tasks.
arXiv Detail & Related papers (2024-09-07T05:55:06Z) - Improving Forward Compatibility in Class Incremental Learning by Increasing Representation Rank and Feature Richness [3.0620294646308754]
We introduce an effective-Rank based Feature Richness enhancement (RFR) method, designed for improving forward compatibility.
Our results demonstrate the effectiveness of our approach in enhancing novel-task performance while mitigating catastrophic forgetting.
arXiv Detail & Related papers (2024-03-22T11:14:30Z) - ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents [17.96977778655143]
The reset method performs periodic resets of a portion or the entirety of a deep RL agent while preserving the replay buffer.
We propose a new reset-based method that leverages deep ensemble learning to address the limitations of the vanilla reset method.
arXiv Detail & Related papers (2023-10-31T08:59:39Z) - Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z) - Soft Action Priors: Towards Robust Policy Transfer [9.860944032009847]
We use the action prior from the Reinforcement Learning as Inference framework to recover state-of-the-art policy distillation techniques.
Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses.
We show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors.
arXiv Detail & Related papers (2022-09-20T17:36:28Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)