To the Max: Reinventing Reward in Reinforcement Learning
- URL: http://arxiv.org/abs/2402.01361v2
- Date: Mon, 29 Jul 2024 18:07:08 GMT
- Title: To the Max: Reinventing Reward in Reinforcement Learning
- Authors: Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt
- Abstract summary: In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance.
We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward.
In experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics.
- Score: 1.5498250598583487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach for using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate its benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
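For intuition, the following is a minimal sketch (not taken from the paper) contrasting the standard discounted return with a max-reward objective on a single trajectory of rewards; the function names, discount factor, and example rewards are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): comparing the standard
# discounted return with a max-reward objective on one trajectory of rewards.

def discounted_return(rewards, gamma=0.99):
    """Standard RL objective: sum of discounted rewards along the trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def max_reward_return(rewards):
    """Max-reward objective: the single best reward seen along the trajectory."""
    return max(rewards)

# A sparse goal-reaching trajectory: zero reward until the goal is reached once.
trajectory = [0.0, 0.0, 0.0, 1.0, 0.0]

print(discounted_return(trajectory))  # ~0.970, depends on when the goal is hit
print(max_reward_return(trajectory))  # 1.0, independent of when the goal is hit
```

Under such sparse goal-reaching rewards, the max-reward return depends only on whether the goal is ever reached rather than on exactly when, which is the kind of goal-reaching setting studied in the paper's experiments.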
Related papers
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z) - $f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences [44.91973620442546]
This paper introduces a novel way to encourage exploration called $f$-Policy Gradients.
We show that $f$-PG has better performance compared to standard policy methods on a challenging gridworld.
arXiv Detail & Related papers (2023-10-10T17:07:05Z) - Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks on both real robots and in simulation.
arXiv Detail & Related papers (2023-10-04T07:56:42Z) - Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning [26.067411894141863]
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL).
Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback.
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function.
arXiv Detail & Related papers (2023-04-18T12:36:09Z) - Extreme Q-Learning: MaxEnt RL without Entropy [88.97516083146371]
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains.
We introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT).
Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms.
arXiv Detail & Related papers (2023-01-05T23:14:38Z) - Designing Rewards for Fast Learning [18.032654606016447]
We look at how reward-design choices impact learning speed and seek to identify principles of good reward design that quickly induce target behavior.
We propose a linear-programming based algorithm that efficiently finds a reward function that maximizes action gap and minimizes subjective discount.
arXiv Detail & Related papers (2022-05-30T19:48:52Z) - Maximum Entropy RL (Provably) Solves Some Robust RL Problems [94.80212602202518]
We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
arXiv Detail & Related papers (2021-03-10T18:45:48Z) - Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z) - Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
arXiv Detail & Related papers (2020-06-28T21:13:08Z) - Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings (a generic relabeling sketch follows this list).
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
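As a rough illustration of the hindsight relabeling idea mentioned in the last entry above, here is a generic goal-relabeling sketch (it shows the general technique, not that paper's inverse-RL formulation); the Transition fields and the sparse-reward rule are illustrative assumptions.

```python
# Rough sketch of hindsight goal relabeling (the general technique only; not
# the inverse-RL formulation of the paper above). Fields are illustrative.

from dataclasses import dataclass, replace
from typing import List


@dataclass
class Transition:
    state: tuple
    action: int
    goal: tuple      # goal the agent was originally conditioned on
    achieved: tuple  # goal state actually reached after this step
    reward: float


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Relabel every transition as if the episode's final achieved state had
    been the intended goal, recomputing the sparse reward accordingly."""
    final_goal = episode[-1].achieved
    return [
        replace(t, goal=final_goal,
                reward=1.0 if t.achieved == final_goal else 0.0)
        for t in episode
    ]
```

The relabeled transitions can be stored alongside the originals, so that even unsuccessful episodes provide successful examples for some goal.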
This list is automatically generated from the titles and abstracts of the papers in this site.