Delayed Rewards Calibration via Reward Empirical Sufficiency
- URL: http://arxiv.org/abs/2102.10527v2
- Date: Tue, 23 Feb 2021 03:19:53 GMT
- Title: Delayed Rewards Calibration via Reward Empirical Sufficiency
- Authors: Yixuan Liu, Hu Wang, Xiaowei Wang, Xiaoyue Sun, Liuyue Jiang and Minhui Xue
- Abstract summary: We introduce a delayed-reward calibration paradigm inspired by a classification perspective.
We define an empirical sufficient distribution, in which state vectors lead agents to reward signals in subsequent steps.
A purify-trained classifier is designed to obtain the distribution and generate the calibrated rewards.
- Score: 11.089718301262433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Appropriate credit assignment for delayed rewards is a fundamental challenge
for reinforcement learning. To tackle this problem, we introduce a delayed-reward
calibration paradigm inspired by a classification perspective. We hypothesize
that well-represented state vectors share similarities with each other, since
they contain the same or equivalent essential information. To this end, we
define an empirical sufficient distribution, where the state vectors within the
distribution lead agents to environmental reward signals in subsequent
steps. A purify-trained classifier is therefore designed to obtain the
distribution and generate the calibrated rewards. We examine the correctness of
sufficient state extraction by tracking the extraction in real time and by building
different reward functions in the environments. The results demonstrate that the
classifier generates timely and accurate calibrated rewards, and that these
rewards make model training more efficient. Finally, we
observe and discuss that the sufficient states extracted by our model resonate
with human observations.
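A minimal sketch of this calibration idea, assuming a simple binary classifier over state vectors (PyTorch; not the authors' implementation): states observed within a short horizon before an environmental reward are labeled as members of the empirical sufficient distribution, and the classifier's confidence is then used as a calibrated, immediate reward. The horizon, network sizes, and labeling rule below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SufficiencyClassifier(nn.Module):
    """Estimates the probability that a state lies in the empirical sufficient distribution."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(states)).squeeze(-1)

def label_states(rewards, horizon: int = 5):
    # A state is labeled positive if an environmental reward arrives within
    # `horizon` subsequent steps (an assumed proxy for "leading to reward").
    return [1.0 if any(r > 0 for r in rewards[t:t + horizon]) else 0.0
            for t in range(len(rewards))]

def calibrated_rewards(clf: SufficiencyClassifier, states: torch.Tensor) -> torch.Tensor:
    # Replace the sparse delayed signal with the classifier's confidence.
    with torch.no_grad():
        return clf(states)

# Usage: a short trajectory of 4-dimensional states with one delayed reward at the end.
states = torch.randn(10, 4)
labels = torch.tensor(label_states([0.0] * 9 + [1.0]))
clf = SufficiencyClassifier(state_dim=4)
loss = nn.BCELoss()(clf(states), labels)   # one illustrative training-step loss
print(calibrated_rewards(clf, states))     # dense per-step calibrated rewards
```

Using the classifier's probability as a dense substitute for the sparse delayed signal is one plausible reading of "generate the calibrated rewards"; the paper's purify-training procedure for obtaining the distribution is not reproduced here.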
Related papers
- Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning [44.770495418026734]
Reinforcement Learning (RL) empowers agents to acquire various skills by learning from reward signals.
Traditional methods assume the existence of underlying Markovian rewards and that the observed delayed reward is simply the sum of instance-level rewards.
We propose Composite Delayed Reward Transformer (CoDeTr), which incorporates a specialized in-sequence attention mechanism.
arXiv Detail & Related papers (2024-10-26T13:12:27Z)
- Reinforcement Learning from Bagged Reward [46.16904382582698]
In Reinforcement Learning (RL), it is commonly assumed that an immediate reward signal is generated for each action taken by the agent.
In many real-world scenarios, designing immediate reward signals is difficult.
We propose a novel reward redistribution method equipped with a bidirectional attention mechanism.
arXiv Detail & Related papers (2024-02-06T07:26:44Z)
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use these attention weights to redistribute the reward along the whole completion (a minimal sketch follows this entry).
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
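A minimal sketch of this redistribution idea, under the assumption that per-token attention weights have already been extracted from the reward model (not the paper's implementation): the sequence-level scalar reward is spread over the tokens of the completion in proportion to those weights.

```python
import torch

def redistribute_reward(sequence_reward: float,
                        attention_weights: torch.Tensor) -> torch.Tensor:
    # attention_weights: shape (num_tokens,); assumed to be the attention mass the
    # reward model's scoring position places on each token of the completion.
    weights = attention_weights / attention_weights.sum()  # normalize to a distribution
    return sequence_reward * weights                        # dense per-token rewards

# Usage: three tokens, with most of the attention on the last one.
attn = torch.tensor([0.1, 0.2, 0.7])
print(redistribute_reward(2.0, attn))  # tensor([0.2000, 0.4000, 1.4000])
```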
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement [42.45888600367566]
Directed generation aims to generate samples with desired properties as measured by a reward function.
We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels.
arXiv Detail & Related papers (2023-07-13T20:20:40Z)
- Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach [45.83200636718999]
A major challenge in reinforcement learning is to determine which state-action pairs are responsible for future rewards that are delayed.
We propose to explicitly model the contributions of state and action from a causal perspective, resulting in an interpretable reward redistribution.
Experimental results show that our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-28T21:51:38Z)
- Reward Collapse in Aligning Large Language Models [64.98482888193267]
We study the phenomenon of 'reward collapse', an empirical observation where the prevailing ranking-based approach results in an identical reward distribution.
Our experimental results suggest that our proposed prompt-aware utility functions significantly alleviate reward collapse during the training of reward models.
arXiv Detail & Related papers (2023-05-28T02:12:00Z)
- Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity [114.88145406445483]
Reinforcement learning provides an automated framework for learning behaviors from high-level reward specifications.
In practice the choice of reward function can be crucial for good results.
arXiv Detail & Related papers (2022-10-18T04:21:25Z)
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios, where it outperforms the SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.