Reward-Aware Proto-Representations in Reinforcement Learning
- URL: http://arxiv.org/abs/2505.16217v1
- Date: Thu, 22 May 2025 04:33:00 GMT
- Title: Reward-Aware Proto-Representations in Reinforcement Learning
- Authors: Hon Tik Tse, Siddarth Chandrasekar, Marlos C. Machado
- Abstract summary: In recent years, the successor representation (SR) has attracted increasing attention in reinforcement learning (RL). In this paper, we discuss a similar representation that also takes into account the reward dynamics of the problem. Our results show that, compared to the SR, the DR gives rise to qualitatively different, reward-aware behaviour and quantitatively better performance in several settings.
- Score: 6.855996110012974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, the successor representation (SR) has attracted increasing attention in reinforcement learning (RL), and it has been used to address some of its key challenges, such as exploration, credit assignment, and generalization. The SR can be seen as representing the underlying credit assignment structure of the environment by implicitly encoding its induced transition dynamics. However, the SR is reward-agnostic. In this paper, we discuss a similar representation that also takes into account the reward dynamics of the problem. We study the default representation (DR), a recently proposed representation with limited theoretical (and empirical) analysis. Here, we lay some of the theoretical foundation underlying the DR in the tabular case by (1) deriving dynamic programming and (2) temporal-difference methods to learn the DR, (3) characterizing the basis for the vector space of the DR, and (4) formally extending the DR to the function approximation case through default features. Empirically, we analyze the benefits of the DR in many of the settings in which the SR has been applied, including (1) reward shaping, (2) option discovery, (3) exploration, and (4) transfer learning. Our results show that, compared to the SR, the DR gives rise to qualitatively different, reward-aware behaviour and quantitatively better performance in several settings.
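Since the abstract contrasts the reward-agnostic SR with the reward-aware DR in the tabular case, a small sketch can make the distinction concrete. In the NumPy sketch below, the SR closed form M = (I - gamma P)^{-1} and its tabular TD update are standard; the DR closed form D = (diag(exp(-r/lambda)) - P)^{-1} is an assumption based on the linearly solvable MDP formulation from which the default representation originates (Piray & Daw, 2021), and the DR update shown is merely a TD-style rule consistent with that fixed point, not necessarily the method derived in this paper. The function names, the temperature lam, and the toy MDP are illustrative.

```python
# Minimal sketch, NOT the authors' code: closed-form SR and DR for a small
# tabular MDP, plus one-step TD-style updates. The DR closed form assumes
# the linearly solvable MDP formulation of Piray & Daw (2021); the paper's
# own notation and derived TD methods may differ.
import numpy as np

def successor_representation(P, gamma):
    """SR of a fixed policy: M = (I - gamma P)^{-1}.
    Row M[s] holds expected discounted future occupancies of every state
    starting from s. The SR depends only on transitions (reward-agnostic)."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

def default_representation(P, r, lam=1.0):
    """Assumed DR closed form: D = (diag(exp(-r/lam)) - P)^{-1}.
    Unlike the SR, the reward vector r is folded into the representation,
    which is what makes the DR reward-aware."""
    return np.linalg.inv(np.diag(np.exp(-r / lam)) - P)

def sr_td_update(M, s, s_next, alpha, gamma):
    """Standard tabular TD update for the SR row of state s:
    M[s] += alpha * (e_s + gamma * M[s_next] - M[s])."""
    e_s = np.zeros(M.shape[0]); e_s[s] = 1.0
    M[s] += alpha * (e_s + gamma * M[s_next] - M[s])

def dr_td_update(D, r, s, s_next, alpha, lam=1.0):
    """Hypothetical TD-style update, for illustration only: a sample-based
    rule consistent with the fixed point D = diag(exp(r/lam)) (I + P D)."""
    e_s = np.zeros(D.shape[0]); e_s[s] = 1.0
    target = np.exp(r[s] / lam) * (e_s + D[s_next])
    D[s] += alpha * (target - D[s])

# Toy 3-state chain; strictly negative rewards (costs) keep
# diag(exp(-r/lam)) - P invertible.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
r = np.array([-0.1, -0.1, -1.0])
M = successor_representation(P, gamma=0.95)   # ignores r entirely
D = default_representation(P, r)              # changes whenever r changes
```

The contrast the abstract draws is visible directly in the signatures: the SR function never sees r, while the DR cannot be computed without it.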
Related papers
- CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z) - Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks [6.881699020319577]
We propose Direct Reasoning Optimization (DRO), a reinforcement learning framework for fine-tuning Large Language Models (LLMs). DRO is guided by a new reward signal: the Reasoning Reflection Reward (R3). DRO consistently outperforms strong baselines while remaining broadly applicable across both open-ended and structured domains.
arXiv Detail & Related papers (2025-06-16T10:43:38Z) - RE-TRIP : Reflectivity Instance Augmented Triangle Descriptor for 3D Place Recognition [14.095215136905553]
We propose a novel descriptor for 3D Place Recognition, named RE-TRIP. This new descriptor leverages both geometric measurements and reflectivity to enhance robustness. We conduct a series of experiments to demonstrate the effectiveness of RE-TRIP.
arXiv Detail & Related papers (2025-05-22T03:11:30Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome the limitations of binary verification.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - Reward Models Identify Consistency, Not Causality [54.987590763737145]
State-of-the-art reward models prioritize structural consistency over causal correctness. Removing the problem statement has minimal impact on reward scores, while altering numerical values or disrupting the reasoning flow significantly affects RM outputs.
arXiv Detail & Related papers (2025-02-20T14:57:14Z) - Out-of-Domain Generalization in Dynamical Systems Reconstruction [8.397468572544614]
We provide a formal framework that addresses generalization in DSR.
We show that black-box DL techniques, without adequate structural priors, generally will not be able to learn a generalizing DSR model.
arXiv Detail & Related papers (2024-02-28T14:52:58Z) - Explainable Session-based Recommendation via Path Reasoning [27.205463326317656]
We propose a hierarchical reinforcement learning framework for SR, which improves the explainability of existing SR models via Path Reasoning, namely PR4SR.
Considering the different importance of items to the session, we design the session-level agent to select the items in the session as the starting point for path reasoning and the path-level agent to perform path reasoning.
In particular, we design a multi-target reward mechanism to adapt to the skip behaviors of sequential patterns in SR, and introduce path midpoint reward to enhance the exploration efficiency in knowledge graphs.
arXiv Detail & Related papers (2024-02-28T12:11:08Z) - Single-Reset Divide & Conquer Imitation Learning [49.87201678501027]
Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms.
Some algorithms have been developed to learn from a single demonstration.
arXiv Detail & Related papers (2024-02-14T17:59:47Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, the Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.