Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment
- URL: http://arxiv.org/abs/2510.15456v1
- Date: Fri, 17 Oct 2025 09:11:26 GMT
- Title: Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment
- Authors: Jan Corazza, Hadi Partovi Aria, Daniel Neider, Zhe Xu
- Abstract summary: Reinforcement learning algorithms struggle with learning optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism.
- Score: 6.914710674738284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) algorithms struggle with learning optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While special RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment, and transferring the reward formalism into a new domain with a different causal structure. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments. Furthermore, we provide a theoretical result about convergence to optimal policy for our method, and demonstrate its strengths empirically.
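To illustrate the finite-state structure the abstract refers to, here is a minimal sketch of a reward machine in Python. The states, events ("get_key", "open_door"), and reward values are hypothetical; a *probabilistic* reward machine (PRM) would additionally sample the successor state and reward from a distribution, so this sketch shows only the deterministic core idea.

```python
# Minimal sketch of a reward machine: a finite-state automaton over
# high-level environment events that emits a reward on each transition.
# States ("u0", "u1", "u2") and events are hypothetical examples.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; return the emitted reward."""
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0  # unmodeled events leave the machine unchanged

# Example: reward 1.0 only after "get_key" followed by "open_door",
# capturing a temporal dependency a plain Markovian reward cannot.
rm = RewardMachine(
    transitions={
        ("u0", "get_key"): ("u1", 0.0),
        ("u1", "open_door"): ("u2", 1.0),
    },
    initial_state="u0",
)
rewards = [rm.step(e) for e in ["open_door", "get_key", "open_door"]]
# rewards == [0.0, 0.0, 1.0]: opening the door pays off only after the key.
```

The machine state acts as extra memory: RL algorithms that exploit it (e.g. by learning a Q-function over product states) can propagate sparse reward signals much faster than algorithms that see only the raw environment state.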
Related papers
- Zero-Shot Instruction Following in RL via Structured LTL Representations [50.41415009303967]
We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training. In this setting, linear temporal logic has recently been adopted as a powerful framework for specifying structured, temporally extended tasks. While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal structure inherent in specifications.
arXiv Detail & Related papers (2026-02-15T23:22:50Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training [76.12556589212666]
We show that curriculum post-training avoids the exponential complexity bottleneck. Under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with sample complexity. We establish guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to order.
arXiv Detail & Related papers (2025-11-10T18:29:54Z) - Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning [51.54559117314768]
Recent work investigated the use of Reinforcement Learning (RL) for the synthesis of guidance to improve the performance of temporal planners. We propose an evolution of this learning and planning framework that focuses on exploiting the information provided by symbolic heuristics during both the RL and planning phases.
arXiv Detail & Related papers (2025-05-19T17:19:13Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy [38.86867078596718]
We consider explicitly modeling the generation process of states with the graphical causal model. We formulate the causal structure updating into the RL interaction process with active intervention learning of the environment.
arXiv Detail & Related papers (2024-02-07T14:09:34Z) - Reinforcement Learning with Temporal-Logic-Based Causal Diagrams [25.538860320318943]
We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals.
While these machines model the reward function, they often overlook the causal knowledge about the environment.
We propose the Temporal-Logic-based Causal Diagram (TL-CD) in RL, which captures the temporal causal relationships between different properties of the environment.
arXiv Detail & Related papers (2023-06-23T18:42:27Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Off-Policy Reinforcement Learning with Delayed Rewards [16.914712720033524]
In many real-world tasks, instant rewards are not readily accessible or defined immediately after the agent performs actions.
In this work, we first formally define the environment with delayed rewards and discuss the challenges raised due to the non-Markovian nature of such environments.
We introduce a general off-policy RL framework with a new Q-function formulation that can handle the delayed rewards with theoretical convergence guarantees.
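As a toy illustration of the delayed-reward setting described above (not this paper's actual framework), the following hypothetical wrapper accumulates instant rewards and releases their sum only every `delay` steps. The emitted reward then depends on a window of past transitions rather than on the current one alone, which is exactly the non-Markovian structure the summary mentions.

```python
# Hypothetical sketch of a delayed-reward signal: instant rewards are
# buffered and paid out as a lump sum every `delay` steps, so the reward
# observed at a step is not a function of that step's transition alone.

class DelayedRewardWrapper:
    def __init__(self, delay):
        self.delay = delay
        self.buffer = []  # instant rewards not yet paid out

    def reward(self, instant_reward):
        """Record an instant reward; return the reward the agent observes."""
        self.buffer.append(instant_reward)
        if len(self.buffer) == self.delay:
            total = sum(self.buffer)
            self.buffer = []
            return total  # lump-sum payout of the whole window
        return 0.0  # nothing observable until the window closes

w = DelayedRewardWrapper(delay=3)
out = [w.reward(r) for r in [1.0, 2.0, 3.0, 4.0]]
# out == [0.0, 0.0, 6.0, 0.0]: the first three rewards arrive together.
```

A standard Q-learning update applied naively to `out` would misattribute the lump sum to the single step on which it arrives, which is why such settings call for modified Q-function formulations like the one this paper proposes.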
arXiv Detail & Related papers (2021-06-22T15:19:48Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization [100.72335252255989]
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
arXiv Detail & Related papers (2020-05-06T16:14:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.