Using Experience Classification for Training Non-Markovian Tasks
- URL: http://arxiv.org/abs/2310.11678v1
- Date: Wed, 18 Oct 2023 03:00:59 GMT
- Title: Using Experience Classification for Training Non-Markovian Tasks
- Authors: Ruixuan Miao, Xu Lu, Cong Tian, Bin Yu, Zhenhua Duan
- Abstract summary: Non-Markovian tasks are frequently applied in practical applications such as autonomous driving, financial trading, and medical diagnosis.
We propose a novel RL approach to achieve non-Markovian rewards expressed in the temporal logic LTL$_f$ (Linear Temporal Logic over Finite Traces).
- Score: 11.267797018727402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unlike the standard Reinforcement Learning (RL) model, many real-world tasks
are non-Markovian, whose rewards are predicated on state history rather than
solely on the current state. Solving non-Markovian tasks, which arise frequently
in practical applications such as autonomous driving, financial trading, and
medical diagnosis, can be quite challenging. We propose a novel RL approach to
achieve non-Markovian rewards expressed in temporal logic LTL$_f$ (Linear
Temporal Logic over Finite Traces). To this end, an encoding of linear
complexity from LTL$_f$ into MDPs (Markov Decision Processes) is introduced to
take advantage of advanced RL algorithms. Then, a prioritized experience replay
technique based on the automata structure (semantically equivalent to the LTL$_f$
specification) is utilized to improve the training process. We empirically
evaluate several benchmark problems augmented with non-Markovian tasks to
demonstrate the feasibility and effectiveness of our approach.
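To make the construction concrete, below is a minimal, illustrative Python sketch of the two ingredients the abstract describes: a product of a Gym-style environment with a DFA equivalent to the LTL$_f$ formula, and a replay buffer that prioritizes transitions by automaton progress. This is not the authors' implementation, and the paper's own linear-size encoding and replay scheme differ in detail; the names (`LTLfProductEnv`, `AutomatonPrioritizedReplay`, `labeler`, `dist_to_accept`) are hypothetical, and the DFA is assumed to be computed offline from the formula.

```python
import random
from collections import deque


class LTLfProductEnv:
    """Augments environment states with a DFA state so that the
    non-Markovian reward becomes Markovian on the product."""

    def __init__(self, env, dfa, labeler):
        # dfa = {"initial": q0, "accepting": set_of_states,
        #        "delta": {(q, label): q_next}}
        # labeler(s) maps an environment state to the propositional label
        # read by the DFA.
        self.env, self.dfa, self.labeler = env, dfa, labeler
        self.q = dfa["initial"]

    def reset(self):
        s = self.env.reset()
        self.q = self.dfa["initial"]
        return (s, self.q)

    def step(self, action):
        # Gym-style (obs, reward, done, info) return is assumed.
        s, _, done, info = self.env.step(action)
        # Advance the automaton on the label of the new state; undefined
        # transitions leave the automaton state unchanged in this sketch.
        self.q = self.dfa["delta"].get((self.q, self.labeler(s)), self.q)
        # Reward only when the DFA accepts, i.e. the LTL_f task is satisfied.
        reward = 1.0 if self.q in self.dfa["accepting"] else 0.0
        return (s, self.q), reward, done, info


class AutomatonPrioritizedReplay:
    """Samples transitions with probability proportional to how much they
    advance the DFA toward an accepting state."""

    def __init__(self, dist_to_accept, capacity=100_000, eps=1e-3):
        # dist_to_accept: {dfa_state: shortest number of DFA edges to an
        # accepting state}, precomputed by BFS over the automaton graph.
        self.dist = dist_to_accept
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.eps = eps

    def push(self, state, action, reward, next_state, done):
        q, q_next = state[1], next_state[1]
        # Positive priority boost only for transitions that move the
        # automaton closer to acceptance.
        progress = max(self.dist.get(q, 0) - self.dist.get(q_next, 0), 0)
        self.buffer.append((state, action, reward, next_state, done))
        self.priorities.append(progress + self.eps)

    def sample(self, batch_size):
        return random.choices(list(self.buffer),
                              weights=list(self.priorities),
                              k=batch_size)
```

Once the automaton state is folded into the observation, any off-the-shelf value-based algorithm (e.g., DQN) can be trained on the product MDP, which is the sense in which the encoding lets standard RL algorithms handle the non-Markovian reward.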
Related papers
- Rational Metareasoning for Large Language Models [5.5539136805232205]
Prompting large language models (LLMs) to engage in reasoning has emerged as a core technique for using them.
This work introduces a novel approach based on computational models of metareasoning used in cognitive science.
We develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning.
arXiv Detail & Related papers (2024-10-07T23:48:52Z)
- LMGT: Optimizing Exploration-Exploitation Balance in Reinforcement Learning through Language Model Guided Trade-offs [27.014415210732103]
We introduce Language Model Guided Trade-offs (i.e., LMGT), a novel, sample-efficient framework for Reinforcement Learning.
arXiv Detail & Related papers (2024-09-07T07:40:43Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- Reinforcement Learning in High-frequency Market Making [7.740207107300432]
This paper establishes a new and comprehensive theoretical analysis for the application of reinforcement learning (RL) in high-frequency market making.
We bridge the modern RL theory and the continuous-time statistical models in high-frequency financial economics.
arXiv Detail & Related papers (2024-07-14T22:07:48Z)
- Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$-Network (AdaQN) to take into account the non-stationarity of the optimization procedure without requiring additional samples.
AdaQN is theoretically sound, and we empirically validate it on MuJoCo control problems and Atari 2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from the PRM to optimize the reasoning pathways explored by LLMs (a minimal illustrative sketch appears after this list).
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Improving Representational Continuity via Continued Pretraining [76.29171039601948]
A technique from the transfer learning community, linear probing then fine-tuning (LP-FT), outperforms naive training and other continual learning methods.
LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW).
A variant of LP-FT achieves state-of-the-art accuracy on an NLP continual learning benchmark.
arXiv Detail & Related papers (2023-02-26T10:39:38Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata [18.972270182221262]
This paper presents a method for learning logical task specifications and cost functions from demonstrations.
We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown logic structure of the task.
We define a product between the WFA for high-level task guidance and a labeled Markov Decision Process (L-MDP) for low-level control, and optimize a cost function that matches the demonstrator's behavior.
arXiv Detail & Related papers (2021-03-10T06:42:10Z)
- Model-Augmented Q-learning [112.86795579978802]
We propose a model-free RL (MFRL) framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with the true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
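As a companion to the step-level reward model entry above, the following is a minimal illustrative sketch of a greedy search driven by step-level scores. The callables `propose_steps` (samples candidate next reasoning steps from an LLM) and `prm_score` (scores a candidate step) are hypothetical placeholders supplied by the user, not any particular paper's API; only the control flow of the greedy step selection is shown.

```python
def greedy_prm_search(problem, propose_steps, prm_score, max_depth=8, width=4):
    """Greedily extends a reasoning chain one step at a time, keeping the
    candidate step that a process-supervised reward model rates highest.

    propose_steps(problem, partial_solution, k) -> list of k candidate steps
    prm_score(problem, partial_solution, step)  -> float score for the step
    Both are user-supplied callables; this sketch only fixes the search loop.
    """
    solution = []
    for _ in range(max_depth):
        candidates = propose_steps(problem, solution, width)
        if not candidates:
            break
        # Keep the step the reward model rates highest given the prefix so far.
        best = max(candidates, key=lambda step: prm_score(problem, solution, step))
        solution.append(best)
        # Stop once the model emits a final-answer marker.
        if "final answer" in best.lower():
            break
    return solution
```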