Fitted Q-Learning for Relational Domains
- URL: http://arxiv.org/abs/2006.05595v1
- Date: Wed, 10 Jun 2020 01:18:47 GMT
- Title: Fitted Q-Learning for Relational Domains
- Authors: Srijita Das, Sriraam Natarajan, Kaushik Roy, Ronald Parr and Kristian Kersting
- Abstract summary: We develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals.
We show how the two steps of the Bellman operator, application and projection, can be performed using a gradient-boosting technique.
- Score: 29.90646258513537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals. When we fit the Q-functions, we show how the two steps of the Bellman operator, application and projection, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without using domain models and with fewer training trajectories.
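For intuition, here is a minimal sketch of the fitted-Q loop the abstract refers to: the Bellman application step builds regression targets r + gamma * max_a' Q(s', a'), and the projection step fits a gradient-boosted regressor to those targets. The paper works with relational (lifted) representations; this sketch substitutes plain feature vectors and scikit-learn's GradientBoostingRegressor, so it is an assumption-laden stand-in rather than the paper's algorithm.

```python
# Minimal fitted Q-iteration sketch (propositional stand-in, assumed setup):
# "application" builds targets r + gamma * max_a' Q(s', a'); "projection"
# fits a gradient-boosted regressor to those targets.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=20):
    """transitions: list of (state_features, action, reward, next_state_features)."""
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])
    q_model = None
    for _ in range(n_iters):
        targets = []
        for s, a, r, s_next in transitions:
            if q_model is None:
                targets.append(r)  # first pass: bootstrap from immediate rewards
            else:
                q_next = [q_model.predict(np.append(s_next, a2).reshape(1, -1))[0]
                          for a2 in range(n_actions)]
                targets.append(r + gamma * max(q_next))  # Bellman application step
        q_model = GradientBoostingRegressor(n_estimators=50, max_depth=3)
        q_model.fit(X, np.array(targets))  # projection step via gradient boosting
    return q_model
```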
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
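As a rough illustration of the SAC-style backup behind Q-function optimization for generation MDPs, the sketch below computes a soft Bellman target over token-level Q-values. Reading Q(s, .) off the language model's logits, and the single-step form, are assumptions made for brevity, not DQO's exact parameterization or loss.

```python
# Schematic soft Bellman target for token-level Q-values (SAC-style backup).
# Treating the LM's next-step logits as Q(s', .) is an illustrative assumption.
import torch

def soft_bellman_target(reward, next_q_logits, alpha=1.0, gamma=1.0, done=False):
    """reward: scalar reward at this token step; next_q_logits: [vocab] tensor."""
    if done:
        return torch.tensor(float(reward))
    # Soft value of the next state: V(s') = alpha * logsumexp(Q(s', .) / alpha)
    v_next = alpha * torch.logsumexp(next_q_logits / alpha, dim=-1)
    return reward + gamma * v_next
```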
- Trainability maximization using estimation of distribution algorithms assisted by surrogate modelling for quantum architecture search [8.226785409557598]
Quantum architecture search (QAS) involves optimizing both the parametric quantum circuit configuration and its parameters for a variational quantum algorithm.
In this paper, we aim to achieve two improvements in QAS: (1) to reduce the number of measurements by an online surrogate model of the evaluation process that aggressively discards architectures of poor performance; (2) to avoid training the circuits when barren plateaus (BPs) are present.
We experimentally validate our proposal for the variational quantum eigensolver and showcase that our algorithm is able not only to find solutions previously proposed in the literature for the Hamiltonians, but also to outperform the state of the art.
arXiv Detail & Related papers (2024-07-29T15:22:39Z)
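A hedged sketch of the surrogate-based filtering idea: a cheap regressor trained on already-evaluated circuits predicts the score of new candidate architectures, and only the most promising fraction is sent to real (expensive) evaluation. The model choice, encoding, and keep_fraction threshold are illustrative assumptions, not the paper's setup.

```python
# Hypothetical surrogate-assisted filter: predict evaluation scores of new
# candidate architectures from past measurements and keep only the best few.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def filter_candidates(history_X, history_y, candidates_X, keep_fraction=0.2):
    """history_X/history_y: encodings and measured scores of evaluated circuits
    (lower score = better); candidates_X: encodings of new candidates."""
    surrogate = RandomForestRegressor(n_estimators=100).fit(history_X, history_y)
    predicted = surrogate.predict(candidates_X)
    n_keep = max(1, int(keep_fraction * len(candidates_X)))
    keep_idx = np.argsort(predicted)[:n_keep]  # candidates predicted to do best
    return keep_idx  # only these are sent to the real, expensive evaluation
```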
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
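For reference, a plain regularized Q-learning update with linear function approximation might look like the following; the L2 weight penalty used here is just one simple regularizer and does not reproduce the paper's bi-level formulation or its convergence analysis.

```python
# Illustrative regularized Q-learning step with linear function approximation
# Q(s, a) = w . phi(s, a); the L2 term is a simple stand-in regularizer.
import numpy as np

def q_update(w, phi_sa, reward, phi_next_best, gamma=0.99, lr=0.05, reg=1e-3):
    """phi_sa: features of (s, a); phi_next_best: features of (s', argmax_a' Q)."""
    td_error = reward + gamma * np.dot(w, phi_next_best) - np.dot(w, phi_sa)
    grad = -td_error * phi_sa + reg * w  # semi-gradient of TD loss plus L2 penalty
    return w - lr * grad
```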
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, including tests of the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks, and we observe similar performance improvements on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
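A minimal sketch of the greedy step-level search described above: sample several candidate next steps, score each extended partial solution with the PRM, and keep the best. generate_step and prm_score are hypothetical callables standing in for the LLM and the reward model; the stopping rule is likewise illustrative.

```python
# Greedy step-level search guided by a process reward model (PRM).
# `generate_step` and `prm_score` are hypothetical callables.
def greedy_prm_search(question, generate_step, prm_score, n_candidates=5, max_steps=8):
    steps = []
    for _ in range(max_steps):
        candidates = [generate_step(question, steps) for _ in range(n_candidates)]
        scored = [(prm_score(question, steps + [c]), c) for c in candidates]
        best_score, best_step = max(scored, key=lambda t: t[0])
        steps.append(best_step)  # keep the candidate the PRM rates highest
        if best_step.strip().startswith("Final answer"):  # illustrative stop rule
            break
    return steps
```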
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
- Optimising Rolling Stock Planning including Maintenance with Constraint Programming and Quantum Annealing [0.0]
We propose and compare Constraint Programming (CP) and Quantum Annealing (QA) approaches for rolling stock assignment optimisation.
In the CP approach, we model the problem with an Alldifferent constraint, extensions of the Element constraint, and logical implications.
For the QA approach, we develop a quadratic unconstrained binary optimisation (QUBO) model.
arXiv Detail & Related papers (2021-09-15T11:00:53Z)
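To make the QUBO side concrete, the toy builder below encodes an assignment-style constraint (each service covered by exactly one unit) as quadratic penalties, which is the standard QUBO recipe; it omits the paper's maintenance requirements and cost terms, so it is only a sketch of the encoding style.

```python
# Toy QUBO for an assignment-style constraint: each service must be covered by
# exactly one unit, enforced by the penalty P * (1 - sum_u x[s, u])^2.
from collections import defaultdict

def assignment_qubo(services, units, penalty=10.0):
    Q = defaultdict(float)  # Q[(var_i, var_j)] -> coefficient
    for s in services:
        vars_s = [(s, u) for u in units]
        for v in vars_s:
            Q[(v, v)] += -penalty           # -2x + x^2 = -x for binary x
        for i, v in enumerate(vars_s):
            for w in vars_s[i + 1:]:
                Q[(v, w)] += 2.0 * penalty  # cross terms 2 * x_u * x_u'
    return dict(Q)  # constant offset of +penalty per service is dropped
```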
- A Value-Function-based Interior-point Method for Non-convex Bi-level Optimization [38.75417864443519]
Bi-level optimization models are able to capture a wide range of complex learning tasks of practical interest.
We propose a new Bi-level Value-Function-based Interior-point scheme: we penalize the regularized value function of the lower-level problem into the upper-level objective.
arXiv Detail & Related papers (2021-06-15T09:10:40Z)
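A hedged sketch of the value-function penalty idea stated above: the regularized lower-level value v_mu(x) is computed by an inner solve, and the optimality gap of the current y is pushed into the upper-level objective as a penalty. The interior-point machinery and the exact penalty form of the paper are simplified away here.

```python
# Value-function penalty sketch for bi-level optimization: solve the
# regularized lower-level problem to get v_mu(x), then penalize the gap
# between the current y and that value in the upper-level objective.
import numpy as np
from scipy.optimize import minimize

def penalized_upper_objective(x, y, F, f, mu=0.1, rho=10.0):
    """F(x, y): upper-level objective; f(x, y): lower-level objective; y: array."""
    inner = minimize(lambda z: f(x, z) + mu * np.sum(z ** 2), y)  # v_mu(x)
    gap = f(x, y) + mu * np.sum(y ** 2) - inner.fun  # nonnegative; zero at lower-level optima
    return F(x, y) + rho * gap  # gap penalized into the upper-level objective
```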
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
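As a generic stand-in for a reward "derived by inference over outcomes", the snippet below scores an action by the log-likelihood of the desired outcome under a learned Gaussian next-state model. This only conveys the flavor of the approach; the paper's variational objective is different and more general.

```python
# Generic "outcome likelihood" reward: score an action by the log-probability
# of the desired outcome under a learned Gaussian next-state model. A simple
# stand-in for the paper's variational derivation, not its exact objective.
import numpy as np

def outcome_log_likelihood_reward(goal, pred_mean, pred_std):
    """goal, pred_mean, pred_std: arrays for the desired outcome and the model's
    predicted next-state distribution p(s' | s, a)."""
    var = pred_std ** 2
    return -0.5 * np.sum((goal - pred_mean) ** 2 / var + np.log(2 * np.pi * var))
```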
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)