Time Adaptive Reinforcement Learning
- URL: http://arxiv.org/abs/2004.08600v1
- Date: Sat, 18 Apr 2020 11:52:07 GMT
- Title: Time Adaptive Reinforcement Learning
- Authors: Chris Reinke
- Abstract summary: Reinforcement learning (RL) makes it possible to solve complex tasks such as Go, often with stronger performance than humans.
Here we consider the case of adapting RL agents to different time restrictions, such as finishing a task within a given time limit that might change from one task execution to the next.
We introduce two model-free, value-based algorithms: the Independent Gamma-Ensemble and the n-Step Ensemble.
- Score: 2.0305676256390934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) makes it possible to solve complex tasks such as Go, often
with stronger performance than humans. However, the learned behaviors are
usually fixed to specific tasks and unable to adapt to different contexts. Here
we consider the case of adapting RL agents to different time restrictions, such
as finishing a task with a given time limit that might change from one task
execution to the next. We define such problems as Time Adaptive Markov Decision
Processes and introduce two model-free, value-based algorithms: the Independent
Gamma-Ensemble and the n-Step Ensemble. In contrast to classical approaches,
they allow zero-shot adaptation between different time restrictions. The
proposed approaches represent general mechanisms for handling time-adaptive tasks,
making them compatible with many existing RL methods, algorithms, and
scenarios.
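As a rough illustration of the ensemble idea, the sketch below keeps one tabular Q-function per discount factor and, at decision time, picks the module whose effective horizon 1/(1-gamma) fits the remaining time budget. This is only a minimal sketch under simplifying assumptions (tabular Q-learning, a hypothetical horizon-based selection rule); the paper's Independent Gamma-Ensemble and n-Step Ensemble define module learning and selection more carefully.

```python
import numpy as np

class GammaEnsemble:
    """Minimal gamma-ensemble sketch for time-adaptive tasks (illustrative only)."""

    def __init__(self, n_states, n_actions, gammas=(0.5, 0.8, 0.95, 0.99), alpha=0.1):
        self.gammas = gammas
        self.alpha = alpha
        # One independent tabular Q-function per discount factor.
        self.Q = [np.zeros((n_states, n_actions)) for _ in gammas]

    def update(self, s, a, r, s_next, done):
        # Standard Q-learning update, applied to every module in parallel.
        for q, gamma in zip(self.Q, self.gammas):
            target = r + (0.0 if done else gamma * q[s_next].max())
            q[s, a] += self.alpha * (target - q[s, a])

    def act(self, s, steps_left):
        # Hypothetical selection rule: treat 1/(1-gamma) as a module's effective
        # horizon and use the largest gamma that still fits the remaining time
        # budget (falling back to the shortest-horizon module).
        fitting = [i for i, g in enumerate(self.gammas) if 1.0 / (1.0 - g) <= steps_left]
        i = fitting[-1] if fitting else 0
        return int(self.Q[i][s].argmax())
```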
Related papers
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL [37.58940726230092]
Reinforcement learning (RL) excels in optimizing policies for discrete-time Markov decision processes (MDPs).
We formalize an RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this challenge.
We demonstrate that state-of-the-art RL algorithms trained on TaCoS drastically reduce the interaction amount over their discrete-time counterpart.
arXiv Detail & Related papers (2024-06-03T09:57:18Z)
- FlowPG: Action-constrained Policy Gradient with Normalizing Flows [14.98383953401637]
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical, resource-allocation-related decision-making problems.
A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each step.
arXiv Detail & Related papers (2024-02-07T11:11:46Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
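The snippet does not follow the paper's own quantization scheme, which is not spelled out in this summary; as a stand-in, the hedged sketch below builds a fixed discrete action codebook with k-means over the dataset's continuous actions so that a discrete-action offline RL method can be applied. The function names `build_action_codebook`/`decode` and the bin count are illustrative assumptions, not the paper's adaptive method.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(dataset_actions, n_bins=64, seed=0):
    # dataset_actions: array of shape (n_transitions, action_dim) from the offline dataset.
    # Plain k-means stands in for an (adaptive) action-quantization scheme.
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(dataset_actions)
    codebook = km.cluster_centers_   # (n_bins, action_dim) continuous prototypes
    discrete_actions = km.labels_    # nearest-code index for each dataset action
    return codebook, discrete_actions

def decode(codebook, action_index):
    # Map a discrete policy output back to a continuous action for the environment.
    return codebook[action_index]
```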
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Controllable Dynamic Multi-Task Architectures [92.74372912009127]
We propose a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.
We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights.
arXiv Detail & Related papers (2022-03-28T17:56:40Z)
- An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents [1.3812010983144802]
We propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP).
The aim is to build a greedy-like heuristic able to learn on some distribution of JSSP instances, differing in the number of jobs and machines.
As expected, the model can generalize, to some extent, to larger problems or to instances originating from a distribution different from the one used in training.
arXiv Detail & Related papers (2021-10-18T07:55:39Z)
- Deep Reinforcement Learning with Adjustments [10.244120641608447]
We propose a new Q-learning algorithm for continuous action spaces that bridges control and RL algorithms.
Our method can learn complex policies to achieve long-term goals and, at the same time, it can be easily adjusted to address short-term requirements.
arXiv Detail & Related papers (2021-09-28T03:35:09Z)
- A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL [142.36621929739707]
We show that learning diverse behaviors for accomplishing a task can lead to behavior that generalizes to varying environments.
By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations.
arXiv Detail & Related papers (2020-10-27T17:41:57Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
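The snippet below sketches these two ingredients over a generic Q-ensemble, assuming each member is a callable q(state, action) -> float; the helper names, the exploration coefficient, and the temperature constant are illustrative rather than the authors' implementation.

```python
import numpy as np

def ucb_action(q_ensemble, state, candidate_actions, lam=1.0):
    # Ingredient (b): pick the action with the highest upper-confidence bound,
    # i.e. ensemble mean plus lam times the ensemble standard deviation.
    scores = []
    for a in candidate_actions:
        qs = np.array([q(state, a) for q in q_ensemble])
        scores.append(qs.mean() + lam * qs.std())
    return candidate_actions[int(np.argmax(scores))]

def bellman_weight(q_ensemble, next_state, next_action, temperature=10.0):
    # Ingredient (a): confidence weight for the Bellman backup that down-weights
    # targets where the ensemble disagrees (a sigmoid of the negative std, shifted
    # so the weight stays in (0.5, 1.0]); treat the constant as illustrative.
    qs = np.array([q(next_state, next_action) for q in q_ensemble])
    std = qs.std()
    return 1.0 / (1.0 + np.exp(std * temperature)) + 0.5
```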
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.