Control Frequency Adaptation via Action Persistence in Batch
Reinforcement Learning
- URL: http://arxiv.org/abs/2002.06836v2
- Date: Sun, 12 Jul 2020 19:18:03 GMT
- Title: Control Frequency Adaptation via Action Persistence in Batch
Reinforcement Learning
- Authors: Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni,
Marcello Restelli
- Abstract summary: We introduce the notion of action persistence that consists in the repetition of an action for a fixed number of decision steps.
We present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), that extends FQI, with the goal of learning the optimal value function at a given persistence.
- Score: 40.94323379769606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of the control frequency of a system has a relevant impact on the
ability of reinforcement learning algorithms to learn a highly performing
policy. In this paper, we introduce the notion of action persistence that
consists in the repetition of an action for a fixed number of decision steps,
having the effect of modifying the control frequency. We start by analyzing how
action persistence affects the performance of the optimal policy, and then we
present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), that extends
FQI, with the goal of learning the optimal value function at a given
persistence. After providing a theoretical study of PFQI and a heuristic
approach to identify the optimal persistence, we present an experimental
campaign on benchmark domains to show the advantages of action persistence and
to prove the effectiveness of our persistence selection method.
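To make the idea concrete, the sketch below implements k-step action persistence as a simple rollout loop and a persistent variant of Fitted Q-Iteration trained on the resulting k-persistent transitions, where the reward inside a persisted block is aggregated with the base discount factor and bootstrapping uses gamma**k. This is a minimal sketch based only on the abstract: the gym-style environment interface, the ExtraTreesRegressor choice, and every hyperparameter are assumptions, not the authors' reference implementation of PFQI.

```python
# Minimal sketch. Assumptions: an old gym-style env whose step() returns
# (obs, reward, done, info), a finite action set, scikit-learn as the regressor.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def collect_persistent_batch(env, actions, persistence, gamma, n_episodes, horizon, seed=0):
    """Roll out a uniform-random persistent policy: each sampled action is held for
    `persistence` base decision steps; rewards inside the held block are aggregated
    with the base discount, yielding transitions of the k-persistent MDP."""
    rng = np.random.default_rng(seed)
    batch = []
    for _ in range(n_episodes):
        s, t, done = env.reset(), 0, False
        while t < horizon and not done:
            a = actions[rng.integers(len(actions))]
            r_cum, s_next = 0.0, s
            for j in range(persistence):                  # action persistence: repeat a
                s_next, r, done, _ = env.step(a)
                r_cum += (gamma ** j) * r
                t += 1
                if done or t >= horizon:
                    break
            batch.append((np.asarray(s, float), a, r_cum, np.asarray(s_next, float), done))
            s = s_next
    return batch


def pfqi(batch, actions, persistence, gamma, n_iterations=50):
    """FQI on k-persistent transitions: the bootstrap discount becomes gamma**k."""
    S = np.stack([b[0] for b in batch])
    A = np.array([b[1] for b in batch], dtype=float).reshape(len(batch), -1)
    R = np.array([b[2] for b in batch])
    S2 = np.stack([b[3] for b in batch])
    D = np.array([b[4] for b in batch], dtype=float)
    X = np.hstack([S, A])

    q = None
    for _ in range(n_iterations):
        if q is None:
            y = R                                          # first pass: persisted reward only
        else:
            next_q = np.column_stack([                     # greedy value at the next decision step
                q.predict(np.hstack([S2, np.tile(np.atleast_1d(a), (len(S2), 1))]))
                for a in actions
            ])
            y = R + (gamma ** persistence) * (1.0 - D) * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
        q.fit(X, y)
    return q
```

With such a routine, a persistence could be chosen by fitting a Q-function at several values of `persistence` and comparing the estimated values of the corresponding greedy policies; the paper's actual selection heuristic is specified there, not here.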
Related papers
- Actor-Critic Reinforcement Learning with Phased Actor [10.577516871906816]
We propose a novel phased actor in actor-critic (PAAC) method to improve policy gradient estimation.
PAAC accounts for both $Q$ value and TD error in its actor update.
Results show that PAAC leads to significant performance improvement measured by total cost, learning variance, robustness, learning speed and success rate.
arXiv Detail & Related papers (2024-04-18T01:27:31Z) - Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z) - A Model-Based Approach for Improving Reinforcement Learning Efficiency
Leveraging Expert Observations [9.240917262195046]
We propose an algorithm that automatically adjusts the weight of each component in the augmented loss function.
Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks.
arXiv Detail & Related papers (2024-02-29T03:53:02Z) - ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z) - Assessing the Impact of Context Inference Error and Partial
Observability on RL Methods for Just-In-Time Adaptive Interventions [12.762365585427377]
Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community.
JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components.
We study the effect of context inference error and partial observability on the ability to learn effective policies.
arXiv Detail & Related papers (2023-05-17T02:46:37Z) - Evolving Constrained Reinforcement Learning Policy [5.4444944707433525]
We propose a novel evolutionary constrained reinforcement learning algorithm, which adaptively balances the reward and constraint violation with ranking.
Experiments on robotic control benchmarks show that our ECRL achieves outstanding performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2023-04-19T03:54:31Z) - Action Pick-up in Dynamic Action Space Reinforcement Learning [6.15205100319133]
We propose an intelligent Action Pick-up (AP) algorithm to autonomously choose valuable actions that are most likely to boost performance from a set of new actions.
In this paper, we first provide a theoretical analysis and find that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience.
We then design two different AP methods: frequency-based global method and state clustering-based local method, based on the prior optimal policy.
arXiv Detail & Related papers (2023-04-03T10:55:16Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration (a minimal sketch of this reweighting appears after this list).
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error (a sketch of such a convex surrogate appears after this list).
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.