Data-Driven Evaluation of Training Action Space for Reinforcement Learning
 - URL: http://arxiv.org/abs/2204.03840v1
 - Date: Fri, 8 Apr 2022 04:53:43 GMT
 - Title: Data-Driven Evaluation of Training Action Space for Reinforcement Learning
 - Authors: Rajat Ghosh, Debojyoti Dutta
 - Abstract summary: This paper proposes a Shapley-inspired methodology for training action space categorization and ranking.
To reduce exponential-time Shapley computations, the methodology includes a Monte Carlo simulation.
The proposed data-driven methodology is extensible to different domains, use cases, and reinforcement learning algorithms.
 - Score: 1.370633147306388
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Training action space selection for reinforcement learning (RL) is
conflict-prone due to complex state-action relationships. To address this
challenge, this paper proposes a Shapley-inspired methodology for training
action space categorization and ranking. To reduce exponential-time Shapley
computations, the methodology includes a Monte Carlo simulation to avoid
unnecessary explorations. The effectiveness of the methodology is illustrated
using a cloud infrastructure resource tuning case study. It reduces the search
space by 80% and categorizes the training action sets into dispensable and
indispensable groups. Additionally, it ranks different training actions to
facilitate high-performance yet cost-efficient RL model design. The proposed
data-driven methodology is extensible to different domains, use cases, and
reinforcement learning algorithms.
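
To make the estimation step concrete, the following is a minimal sketch of Monte Carlo Shapley estimation for ranking training actions. It is not the paper's implementation: the helper evaluate_return(), which stands in for training and evaluating an RL agent restricted to a given action subset, and the example action names are hypothetical placeholders, and the paper's exact sampling scheme may differ from plain permutation sampling.

import random

def monte_carlo_shapley(actions, evaluate_return, num_permutations=100, seed=0):
    """Approximate each training action's Shapley value by sampling random
    permutations and accumulating marginal contributions to the return."""
    rng = random.Random(seed)
    shapley = {a: 0.0 for a in actions}
    for _ in range(num_permutations):
        perm = rng.sample(actions, len(actions))   # one random ordering
        subset = []
        prev_value = evaluate_return(subset)       # value of the empty action set
        for a in perm:
            subset.append(a)
            value = evaluate_return(subset)
            shapley[a] += value - prev_value       # marginal contribution of a
            prev_value = value
    return {a: v / num_permutations for a, v in shapley.items()}

if __name__ == "__main__":
    # Toy stand-in for the expensive train-and-evaluate loop; real use would
    # train the RL agent with only the given action subset enabled.
    def evaluate_return(subset):
        return len(set(subset) & {"scale_cpu", "scale_replicas"})

    actions = ["scale_cpu", "scale_memory", "scale_replicas", "noop"]
    values = monte_carlo_shapley(actions, evaluate_return, num_permutations=50)
    # Actions with positive estimated value are treated as indispensable here;
    # the zero threshold is only an illustrative choice.
    ranking = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    print("Shapley estimates:", values)
    print("Indispensable actions:", [a for a, v in ranking if v > 0])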
 
       
      
        Related papers
        - Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies [66.83950068218033]
Scaling laws demonstrate that scaling model parameters and training data enhances learning performance. Despite this potential to improve performance, the integration of scaling laws into deep reinforcement learning has not been fully realized. This review addresses the gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget.
arXiv  Detail & Related papers  (2025-08-05T08:03:12Z) - AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining [12.630306478872043]
We propose AdaLRS, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search. Experiments show that AdaLRS adjusts suboptimal learning rates to the neighborhood of the optimum with marked efficiency and effectiveness.
arXiv  Detail & Related papers  (2025-06-16T09:14:01Z) - A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning [37.62558445850573]
We propose an algorithm, iterative influence-based filtering (IIF), for online RL training. IIF reduces sample complexity, speeds up training, and achieves higher returns. These results advance the interpretability, efficiency, and effectiveness of online RL.
arXiv  Detail & Related papers  (2025-05-25T19:25:57Z) - PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning [13.564676246832544]
We introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy.
PLANRL switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects and fine-grained manipulation control when about to interact with objects.
We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods.
arXiv  Detail & Related papers  (2024-08-07T19:30:08Z) - Decomposing Control Lyapunov Functions for Efficient Reinforcement   Learning [10.117626902557927]
Current Reinforcement Learning (RL) methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications.
In this paper, we build from existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF) to reduce the sample complexity.
We show that our method finds a policy to successfully land a quadcopter in less than half the amount of real-world data required by the state-of-the-art Soft Actor-Critic algorithm.
arXiv  Detail & Related papers  (2024-03-18T19:51:17Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv  Detail & Related papers  (2023-11-16T10:42:58Z) - A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv  Detail & Related papers  (2023-07-06T12:33:34Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv  Detail & Related papers  (2023-06-06T02:24:41Z) - Rethinking Population-assisted Off-policy Reinforcement Learning [7.837628433605179]
Off-policy reinforcement learning algorithms struggle with convergence to local optima due to limited exploration.
Population-based algorithms offer a natural exploration strategy, but their black-box operators are inefficient.
Recent algorithms have integrated these two methods, connecting them through a shared replay buffer.
arXiv  Detail & Related papers  (2023-05-04T15:53:00Z) - Reinforcement Learning with Partial Parametric Model Knowledge [3.3598755777055374]
We adapt reinforcement learning methods for continuous control to bridge the gap between complete ignorance and perfect knowledge of the environment.
Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes inspiration from both model-free RL and model-based control.
arXiv  Detail & Related papers  (2023-04-26T01:04:35Z) - Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv  Detail & Related papers  (2022-04-14T17:46:26Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv  Detail & Related papers  (2022-04-05T17:25:22Z) - Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv  Detail & Related papers  (2021-04-09T03:13:35Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv  Detail & Related papers  (2020-02-10T04:23:09Z) 
        This list is automatically generated from the titles and abstracts of the papers in this site.