Related papers: Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

URL: http://arxiv.org/abs/2306.12981v1
Date: Thu, 22 Jun 2023 15:40:10 GMT
Title: Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping
Authors: Yining Li, Peizhong Ju, Ness Shroff
Abstract summary: Reinforcement learning often needs to deal with the exponential growth of states and actions in high-dimensional spaces. We learn the inherent structure of action-wise similar MDP to appropriately balance the performance degradation versus sample/computational complexity.
Score: 7.691755449724638
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning often needs to deal with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (often known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDP to appropriately balance the performance degradation versus sample/computational complexity. In particular, we partition the action spaces into multiple groups based on the similarity in transition distribution and reward function, and build a linear decomposition model to capture the difference between the intra-group transition kernel and the intra-group rewards. Both our theoretical analysis and experiments reveal a \emph{surprising and counter-intuitive result}: while a more refined grouping strategy can reduce the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the size of samples or the computation resources is limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, which maintains its computational complexity independent of the size of the action space.

Related papers

DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling [20.605487145370752]
Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation.<n>Yet, its practical application is often hindered by reliance on external verifiers or a lack of optimization for realistic computational constraints.<n>We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework.
arXiv Detail & Related papers (2025-06-19T05:40:54Z)
Accelerating Spectral Clustering under Fairness Constraints [56.865810822418744]
We present a new efficient method for fair spectral clustering (Fair SC) by casting the Fair SC problem within the difference of convex functions (DC) framework.<n>We show that each associated subproblem can be solved efficiently, resulting in higher computational efficiency compared to prior work.
arXiv Detail & Related papers (2025-06-09T18:46:27Z)
Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization [15.898378661128334]
Reinforcement Learning (RL) algorithms are known to suffer from the curse of dimensionality. We propose overcoming the curse of dimensionality by approximately factorizing the original Markov decision processes (MDPs) into smaller, independently evolving MDPs. We provide improved sample complexity guarantees for both proposed algorithms.
arXiv Detail & Related papers (2024-11-12T07:08:00Z)
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms. We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths. We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO)
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms. This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. In this paper, we study it a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge. Existing algorithms face issues with slow convergence and/or robustness of performance. We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian Modeling [68.69431580852535]
We introduce a novel GP regression to incorporate the subgroup feedback. Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches. We execute our algorithm on two disparate social problems.
arXiv Detail & Related papers (2021-07-07T03:57:22Z)
Feature Grouping and Sparse Principal Component Analysis [23.657672812296518]
Grouping and Sparse Principal Analysis (SPCA) is widely used in data processing dimension reduction. FGSPCA allows loadings to belong to disjoint homogeneous groups, with sparsity as a special case.
arXiv Detail & Related papers (2021-06-25T15:08:39Z)
Partition-based formulations for mixed-integer optimization of trained ReLU neural networks [66.88252321870085]
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks. At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node.
arXiv Detail & Related papers (2021-02-08T17:27:34Z)
Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix [2.345015036605934]
We formulate the statistical problem as a sparse factor regression and tackle it with a divide-conquer approach. In the first stage division, we consider both latent parallel approaches for simplifying the task into a set of co-parsesparserank estimation (CURE) problems. In the second stage division, we innovate a stagewise learning technique, consisting of a sequence simple incremental paths, to efficiently trace out the whole solution of CURE.
arXiv Detail & Related papers (2020-03-17T19:12:21Z)
Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension. We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.