Achieving Sample and Computational Efficient Reinforcement Learning by
Action Space Reduction via Grouping
- URL: http://arxiv.org/abs/2306.12981v1
- Date: Thu, 22 Jun 2023 15:40:10 GMT
- Title: Achieving Sample and Computational Efficient Reinforcement Learning by
Action Space Reduction via Grouping
- Authors: Yining Li, Peizhong Ju, Ness Shroff
- Abstract summary: Reinforcement learning often needs to deal with the exponential growth of states and actions in high-dimensional spaces.
We learn the inherent structure of action-wise similar MDP to appropriately balance the performance degradation versus sample/computational complexity.
- Score: 7.691755449724638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning often needs to deal with the exponential growth of
states and actions when exploring optimal control in high-dimensional spaces
(often known as the curse of dimensionality). In this work, we address this
issue by learning the inherent structure of action-wise similar MDP to
appropriately balance the performance degradation versus sample/computational
complexity. In particular, we partition the action spaces into multiple groups
based on the similarity in transition distribution and reward function, and
build a linear decomposition model to capture the difference between the
intra-group transition kernel and the intra-group rewards. Both our theoretical
analysis and experiments reveal a \emph{surprising and counter-intuitive
result}: while a more refined grouping strategy can reduce the approximation
error caused by treating actions in the same group as identical, it also leads
to increased estimation error when the size of samples or the computation
resources is limited. This finding highlights the grouping strategy as a new
degree of freedom that can be optimized to minimize the overall performance
loss. To address this issue, we formulate a general optimization problem for
determining the optimal grouping strategy, which strikes a balance between
performance loss and sample/computational complexity. We further propose a
computationally efficient method for selecting a nearly-optimal grouping
strategy, which maintains its computational complexity independent of the size
of the action space.
Related papers
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO)
arXiv Detail & Related papers (2024-10-17T11:47:56Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study it a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in
Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Harnessing Heterogeneity: Learning from Decomposed Feedback in Bayesian
Modeling [68.69431580852535]
We introduce a novel GP regression to incorporate the subgroup feedback.
Our modified regression has provably lower variance -- and thus a more accurate posterior -- compared to previous approaches.
We execute our algorithm on two disparate social problems.
arXiv Detail & Related papers (2021-07-07T03:57:22Z) - Feature Grouping and Sparse Principal Component Analysis [23.657672812296518]
Grouping and Sparse Principal Analysis (SPCA) is widely used in data processing dimension reduction.
FGSPCA allows loadings to belong to disjoint homogeneous groups, with sparsity as a special case.
arXiv Detail & Related papers (2021-06-25T15:08:39Z) - Partition-based formulations for mixed-integer optimization of trained
ReLU neural networks [66.88252321870085]
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks.
At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node.
arXiv Detail & Related papers (2021-02-08T17:27:34Z) - Statistically Guided Divide-and-Conquer for Sparse Factorization of
Large Matrix [2.345015036605934]
We formulate the statistical problem as a sparse factor regression and tackle it with a divide-conquer approach.
In the first stage division, we consider both latent parallel approaches for simplifying the task into a set of co-parsesparserank estimation (CURE) problems.
In the second stage division, we innovate a stagewise learning technique, consisting of a sequence simple incremental paths, to efficiently trace out the whole solution of CURE.
arXiv Detail & Related papers (2020-03-17T19:12:21Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.