Reinforcement Learning in Low-Rank MDPs with Density Features
- URL: http://arxiv.org/abs/2302.02252v1
- Date: Sat, 4 Feb 2023 22:46:28 GMT
- Title: Reinforcement Learning in Low-Rank MDPs with Density Features
- Authors: Audrey Huang, Jinglin Chen, Nan Jiang
- Abstract summary: MDPs with low-rank transitions are highly representative structures that enable tractable learning.
We investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions.
- Score: 12.932032416729774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MDPs with low-rank transitions -- that is, transitions whose matrix
can be factored into the product of two matrices, left and right -- are a highly
representative structure that enables tractable learning. The left matrix
enables expressive function approximation for value-based learning and has been
studied extensively. In this work, we instead investigate sample-efficient
learning with density features, i.e., the right matrix, which induce powerful
models for state-occupancy distributions. This setting not only sheds light on
leveraging unsupervised learning in RL, but also enables plug-in solutions for
convex RL. In the offline setting, we propose an algorithm for off-policy
estimation of occupancies that can handle non-exploratory data. Using this as a
subroutine, we further devise an online algorithm that constructs exploratory
data distributions in a level-by-level manner. As a central technical
challenge, the additive error of occupancy estimation is incompatible with the
multiplicative definition of data coverage. In the absence of strong
assumptions like reachability, this incompatibility easily leads to exponential
error blow-up, which we overcome via novel technical tools. Our results also
readily extend to the representation learning setting, when the density
features are unknown and must be learned from an exponentially large candidate
set.
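The low-rank factorization described in the abstract is easy to make concrete. Below is a minimal NumPy sketch (all sizes, features, and the policy are hypothetical, chosen only for illustration) showing that when the transition kernel factors as P(s'|s,a) = phi(s,a) @ mu(s'), every one-step state occupancy induced by a policy is a d-dimensional mixture of the density features mu, i.e., the right matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 6, 3, 2  # states, actions, rank (hypothetical sizes)

# Right factor: d density features, each a distribution over states.
mu = rng.dirichlet(np.ones(S), size=d)        # shape (d, S)
# Left factor: nonnegative weights phi(s, a) that sum to 1 over the d
# features, so every P(. | s, a) is a valid distribution.
phi = rng.dirichlet(np.ones(d), size=(S, A))  # shape (S, A, d)

# Low-rank transition kernel: P[s, a] = phi(s, a) @ mu.
P = phi @ mu                                  # shape (S, A, S)
assert np.allclose(P.sum(axis=-1), 1.0)

# Under any policy, the next-step occupancy lies in span(mu):
pi = rng.dirichlet(np.ones(A), size=S)        # policy pi(a | s)
d0 = np.full(S, 1.0 / S)                      # initial state distribution
P_pi = np.einsum('sa,sax->sx', pi, P)         # induced state-to-state kernel
d1 = d0 @ P_pi                                # next-step occupancy
# d1 equals a d-dimensional mixture of the density features:
w = np.einsum('s,sa,sad->d', d0, pi, phi)
assert np.allclose(d1, w @ mu)
```

This span property is what makes the right matrix a powerful model class for state-occupancy distributions: occupancies at every level live in a d-dimensional set regardless of the size of the state space.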
Related papers
- Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs)
We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies.
We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
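As a toy illustration of the gradient-ascent-in-occupancy-space flavor (not the paper's actual algorithm, which must also respect Bellman flow constraints), one can run an exponentiated-gradient step over a feature-occupancy vector on the simplex to maximize a linear reward; the rewards and step size here are made up:

```python
import numpy as np

r = np.array([0.1, 0.7, 0.2])    # hypothetical per-feature rewards
lam = np.ones(3) / 3             # feature occupancy, starts uniform
for _ in range(200):
    lam = lam * np.exp(0.5 * r)  # exponentiated-gradient ascent step
    lam = lam / lam.sum()        # renormalize back onto the simplex
best = int(np.argmax(lam))       # mass concentrates on the best feature
```

With a linear objective the iterates concentrate on the highest-reward feature; the interesting part of the actual method is handling the constraint set that valid occupancies must satisfy.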
arXiv Detail & Related papers (2024-05-22T15:39:05Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
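For intuition about querying easily flipped samples, here is a generic margin-based uncertainty selection sketch; this is plain margin sampling, not the LDM estimator itself, and the pool probabilities are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical softmax probabilities for an unlabeled pool
# of 10 samples over 3 classes.
logits = rng.normal(size=(10, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Margin-based uncertainty: a small gap between the top-2 class
# probabilities means the predicted label flips easily, which is
# the intuition the LDM makes precise.
sorted_p = np.sort(probs, axis=1)
margin = sorted_p[:, -1] - sorted_p[:, -2]
query_idx = int(np.argmin(margin))  # query the most flip-prone sample
```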
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
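A minimal sketch of the spectral idea, on synthetic data with hypothetical sizes and noise level: truncate the SVD of a noisy observation to the target rank and compare entry-wise against the ground-truth low-rank matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 50, 40, 3  # matrix size and rank (illustrative)

# Ground-truth rank-r matrix, e.g. expected arm rewards
# in a low-rank bandit.
U = rng.normal(size=(n, r))
V = rng.normal(size=(m, r))
M = U @ V.T

# Noisy entry-wise observations of M.
M_noisy = M + 0.01 * rng.normal(size=(n, m))

# Simple spectral estimator: keep the top-r singular directions.
u, s, vt = np.linalg.svd(M_noisy, full_matrices=False)
M_hat = (u[:, :r] * s[:r]) @ vt[:r]

max_err = np.max(np.abs(M_hat - M))  # entry-wise estimation error
```

The estimate is exactly rank r by construction, and its entry-wise error stays on the order of the noise; the cited paper's contribution is proving such bounds nearly minimax-optimally.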
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck [35.6883212537938]
We consider offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron.
We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting.
We also show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.
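The sparse parity task itself is simple to generate. A short sketch (sizes and support chosen arbitrarily) that also checks the property behind the statistical-query lower bound: no single coordinate correlates with the label.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim, k = 1000, 20, 3  # samples, input bits, parity support (illustrative)

# Sparse parity: the label is the XOR of k hidden coordinates.
support = rng.choice(dim, size=k, replace=False)
X = rng.integers(0, 2, size=(n, dim))
y = X[:, support].sum(axis=1) % 2

# Any single coordinate is uncorrelated with the label, which is
# what makes the task hard for gradient-based training.
corr = np.abs(np.corrcoef(X[:, support[0]], y)[0, 1])
```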
arXiv Detail & Related papers (2023-09-07T15:52:48Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
By "weakly" we mean that for some training points the labels are known, for some they are unknown, and for others they are uncertain due to random noise or other causes such as a lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Learning Centric Power Allocation for Edge Intelligence [84.16832516799289]
Edge intelligence has been proposed, which collects distributed data and performs machine learning at the edge.
This paper proposes a learning centric power allocation (LCPA) method, which allocates radio resources based on an empirical classification error model.
Experimental results show that the proposed LCPA algorithm significantly outperforms other power allocation algorithms.
arXiv Detail & Related papers (2020-07-21T07:02:07Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [53.710405006523274]
This work focuses on the representation learning question: how can we learn such features?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
arXiv Detail & Related papers (2020-06-18T19:11:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.