On the Limited Representational Power of Value Functions and its Links
to Statistical (In)Efficiency
- URL: http://arxiv.org/abs/2403.07136v1
- Date: Mon, 11 Mar 2024 20:05:48 GMT
- Title: On the Limited Representational Power of Value Functions and its Links
to Statistical (In)Efficiency
- Authors: David Cheikhi, Daniel Russo
- Abstract summary: We show information about the transition dynamics may be impossible to represent in the space of value functions.
A deeper investigation points to the limitations of the representational power as the driver of the inefficiency.
- Score: 6.408072565019087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the trade-offs between model-based and model-free methods is a
central question in reinforcement learning. Value-based methods offer
substantial computational advantages and are sometimes just as statistically
efficient as model-based methods. However, focusing on the core problem of
policy evaluation, we show information about the transition dynamics may be
impossible to represent in the space of value functions. We explore this
through a series of case studies focused on structures that arise in many
important problems. In several, there is no information loss and value-based
methods are as statistically efficient as model-based ones. In other
closely-related examples, information loss is severe and value-based methods
are severely outperformed. A deeper investigation points to the limitations of
the representational power as the driver of the inefficiency, as opposed to
failure in algorithm design.
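The contrast at the heart of the abstract, value-based versus model-based policy evaluation, can be made concrete on a toy example. The sketch below is purely illustrative and is not the paper's construction: it uses a hypothetical two-state Markov chain, an assumed discount factor, and arbitrary learning-rate settings, and compares a model-based estimator (fit the transition matrix, then solve the Bellman equation) against tabular TD(0) on the same data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state Markov reward process (illustrative only).
P = np.array([[0.9, 0.1],   # transition matrix under the evaluated policy
              [0.2, 0.8]])
r = np.array([1.0, 0.0])    # expected reward in each state
gamma = 0.9

# Exact value function for reference: V = (I - gamma * P)^{-1} r
V_exact = np.linalg.solve(np.eye(2) - gamma * P, r)

def model_based(data):
    """Model-based evaluation: estimate P and r from data, then solve."""
    counts = np.zeros((2, 2))
    rewards = np.zeros(2)
    visits = np.zeros(2)
    for s, rew, s_next in data:
        counts[s, s_next] += 1
        rewards[s] += rew
        visits[s] += 1
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    r_hat = rewards / visits
    return np.linalg.solve(np.eye(2) - gamma * P_hat, r_hat)

def td0(data, alpha=0.05, sweeps=200):
    """Value-based evaluation: tabular TD(0) with replay over the data."""
    V = np.zeros(2)
    for _ in range(sweeps):
        for s, rew, s_next in data:
            V[s] += alpha * (rew + gamma * V[s_next] - V[s])
    return V

# Generate a single trajectory from the true chain.
data = []
s = 0
for _ in range(2000):
    s_next = rng.choice(2, p=P[s])
    data.append((s, r[s], s_next))
    s = s_next

print(V_exact, model_based(data), td0(data))
```

In this tabular case both estimators recover the value function from the same trajectory; the paper's point is that once value functions are restricted to a limited representational class, information carried by the transition dynamics can become impossible to express in that class, and the two approaches separate.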
Related papers
- Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method [31.268301764230525]
This work formalizes a metric to evaluate unlearning quality in generative models.
We use it to assess the trade-offs between unlearning quality and performance.
We further evaluate how an example's memorization and difficulty affect unlearning under a classical gradient ascent-based approach.
arXiv Detail & Related papers (2024-11-07T03:02:09Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z) - LAVA: Data Valuation without Pre-Specified Learning Algorithms [20.578106028270607]
We introduce a new framework that can value training data in a way that is oblivious to the downstream learning algorithm.
We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions.
arXiv Detail & Related papers (2023-04-28T19:05:16Z) - A feature selection method based on Shapley values robust to concept
shift in regression [0.0]
In this paper, we introduce a direct relationship between Shapley values and prediction errors.
We show that our proposed algorithm significantly outperforms state-of-the-art feature selection methods in concept shift scenarios.
We also perform three analyses of standard situations to assess the algorithm's robustness in the absence of shifts.
arXiv Detail & Related papers (2023-04-28T11:34:59Z) - Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty.
Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis.
arXiv Detail & Related papers (2022-05-26T13:21:01Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.