Understanding the Impact of Data Distribution on Q-learning with
Function Approximation
- URL: http://arxiv.org/abs/2111.11758v1
- Date: Tue, 23 Nov 2021 10:13:00 GMT
- Title: Understanding the Impact of Data Distribution on Q-learning with
Function Approximation
- Authors: Pedro P. Santos, Francisco S. Melo, Alberto Sardinha, Diogo S.
Carvalho
- Abstract summary: We study the interplay between the data distribution and Q-learning-based algorithms with function approximation.
We provide a novel four-state MDP that highlights the impact of the data distribution on the performance of a Q-learning algorithm.
We experimentally assess the impact of data distribution properties on the performance of an offline deep Q-network algorithm.
- Score: 3.666599339851663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we focus our attention on the study of the interplay between
the data distribution and Q-learning-based algorithms with function
approximation. We provide a theoretical and empirical analysis as to why
different properties of the data distribution can contribute to regulating
sources of algorithmic instability. First, we revisit theoretical bounds on the
performance of approximate dynamic programming algorithms. Second, we provide a
novel four-state MDP that highlights the impact of the data distribution on the
performance of a Q-learning algorithm with function approximation, both in
online and offline settings. Finally, we experimentally assess the impact of
the data distribution properties on the performance of an offline deep
Q-network algorithm. Our results show that: (i) the data distribution needs to
possess certain properties in order to robustly learn in an offline setting,
namely low distance to the distributions induced by optimal policies of the MDP
and high coverage over the state-action space; and (ii) high entropy data
distributions can contribute to mitigating sources of algorithmic instability.
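The abstract's two findings (coverage matters, high entropy helps) can be illustrated on a small tabular problem. The sketch below is illustrative only: the MDP, behavior policies, and hyperparameters are invented for this example and are not the paper's four-state MDP or experimental setup. It generates offline datasets under a high-entropy and a low-entropy behavior policy, measures the empirical entropy and state-action coverage of each dataset, and runs offline tabular Q-learning on them:

```python
import numpy as np

# Illustrative 4-state, 2-action MDP (NOT the paper's four-state example):
# state 3 is absorbing with reward 1; all other transitions give reward 0.
N_S, N_A, GAMMA = 4, 2, 0.9
P = np.array([[1, 2], [3, 0], [3, 1], [3, 3]])   # P[s, a] = next state
R = np.zeros((N_S, N_A)); R[3] = 1.0             # R[s, a] = reward
rng = np.random.default_rng(0)

def collect_dataset(behavior, n=5000):
    """Sample transitions: uniform states, actions from `behavior`."""
    s = rng.integers(0, N_S, size=n)
    a = np.array([rng.choice(N_A, p=behavior[si]) for si in s])
    return s, a, R[s, a], P[s, a]

def distribution_properties(s, a):
    """Empirical entropy and coverage of the state-action visitation."""
    counts = np.zeros((N_S, N_A))
    np.add.at(counts, (s, a), 1)
    p = counts.ravel() / counts.sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    coverage = (counts > 0).mean()               # fraction of (s, a) pairs seen
    return entropy, coverage

def offline_q_learning(s, a, r, s2, epochs=50, lr=0.1):
    """Plain tabular Q-learning sweeps over a fixed (offline) dataset."""
    Q = np.zeros((N_S, N_A))
    for _ in range(epochs):
        for i in range(len(s)):
            target = r[i] + GAMMA * Q[s2[i]].max()
            Q[s[i], a[i]] += lr * (target - Q[s[i], a[i]])
    return Q

behaviors = {
    "uniform": np.full((N_S, N_A), 0.5),          # high-entropy behavior policy
    "skewed": np.tile([0.99, 0.01], (N_S, 1)),    # low-entropy behavior policy
}
results = {}
for name, pi in behaviors.items():
    s, a, r, s2 = collect_dataset(pi)
    entropy, coverage = distribution_properties(s, a)
    results[name] = (entropy, coverage, offline_q_learning(s, a, r, s2))
    print(f"{name}: entropy={entropy:.2f}, coverage={coverage:.2f}")
```

Under the skewed behavior policy, action 1 is rarely sampled, so its Q-estimates rest on far fewer transitions and are correspondingly noisier, which is the coverage effect the abstract describes in miniature.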
Related papers
- Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.
We derive estimators for demographic parity, equal opportunity, and conditional mutual information.
To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z)
- Networks with Finite VC Dimension: Pro and Contra [1.0128808054306184]
It is shown that, whereas finite VC dimension is desirable for uniform convergence of empirical errors, it may not be desirable for approximation of functions drawn from a probability distribution modeling the likelihood that they occur in a given type of application.
Practical limitations of the universal approximation property, the trade-offs between the accuracy of approximation and consistency in learning from data, and the influence of depth of networks with ReLU units on their accuracy and consistency are discussed.
arXiv Detail & Related papers (2025-02-04T19:44:14Z)
- Learning-Based TSP-Solvers Tend to Be Overly Greedy [8.79364699260219]
This study constructs a statistical measure, nearest-neighbor density, to examine the properties of the randomly generated instances used to train learning-based solvers.
We validate that the performance of learning-based solvers degrades substantially on such augmented data.
In short, we show that learning-based TSP solvers tend to be overly greedy, which may have profound implications for AI-empowered optimization solvers.
arXiv Detail & Related papers (2025-02-02T12:06:13Z)
- Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Preference-Based Multi-Agent Reinforcement Learning (PbMARL).
We identify the Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for PbMARL.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data.
Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent.
Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams [8.49072000414555]
We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode.
We seek efficient incremental algorithms for computing feature importance (FI) measures, specifically, an incremental FI measure based on feature marginalization of absent features similar to permutation feature importance (PFI).
arXiv Detail & Related papers (2022-09-05T12:34:27Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
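The Dynamic Federated Learning entry above describes a setup where, each iteration, a random subset of agents performs local updates on their own data before aggregation. The sketch below is a generic federated-averaging loop with random client participation, not that paper's algorithm or analysis; the problem (distributed least squares), participation rate, and step sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem split across agents: agent i holds (A_i, b_i)
# and the global objective is the sum over agents of ||A_i w - b_i||^2.
n_agents, dim = 10, 3
w_true = rng.normal(size=dim)
data = []
for _ in range(n_agents):
    A = rng.normal(size=(20, dim))
    data.append((A, A @ w_true + 0.01 * rng.normal(size=20)))

def fedavg(data, rounds=200, local_steps=5, lr=0.01, participate=0.5):
    """Each round, a random subset of agents runs local gradient steps
    from the current global model; the server averages the results."""
    w = np.zeros(dim)
    for _ in range(rounds):
        active = [i for i in range(len(data)) if rng.random() < participate]
        if not active:           # no agent participated this round
            continue
        updates = []
        for i in active:
            A, b = data[i]
            w_i = w.copy()
            for _ in range(local_steps):
                grad = 2 * A.T @ (A @ w_i - b) / len(b)  # local MSE gradient
                w_i -= lr * grad
            updates.append(w_i)
        w = np.mean(updates, axis=0)   # server-side model averaging
    return w

w_hat = fedavg(data)
```

Here the agents' data come from one distribution, so plain averaging recovers the minimizer; the paper's point is precisely that data variability across agents, model variability, and the learning rate jointly govern tracking performance when that is not the case.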
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.