Understanding the Impact of Data Distribution on Q-learning with
Function Approximation
- URL: http://arxiv.org/abs/2111.11758v1
- Date: Tue, 23 Nov 2021 10:13:00 GMT
- Title: Understanding the Impact of Data Distribution on Q-learning with
Function Approximation
- Authors: Pedro P. Santos, Francisco S. Melo, Alberto Sardinha, Diogo S.
Carvalho
- Abstract summary: We study the interplay between the data distribution and Q-learning-based algorithms with function approximation.
We provide a novel four-state MDP that highlights the impact of the data distribution on the performance of a Q-learning algorithm.
We experimentally assess the impact of the data distribution properties on the performance of an offline deep Q-network algorithm.
- Score: 3.666599339851663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we focus our attention on the study of the interplay between
the data distribution and Q-learning-based algorithms with function
approximation. We provide a theoretical and empirical analysis as to why
different properties of the data distribution can contribute to regulating
sources of algorithmic instability. First, we revisit theoretical bounds on the
performance of approximate dynamic programming algorithms. Second, we provide a
novel four-state MDP that highlights the impact of the data distribution on
the performance of a Q-learning algorithm with function approximation, both in
online and offline settings. Finally, we experimentally assess the impact of
the data distribution properties on the performance of an offline deep
Q-network algorithm. Our results show that: (i) the data distribution needs to
possess certain properties in order to robustly learn in an offline setting,
namely low distance to the distributions induced by optimal policies of the MDP
and high coverage over the state-action space; and (ii) high entropy data
distributions can contribute to mitigating sources of algorithmic instability.
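The abstract's findings on coverage and entropy can be illustrated with a toy experiment. The sketch below is not the paper's four-state MDP (which the abstract does not specify); it assumes a hypothetical deterministic four-state chain, one-hot linear features, and i.i.d. sampling of state-action pairs from a fixed data distribution:

```python
import numpy as np

# Hypothetical 4-state, 2-action deterministic chain (the paper's actual
# four-state MDP is not specified in this abstract); reward 1 for reaching
# the rightmost state.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def step(s, a):
    """Action 0 moves left, action 1 moves right; both saturate at the ends."""
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return s2, float(s2 == N_STATES - 1)

def features(s, a):
    """One-hot state-action features (tabular-equivalent linear FA)."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def offline_q_learning(dist, n_samples=20000, lr=0.1, seed=0):
    """Q-learning from transitions whose (s, a) pairs are drawn i.i.d. from a
    fixed data distribution `dist` of shape (N_STATES, N_ACTIONS)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    flat = dist.ravel() / dist.sum()
    for _ in range(n_samples):
        idx = rng.choice(flat.size, p=flat)
        s, a = divmod(idx, N_ACTIONS)
        s2, r = step(s, a)
        q_next = max(w @ features(s2, b) for b in range(N_ACTIONS))
        td_err = r + GAMMA * q_next - w @ features(s, a)
        w += lr * td_err * features(s, a)
    return w

# A high-entropy (uniform) data distribution covers all state-action pairs.
uniform = np.ones((N_STATES, N_ACTIONS))
w_uniform = offline_q_learning(uniform)
greedy = [int(np.argmax([w_uniform @ features(s, a) for a in range(N_ACTIONS)]))
          for s in range(N_STATES)]
```

Swapping `uniform` for a distribution concentrated on a few state-action pairs (low coverage) typically degrades the learned greedy policy, mirroring finding (i).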
Related papers
- Structured Prediction in Online Learning [66.36004256710824]
We study a theoretical and algorithmic framework for structured prediction in the online learning setting.
We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting.
We consider a second algorithm designed especially for non-stationary data distributions, including adversarial data.
arXiv Detail & Related papers (2024-06-18T07:45:02Z)
- An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models [20.314426291330278]
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.)
This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling.
We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution.
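The reduction sketched in this summary can be made concrete with a toy example. The data, the linear value function, and the gamma = 0 special case below are illustrative assumptions, not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data viewed as a single MRP trajectory: each feature vector
# acts as a "state" and its label plays the role of the reward. These modeling
# choices are illustrative, not taken from the paper.
X = rng.normal(size=(500, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=500)

def td0_linear(X, y, gamma, lr=0.05, epochs=50):
    """Linear TD(0) along the data sequence: fit V(x_t) ~ y_t + gamma*V(x_{t+1})."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for t in range(len(X) - 1):
            target = y[t] + gamma * (X[t + 1] @ w)
            w += lr * (target - X[t] @ w) * X[t]
    return w

# With gamma = 0 the TD target reduces to the label itself, so the update is
# plain SGD on the squared regression loss: supervised learning falls out as a
# special case of on-policy policy evaluation.
w_sup = td0_linear(X, y, gamma=0.0)
```

Choosing gamma greater than zero couples each prediction to its successor data point, which is where the "interconnected data" view departs from the i.i.d. assumption.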
arXiv Detail & Related papers (2024-04-23T21:02:58Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data.
Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent.
Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
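The cross-feature definition above can be sketched in a few lines. The linear encoders and the simple alignment term below are hypothetical stand-ins; the paper uses deep models and a contrastive objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two neighboring agents with hypothetical linear encoders (the paper uses
# deep networks; this only illustrates what a cross-feature is).
d_in, d_feat = 8, 4
W_a = rng.normal(size=(d_feat, d_in))  # agent a's model parameters
W_b = rng.normal(size=(d_feat, d_in))  # neighbor b's model parameters
X_a = rng.normal(size=(32, d_in))      # agent a's local (heterogeneous) data

own_feats = X_a @ W_a.T    # agent a's data through its own model
cross_feats = X_a @ W_b.T  # cross-features: the same data through b's model

def alignment_loss(f1, f2):
    """Mean squared distance between L2-normalized feature pairs; an
    illustrative stand-in for the paper's contrastive loss."""
    n1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    n2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    return float(np.mean(np.sum((n1 - n2) ** 2, axis=1)))

loss = alignment_loss(own_feats, cross_feats)
```

Minimizing such a term pulls neighboring models toward consistent representations of the same local data, which is the intuition behind using cross-features under data heterogeneity.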
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Offline Reinforcement Learning with Additional Covering Distributions [0.0]
We study learning optimal policies from a logged dataset, i.e., offline RL, with function approximation.
We show that sample-efficient offline RL for general MDPs is possible with only a partial coverage dataset and weak realizable function classes.
arXiv Detail & Related papers (2023-05-22T03:31:03Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step in this direction by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams [8.49072000414555]
We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode.
We seek efficient incremental algorithms for computing feature importance (FI) measures, specifically, an incremental FI measure based on feature marginalization of absent features similar to permutation feature importance (PFI).
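The incremental marginalization idea can be sketched as follows. The exponential moving average, the previous-sample replacement scheme, and the toy model below are simplifying assumptions; the iPFI paper studies specific sampling strategies that differ from this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted model: feature 0 matters, feature 1 is noise.
model = lambda X: 3.0 * X[:, 0]

def ipfi_stream(stream, model, decay=0.01):
    """Incremental-PFI sketch: for each arriving (x, y), replace one feature
    with the value from the previous sample (a crude marginalization) and
    update an exponential moving average of the induced loss increase."""
    stream = list(stream)
    n_feat = stream[0][0].shape[0]
    fi = np.zeros(n_feat)
    prev_x = None
    for x, y in stream:
        if prev_x is not None:
            base = (model(x[None])[0] - y) ** 2
            for j in range(n_feat):
                x_perm = x.copy()
                x_perm[j] = prev_x[j]  # marginalize feature j
                perm = (model(x_perm[None])[0] - y) ** 2
                fi[j] = (1 - decay) * fi[j] + decay * (perm - base)
        prev_x = x
    return fi

X = rng.normal(size=(2000, 2))
y = 3.0 * X[:, 0]
fi = ipfi_stream(zip(X, y), model)
```

The moving average lets the importance estimate track a drifting data stream, which is the point of computing FI incrementally rather than in batch.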
arXiv Detail & Related papers (2022-09-05T12:34:27Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.