Offline Reinforcement Learning at Multiple Frequencies
- URL: http://arxiv.org/abs/2207.13082v1
- Date: Tue, 26 Jul 2022 17:54:49 GMT
- Title: Offline Reinforcement Learning at Multiple Frequencies
- Authors: Kaylee Burns, Tianhe Yu, Chelsea Finn, Karol Hausman
- Abstract summary: We study how well offline reinforcement learning algorithms can accommodate data with a mixture of frequencies during training.
We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning.
- Score: 62.08749079914275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging many sources of offline robot data requires grappling with the
heterogeneity of such data. In this paper, we focus on one particular aspect of
heterogeneity: learning from offline data collected at different control
frequencies. Across labs, the discretization of controllers, sampling rates of
sensors, and demands of a task of interest may differ, giving rise to a mixture
of frequencies in an aggregated dataset. We study how well offline
reinforcement learning (RL) algorithms can accommodate data with a mixture of
frequencies during training. We observe that the $Q$-value propagates at
different rates for different discretizations, leading to a number of learning
challenges for off-the-shelf offline RL. We present a simple yet effective
solution that enforces consistency in the rate of $Q$-value updates to
stabilize learning. By scaling the value of $N$ in $N$-step returns with the
discretization size, we effectively balance $Q$-value propagation, leading to
more stable convergence. On three simulated robotic control problems, we
empirically find that this simple approach outperforms naïve mixing by 50% on
average.
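To make the proposed fix concrete, here is a minimal sketch of the idea as the abstract states it: choose $N$ so that each $N$-step target spans a fixed window of real time, so data logged at a smaller timestep gets a proportionally larger $N$. The function name, the `horizon_sec` knob, and the per-second discount parameterization are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def n_step_targets(rewards, bootstrap_values, dt, gamma_per_sec=0.99, horizon_sec=0.15):
    """N-step TD targets with N scaled to the discretization dt.

    rewards:          (T,) rewards of one trajectory recorded every dt seconds
    bootstrap_values: (T,) value estimates for the state reached after step t
    Data collected at a higher control frequency (smaller dt) gets a larger N,
    so every target spans roughly the same amount of real time and Q-values
    propagate at a comparable rate across frequencies.
    """
    n = max(1, round(horizon_sec / dt))   # N grows as dt shrinks
    gamma = gamma_per_sec ** dt           # per-step discount, consistent in real time
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        n_t = min(n, T - t)               # truncate N at the trajectory boundary
        ret = sum(gamma ** k * rewards[t + k] for k in range(n_t))
        targets[t] = ret + gamma ** n_t * bootstrap_values[t + n_t - 1]
    return targets
```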
Related papers
- Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
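For intuition, one minimal way to obtain the $SO(2)$ symmetry exploited here is to symmetrize an ordinary Q-network by averaging it over a discretized rotation group; the paper instead builds equivariance into the CQL/IQL networks themselves. The class below and the assumption that states and actions carry planar $(x, y)$ components are purely illustrative.

```python
import math
import torch
import torch.nn as nn

class RotationAveragedQ(nn.Module):
    """Sketch: make Q invariant to planar rotations by averaging an ordinary
    Q-network over a cyclic subgroup C_n of SO(2), with each rotation applied
    jointly to the (x, y) parts of state and action."""

    def __init__(self, q_net, n_rotations=8):
        super().__init__()
        self.q_net = q_net
        angles = [2.0 * math.pi * k / n_rotations for k in range(n_rotations)]
        rots = torch.tensor([[[math.cos(a), -math.sin(a)],
                              [math.sin(a),  math.cos(a)]] for a in angles])
        self.register_buffer("rots", rots)   # (n_rotations, 2, 2)

    def forward(self, state_xy, action_xy):
        # Rotate state and action together by each group element, then average Q.
        qs = [self.q_net(state_xy @ R.T, action_xy @ R.T) for R in self.rots]
        return torch.stack(qs).mean(dim=0)
```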
arXiv Detail & Related papers (2024-06-20T03:02:49Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
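The summary suggests a direct route from a visitation model to values: if a diffusion model can sample future states from a controller's discounted state-visitation measure, the value is, up to a $1/(1-\gamma)$ factor, the mean reward over those samples. The sketch below leans on that identity; `diffusion_sampler` and `reward_fn` are hypothetical stand-ins, not the paper's API.

```python
def diffused_value_estimate(diffusion_sampler, reward_fn, state, gamma=0.99, n_samples=256):
    """Value via the visitation-measure identity V(s) = E_{s'~d_s}[r(s')] / (1 - gamma).

    diffusion_sampler(state, n) is assumed to return n states drawn from the
    discounted state-visitation measure of the controller started at `state`;
    reward_fn maps a batch of states to their rewards. Both are hypothetical.
    """
    futures = diffusion_sampler(state, n_samples)      # samples from d_s (diffusion model)
    return reward_fn(futures).mean() / (1.0 - gamma)   # Monte Carlo value estimate
```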
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Federated Empirical Risk Minimization via Second-Order Method [18.548661105227488]
We present an interior point method (IPM) to solve a general empirical risk minimization problem under the federated learning setting.
We show that the communication complexity of each iteration of our IPM is $\tilde{O}(d^{3/2})$, where $d$ is the dimension (i.e., number of features) of the dataset.
arXiv Detail & Related papers (2023-05-27T14:23:14Z)
- Toward Understanding Generative Data Augmentation [16.204251285425478]
We show that generative data augmentation can enjoy a faster learning rate when the order of the divergence term is $o\left(\max\left(\log(m)\,\beta_m,\ 1/\sqrt{m}\right)\right)$.
We prove that in both cases, though generative data augmentation does not enjoy a faster learning rate, it can improve the learning guarantees at a constant level when the training set is small.
arXiv Detail & Related papers (2023-05-27T13:46:08Z)
- Deep Q-learning: a robust control approach [4.125187280299247]
We use the neural tangent kernel to formulate an uncertain linear time-invariant model that describes learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that $\mathcal{H}_{\infty}$-controlled learning performs slightly better than Double deep Q-learning.
arXiv Detail & Related papers (2022-01-21T09:47:34Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
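As a rough stand-in for the proposed acquisition function (the actual criterion scores expected information gain about the MDP's optimal solution), ensemble disagreement over learned dynamics is a common proxy for how informative querying a transition would be:

```python
import numpy as np

def acquisition_score(ensemble, state, action):
    """Score a candidate (state, action) by epistemic uncertainty: the
    disagreement of an ensemble of learned dynamics models about the next
    state. This is a common proxy, NOT the paper's information-gain
    criterion; `ensemble` is a hypothetical list of next-state predictors.
    """
    preds = np.stack([model(state, action) for model in ensemble])
    return float(preds.var(axis=0).sum())   # zero when all models agree
```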
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
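A minimal sketch of the scheme as described: momentum SGD whose mini-batch gradient weights each sample individually. Deriving the weights from a softmax over per-sample losses (temperature `lam`) is our reading of the abstract, not a verified reproduction of ABSGD.

```python
import torch

def absgd_step(params, per_sample_losses, momentum, lr=0.1, mu=0.9, lam=1.0):
    """Momentum SGD with an individual importance weight per sample.

    Weights come from a softmax over per-sample losses with temperature lam,
    so higher-loss samples get more attention (a negative lam would instead
    downweight them); this specific weighting is an assumption.
    """
    weights = torch.softmax(per_sample_losses.detach() / lam, dim=0)  # attention over batch
    loss = (weights * per_sample_losses).sum()                        # weighted batch loss
    grads = torch.autograd.grad(loss, params)
    for p, g, m in zip(params, grads, momentum):
        m.mul_(mu).add_(g)                # standard momentum buffer update
        p.data.add_(m, alpha=-lr)         # SGD step
```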
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices such as mobile devices, drones, and vehicles.
Owing to communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating the data at a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
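A minimal consensus-ADMM sketch of this setup: each edge node keeps its data local (represented here by a stochastic-gradient oracle) and approximately solves its subproblem with a few mini-batch gradient steps, while the averaging step stands in for whatever decentralized mixing the paper's algorithms use. Names and constants are illustrative.

```python
import numpy as np

def consensus_admm(grad_fns, dim, rho=1.0, lr=0.1, rounds=100, inner=10):
    """Decentralized consensus ADMM with inexact mini-batch inner solves.

    grad_fns[i](x) returns a stochastic gradient of node i's local loss f_i
    at x; only parameters are exchanged, never raw data.
    """
    n = len(grad_fns)
    x = np.zeros((n, dim))   # local models
    u = np.zeros((n, dim))   # dual (disagreement) variables
    z = np.zeros(dim)        # consensus variable
    for _ in range(rounds):
        for i in range(n):
            # Approximate x_i = argmin f_i(x) + (rho/2) ||x - z + u_i||^2
            # with a few stochastic (mini-batch) gradient steps.
            for _ in range(inner):
                x[i] -= lr * (grad_fns[i](x[i]) + rho * (x[i] - z + u[i]))
        z = (x + u).mean(axis=0)   # aggregation (e.g., via decentralized mixing)
        u += x - z                 # dual updates penalizing disagreement
    return z
```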
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.