Sample Efficient Reinforcement Learning in Mixed Systems through
Augmented Samples and Its Applications to Queueing Networks
- URL: http://arxiv.org/abs/2305.16483v2
- Date: Wed, 8 Nov 2023 07:58:37 GMT
- Title: Sample Efficient Reinforcement Learning in Mixed Systems through
Augmented Samples and Its Applications to Queueing Networks
- Authors: Honghao Wei, Xin Liu, Weina Wang, Lei Ying
- Abstract summary: This paper considers a class of reinforcement learning problems involving systems with two types of states: stochastic and pseudo-stochastic.
We propose a sample efficient method that accelerates learning by generating augmented data samples.
- Score: 22.20726152012564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers a class of reinforcement learning problems, which
involve systems with two types of states: stochastic and pseudo-stochastic. In
such systems, stochastic states follow a stochastic transition kernel while the
transitions of pseudo-stochastic states are deterministic given the stochastic
states/transitions. We refer to such systems as mixed systems, which are widely
used in various applications, including manufacturing systems, communication
networks, and queueing networks. We propose a sample efficient RL method that
accelerates learning by generating augmented data samples. The proposed
algorithm is data-driven and learns the policy from both real and augmented
samples. This method significantly improves learning by reducing
the sample complexity such that the dataset only needs to have sufficient
coverage of the stochastic states. We analyze the sample complexity of the
proposed method under Fitted Q Iteration (FQI) and demonstrate that the
optimality gap decreases as
$\tilde{\mathcal{O}}(\sqrt{1/n}+\sqrt{1/m}),$ where $n$ is the number
of real samples and $m$ is the number of augmented samples per real sample. It
is important to note that without augmented samples, the optimality gap is
$\tilde{\mathcal{O}}(1)$ due to insufficient data coverage of the
pseudo-stochastic states. Our experimental results on multiple queueing network
applications confirm that the proposed method indeed significantly accelerates
learning in both deep Q-learning and deep policy gradient.
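To make the augmentation step concrete, the sketch below generates augmented transitions for a toy mixed system, a discrete-time single-server queue; the dynamics, the queue-length cap, and the holding-cost reward are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Toy mixed system: a discrete-time single-server queue. The stochastic
# state is the (arrival, service) outcome observed each step; the queue
# length is pseudo-stochastic, since given those outcomes its transition
# is deterministic: q' = max(q + arrival - service, 0).

def queue_step(q, arrival, service):
    # Deterministic transition of the pseudo-stochastic state.
    return max(q + arrival - service, 0)

def augment(real_samples, m, q_max, rng):
    """Reuse each observed stochastic outcome at m other (resampled) queue
    lengths, turning one real transition into m augmented transitions."""
    out = []
    for q, action, arrival, service in real_samples:  # q itself is replaced below
        for _ in range(m):
            q_aug = int(rng.integers(0, q_max + 1))   # new pseudo-stochastic state
            q_next = queue_step(q_aug, arrival, service)
            out.append((q_aug, action, -q_aug, q_next))  # -q_aug: holding-cost reward
    return out

rng = np.random.default_rng(0)
# One real sample: queue length 3 under action 1 saw 1 arrival, 1 service.
print(augment([(3, 1, 1, 1)], m=5, q_max=10, rng=rng))
```

Augmented tuples like these can then be pooled with the real transitions in FQI or a deep RL learner, which corresponds to the additional $\sqrt{1/m}$ term in the bound above.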
Related papers
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities [109.23137009609519]
Iterated Denoising Energy Matching (iDEM) alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our matching objective.
We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster.
arXiv Detail & Related papers (2024-02-09T01:11:23Z)
- Sampling weights of deep neural networks [1.2370077627846041]
We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed.
We prove that sampled networks are universal approximators.
arXiv Detail & Related papers (2023-06-29T10:13:36Z)
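To illustrate the flavor of the sampled-networks result above: hidden weights and biases are drawn once from a distribution and only a linear readout is solved in closed form, so no gradient computations on internal parameters are needed. The Gaussian sampling distribution and tanh features below are generic stand-ins, not the paper's data-adapted construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn f(x) = sin(2*pi*x) on [0, 1].
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0])

# Sample hidden weights/biases once from a fixed distribution
# (a plain Gaussian here, standing in for the paper's proposal).
width = 512
W = rng.normal(0.0, 10.0, size=(1, width))
b = rng.normal(0.0, 10.0, size=(width,))
H = np.tanh(X @ W + b)                 # sampled hidden features

# Only the linear readout is fit, in closed form (ridge regression):
# no iterative optimization of internal network parameters.
lam = 1e-6
c = np.linalg.solve(H.T @ H + lam * np.eye(width), H.T @ y)

pred = H @ c
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```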
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
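For reference, the SMOTE-style interpolation that such over-samplers build on fits in a few lines; this is plain SMOTE with a hypothetical helper name, not AutoSMOTE's learned hierarchy of decisions.

```python
import numpy as np

def smote_like(minority, k, n_new, rng):
    """Generate synthetic minority samples by interpolating each chosen
    sample with one of its k nearest minority-class neighbors."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # k nearest neighbors within the minority class (excluding x itself)
        d = np.linalg.norm(minority - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        x_nn = minority[rng.choice(nn)]
        out.append(x + rng.uniform() * (x_nn - x))  # point on the segment
    return np.array(out)

rng = np.random.default_rng(0)
minority = rng.normal(size=(20, 2))
print(smote_like(minority, k=5, n_new=10, rng=rng).shape)  # (10, 2)
```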
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients to participate in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
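As one concrete (and generic) way to realize such adaptive sampling, the sketch below treats each client as a bandit arm and runs an EXP3-style update; the utility signal, step size, and single-client-per-round setup are assumptions, not the paper's specific method.

```python
import numpy as np

def exp3_client_sampler(n_clients, n_rounds, utility, rng, eta=0.1):
    """EXP3-style adaptive client sampling from bandit feedback.
    `utility(i)` returns the observed benefit in [0, 1] of sampling
    client i this round (e.g., the progress made by its update)."""
    w = np.ones(n_clients)
    for _ in range(n_rounds):
        p = (1 - eta) * w / w.sum() + eta / n_clients   # explore/exploit mix
        i = rng.choice(n_clients, p=p)                  # sample one client
        u = utility(i)                                  # bandit feedback
        w[i] *= np.exp(eta * u / (p[i] * n_clients))    # importance-weighted update
    return w / w.sum()

# Hypothetical utilities: client 2 is consistently the most useful,
# so its sampling probability should come to dominate.
rng = np.random.default_rng(1)
print(exp3_client_sampler(5, 500, lambda i: 1.0 if i == 2 else 0.2, rng))
```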
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
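A toy rendering of the model-based local search above: fit a Gaussian process to noisy local evaluations of the return and ascend its posterior-mean gradient instead of a raw perturbation estimate. The quadratic objective, RBF kernel, and step sizes are illustrative assumptions.

```python
import numpy as np

def J(theta, rng):                        # noisy toy "return" to maximize
    return -np.sum(theta ** 2) + 0.1 * rng.normal()

def gp_mean_grad(theta, Thetas, y, ell=0.5, noise=0.1):
    """Gradient of the GP posterior mean (RBF kernel) at `theta`."""
    sq = ((Thetas[:, None] - Thetas[None]) ** 2).sum(-1)
    alpha = np.linalg.solve(np.exp(-0.5 * sq / ell**2)
                            + noise**2 * np.eye(len(y)), y)
    k = np.exp(-0.5 * ((theta - Thetas) ** 2).sum(-1) / ell**2)
    return ((Thetas - theta) / ell**2 * (alpha * k)[:, None]).sum(0)

rng = np.random.default_rng(0)
theta = np.array([1.0, -1.0])
for _ in range(50):
    Thetas = theta + 0.3 * rng.normal(size=(15, 2))   # local evaluations
    y = np.array([J(t, rng) for t in Thetas])
    theta = theta + 0.2 * gp_mean_grad(theta, Thetas, y - y.mean())
print("final theta:", theta)              # should approach the optimum at 0
```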
- Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems [45.17023170054112]
We consider the setting of vector-valued non-linear dynamical systems $X_{t+1} = \phi(A^* X_t) + \eta_t$, where $\eta_t$ is unbiased noise and $\phi : \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies a certain expansivity property.
arXiv Detail & Related papers (2021-05-24T22:14:26Z)
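This model class is easy to simulate; the sketch below pairs a leaky-ReLU link (an assumed monotone choice) with a simple GLM-tron-style fit of $A$, standing in for the paper's near-optimal offline and streaming estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 2000
phi = lambda z: np.maximum(z, 0.5 * z)   # leaky-ReLU link (monotone, 1-Lipschitz)

A_star = rng.normal(size=(d, d))
A_star *= 0.8 / np.linalg.norm(A_star, 2)      # keep the dynamics stable

X = np.zeros((T + 1, d))
for t in range(T):                              # X_{t+1} = phi(A* X_t) + eta_t
    X[t + 1] = phi(A_star @ X[t]) + 0.5 * rng.normal(size=d)

# Simple GLM-tron-style offline fit of A from the trajectory (a naive
# stand-in for the paper's near-optimal offline/streaming estimators).
A, lr = np.zeros((d, d)), 0.1
for _ in range(500):
    R = phi(X[:-1] @ A.T) - X[1:]               # prediction residuals
    A -= lr * (R.T @ X[:-1]) / T                # gradient-style update
print("estimation error:", np.linalg.norm(A - A_star))
```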
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
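For orientation, plain global-consensus ADMM on local least-squares losses looks as follows; the coded and mini-batch stochastic elements of the paper above are not reproduced in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, nodes, rho = 5, 4, 1.0
x_true = rng.normal(size=d)

# Local least-squares data (A_i, b_i) held at each edge node.
data = []
for _ in range(nodes):
    A = rng.normal(size=(30, d))
    data.append((A, A @ x_true + 0.05 * rng.normal(size=30)))

x = [np.zeros(d) for _ in range(nodes)]   # local primal variables
u = [np.zeros(d) for _ in range(nodes)]   # scaled dual variables
z = np.zeros(d)                           # shared consensus variable
for _ in range(50):
    for i, (A, b) in enumerate(data):     # closed-form local x-updates
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                               A.T @ b + rho * (z - u[i]))
    z = np.mean([x[i] + u[i] for i in range(nodes)], axis=0)  # averaging step
    for i in range(nodes):
        u[i] += x[i] - z                  # dual ascent on x_i = z
print("consensus error:", np.linalg.norm(z - x_true))
```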
- Effective Proximal Methods for Non-convex Non-smooth Regularized Learning [27.775096437736973]
We show that an independent sampling scheme tends to improve upon the performance of the commonly used uniform sampling scheme.
Our new analysis also derives a faster convergence speed for independent sampling than the best one previously available.
arXiv Detail & Related papers (2020-09-14T16:41:32Z)
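The role of the sampling scheme is easy to see on L1-regularized least squares: the sketch below takes stochastic proximal-gradient steps with rows drawn proportionally to their squared norms, a standard importance-sampling choice (not necessarily the paper's scheme), kept unbiased by reweighting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, lr = 100, 10, 0.05, 0.01
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d) * (rng.random(d) < 0.3)   # sparse ground truth
b = A @ x_true + 0.1 * rng.normal(size=n)

p = np.linalg.norm(A, axis=1) ** 2
p /= p.sum()                          # sample rows with prob. proportional to squared norm

x = np.zeros(d)
for _ in range(5000):
    i = rng.choice(n, p=p)
    g = (A[i] @ x - b[i]) * A[i] / (n * p[i])   # unbiased gradient estimate
    z = x - lr * g
    x = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)  # prox of lam*||x||_1
print("support recovered:", np.count_nonzero(x), "of", np.count_nonzero(x_true))
```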
- A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem with a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a sparse simulator of the environment); 2) An "objective-agnostic" sample collection strategy responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)