Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for
Wireless Networks with Stochastic Arrivals
- URL: http://arxiv.org/abs/2012.08682v2
- Date: Mon, 21 Dec 2020 20:03:47 GMT
- Authors: Eray Unsal Atay, Igor Kadota and Eytan Modiano
- Abstract summary: We consider a single-hop wireless network with sources transmitting time-sensitive information over unreliable channels.
At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel.
The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a single-hop wireless network with sources transmitting
time-sensitive information to the destination over multiple unreliable
channels. Packets from each source are generated according to a stochastic
process with known statistics and the state of each wireless channel (ON/OFF)
varies according to a stochastic process with unknown statistics. The
reliability of the wireless channels is to be learned through observation. At
every time slot, the learning algorithm selects a single pair (source, channel)
and the selected source attempts to transmit its packet via the selected
channel. The probability of a successful transmission to the destination
depends on the reliability of the selected channel. The goal of the learning
algorithm is to minimize the Age-of-Information (AoI) in the network over $T$
time slots. To analyze the performance of the learning algorithm, we introduce
the notion of AoI regret, which is the difference between the expected
cumulative AoI of the learning algorithm under consideration and the expected
cumulative AoI of a genie algorithm that knows the reliability of the channels
a priori. The AoI regret captures the penalty incurred by having to learn the
statistics of the channels over the $T$ time slots. The results are two-fold:
first, we consider learning algorithms that employ well-known solutions to the
stochastic multi-armed bandit problem (such as $\epsilon$-Greedy, Upper
Confidence Bound, and Thompson Sampling) and show that their AoI regret scales
as $\Theta(\log T)$; second, we develop a novel learning algorithm and show
that it has $O(1)$ regret. To the best of our knowledge, this is the first
learning algorithm with bounded AoI regret.
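As a rough illustration of the AoI-regret definition above, the following self-contained Python sketch compares a standard UCB1 learner against a genie on a simplified instance: a single source with a fresh packet every slot and i.i.d. ON/OFF channels. The channel reliabilities, horizon, and exploration constant are illustrative assumptions, not the paper's construction, and the paper's O(1)-regret algorithm is not reproduced here.

```python
import math
import random

def simulate_cumulative_aoi(T, channel_probs, policy, seed):
    """Run T slots; AoI resets to 1 on a successful delivery and
    grows by 1 otherwise. Returns the cumulative AoI."""
    rng = random.Random(seed)
    aoi, total = 1, 0
    for t in range(1, T + 1):
        k = policy.select(t)
        success = rng.random() < channel_probs[k]  # channel ON w.p. p_k
        policy.update(k, success)
        aoi = 1 if success else aoi + 1
        total += aoi
    return total

class Genie:
    """Knows channel reliabilities a priori; always picks the best one."""
    def __init__(self, probs):
        self.best = max(range(len(probs)), key=probs.__getitem__)
    def select(self, t):
        return self.best
    def update(self, k, success):
        pass

class UCB:
    """UCB1 over channels, using delivery success as the bandit reward."""
    def __init__(self, n_channels):
        self.pulls = [0] * n_channels
        self.wins = [0] * n_channels
    def select(self, t):
        for k, n in enumerate(self.pulls):
            if n == 0:  # play each channel once before using the index
                return k
        return max(range(len(self.pulls)),
                   key=lambda k: self.wins[k] / self.pulls[k]
                   + math.sqrt(2 * math.log(t) / self.pulls[k]))
    def update(self, k, success):
        self.pulls[k] += 1
        self.wins[k] += int(success)

probs = [0.9, 0.5]  # illustrative channel reliabilities
T = 5000
genie_aoi = simulate_cumulative_aoi(T, probs, Genie(probs), seed=1)
ucb_aoi = simulate_cumulative_aoi(T, probs, UCB(len(probs)), seed=1)
aoi_regret = ucb_aoi - genie_aoi  # penalty for having to learn
```

Per the paper's first result, rerunning this sketch with growing $T$ should show the UCB learner's AoI regret growing roughly logarithmically in $T$.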
Related papers
- Meta-Learning from Learning Curves for Budget-Limited Algorithm Selection [11.409496019407067]
In a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it.
We propose a novel framework in which an agent must select the most promising algorithm during the learning process, without waiting until it is fully trained.
arXiv Detail & Related papers (2024-10-10T08:09:58Z)
- An Adaptive Algorithm for Learning with Unknown Distribution Drift [6.599344783327055]
We develop and analyze a general technique for learning with an unknown distribution drift.
Our technique does not require prior knowledge about the magnitude of the drift.
We demonstrate the application of our technique in two fundamental learning scenarios: binary classification and linear regression.
arXiv Detail & Related papers (2023-05-03T16:37:32Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Revisiting Random Channel Pruning for Neural Network Compression [159.99002793644163]
Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks.
In this paper, we try to determine the channel configuration of the pruned models by random search.
We show that this simple strategy works quite well compared with other channel pruning methods.
arXiv Detail & Related papers (2022-05-11T17:59:04Z)
- Efficiently Learning Any One Hidden Layer ReLU Network From Queries [27.428198343906352]
We give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks with ReLU activations, provided black-box access to the network.
Ours is the first with fully polynomial-time guarantees of efficiency even for worst-case networks.
arXiv Detail & Related papers (2021-11-08T18:59:40Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Learning and Planning in Average-Reward Markov Decision Processes [15.586087060535398]
We introduce learning and planning algorithms for average-reward MDPs.
All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward.
arXiv Detail & Related papers (2020-06-29T19:03:24Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
- Quantum algorithms for hedging and the learning of Ising models [6.346764987071066]
A paradigmatic algorithm for online learning is the Hedge algorithm by Freund and Schapire.
This work presents quantum algorithms for such online learning in an oracular setting.
arXiv Detail & Related papers (2020-02-14T12:48:53Z)
- Channel Assignment in Uplink Wireless Communication using Machine Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to the high computational complexity, machine learning approaches are employed to obtain computationally efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
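For intuition about the integer channel-assignment constraint in the last item, here is a hedged brute-force sketch. The rate table is made up for illustration, and the letter itself resorts to learned approximations precisely because exhaustive search over one-to-one assignments scales factorially with the number of users.

```python
from itertools import permutations

# Hypothetical rate table: rates[u][c] = achievable rate of user u on channel c.
rates = [
    [3.0, 1.0, 0.5],
    [1.5, 2.5, 1.0],
    [0.5, 1.0, 2.0],
]

def best_assignment(rates):
    """Exhaustively search one-to-one (user -> channel) assignments
    and return the one maximizing the sum rate of all users."""
    n = len(rates)
    best, best_sum = None, float("-inf")
    for perm in permutations(range(n)):  # perm[u] is user u's channel
        s = sum(rates[u][perm[u]] for u in range(n))
        if s > best_sum:
            best, best_sum = perm, s
    return best, best_sum

assignment, total = best_assignment(rates)
# assignment == (0, 1, 2), total == 7.5
```

Here the diagonal assignment wins because each user's best channel happens to be distinct; in general the integer constraint can force some users off their individually best channel.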
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.