Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for
Wireless Networks with Stochastic Arrivals
- URL: http://arxiv.org/abs/2012.08682v2
- Date: Mon, 21 Dec 2020 20:03:47 GMT
- Authors: Eray Unsal Atay, Igor Kadota and Eytan Modiano
- Abstract summary: We consider a single-hop wireless network with sources transmitting time-sensitive information over unreliable channels.
At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel.
The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a single-hop wireless network with sources transmitting
time-sensitive information to the destination over multiple unreliable
channels. Packets from each source are generated according to a stochastic
process with known statistics and the state of each wireless channel (ON/OFF)
varies according to a stochastic process with unknown statistics. The
reliability of the wireless channels is to be learned through observation. At
every time slot, the learning algorithm selects a single pair (source, channel)
and the selected source attempts to transmit its packet via the selected
channel. The probability of a successful transmission to the destination
depends on the reliability of the selected channel. The goal of the learning
algorithm is to minimize the Age-of-Information (AoI) in the network over $T$
time slots. To analyze the performance of the learning algorithm, we introduce
the notion of AoI regret, which is the difference between the expected
cumulative AoI of the learning algorithm under consideration and the expected
cumulative AoI of a genie algorithm that knows the reliability of the channels
a priori. The AoI regret captures the penalty incurred by having to learn the
statistics of the channels over the $T$ time slots. The results are two-fold:
first, we consider learning algorithms that employ well-known solutions to the
stochastic multi-armed bandit problem (such as $\epsilon$-Greedy, Upper
Confidence Bound, and Thompson Sampling) and show that their AoI regret scales
as $\Theta(\log T)$; second, we develop a novel learning algorithm and show
that it has $O(1)$ regret. To the best of our knowledge, this is the first
learning algorithm with bounded AoI regret.
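As a rough illustration of the AoI-regret definition above, the following self-contained Python sketch compares a standard UCB1 learner against a genie on a simplified instance: a single source with a fresh packet every slot and i.i.d. ON/OFF channels. The channel reliabilities, horizon, and exploration constant are illustrative assumptions, not the paper's construction, and the paper's O(1)-regret algorithm is not reproduced here.

```python
import math
import random

def simulate_cumulative_aoi(T, channel_probs, policy, seed):
    """Run T slots; AoI resets to 1 on a successful delivery and
    grows by 1 otherwise. Returns the cumulative AoI."""
    rng = random.Random(seed)
    aoi, total = 1, 0
    for t in range(1, T + 1):
        k = policy.select(t)
        success = rng.random() < channel_probs[k]  # channel ON w.p. p_k
        policy.update(k, success)
        aoi = 1 if success else aoi + 1
        total += aoi
    return total

class Genie:
    """Knows channel reliabilities a priori; always picks the best one."""
    def __init__(self, probs):
        self.best = max(range(len(probs)), key=probs.__getitem__)
    def select(self, t):
        return self.best
    def update(self, k, success):
        pass

class UCB:
    """UCB1 over channels, using delivery success as the bandit reward."""
    def __init__(self, n_channels):
        self.pulls = [0] * n_channels
        self.wins = [0] * n_channels
    def select(self, t):
        for k, n in enumerate(self.pulls):
            if n == 0:  # play each channel once before using the index
                return k
        return max(range(len(self.pulls)),
                   key=lambda k: self.wins[k] / self.pulls[k]
                   + math.sqrt(2 * math.log(t) / self.pulls[k]))
    def update(self, k, success):
        self.pulls[k] += 1
        self.wins[k] += int(success)

probs = [0.9, 0.5]  # illustrative channel reliabilities
T = 5000
genie_aoi = simulate_cumulative_aoi(T, probs, Genie(probs), seed=1)
ucb_aoi = simulate_cumulative_aoi(T, probs, UCB(len(probs)), seed=1)
aoi_regret = ucb_aoi - genie_aoi  # penalty for having to learn
```

Per the paper's first result, rerunning this sketch with growing $T$ should show the UCB learner's AoI regret growing roughly logarithmically in $T$.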
Related papers
- Meta-Learning from Learning Curves for Budget-Limited Algorithm Selection [11.409496019407067]
In a budget-limited scenario, it is crucial to carefully select an algorithm candidate and allocate a budget for training it.
We propose a novel framework in which an agent must select the most promising algorithm during the learning process, without waiting until it is fully trained.
arXiv Detail & Related papers (2024-10-10T08:09:58Z)
- An Adaptive Algorithm for Learning with Unknown Distribution Drift [6.599344783327055]
We develop and analyze a general technique for learning with an unknown distribution drift.
Our technique does not require prior knowledge about the magnitude of the drift.
We demonstrate the application of our technique in two fundamental learning scenarios: binary classification and linear regression.
arXiv Detail & Related papers (2023-05-03T16:37:32Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z)
- Revisiting Random Channel Pruning for Neural Network Compression [159.99002793644163]
Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks.
In this paper, we try to determine the channel configuration of the pruned models by random search.
We show that this simple strategy works quite well compared with other channel pruning methods.
arXiv Detail & Related papers (2022-05-11T17:59:04Z)
- Efficiently Learning Any One Hidden Layer ReLU Network From Queries [27.428198343906352]
We give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks with ReLU activations, provided black-box access to the network.
Ours is the first with fully polynomial-time guarantees of efficiency even for worst-case networks.
arXiv Detail & Related papers (2021-11-08T18:59:40Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Learning and Planning in Average-Reward Markov Decision Processes [15.586087060535398]
We introduce learning and planning algorithms for average-reward MDPs.
All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward.
arXiv Detail & Related papers (2020-06-29T19:03:24Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
- Quantum algorithms for hedging and the learning of Ising models [6.346764987071066]
A paradigmatic algorithm for online learning is the Hedge algorithm by Freund and Schapire.
This work presents quantum algorithms for such online learning in an oracular setting.
arXiv Detail & Related papers (2020-02-14T12:48:53Z)
- Channel Assignment in Uplink Wireless Communication using Machine Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to the high computational complexity, machine learning approaches are employed to obtain computationally efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
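For intuition about the integer channel-assignment constraint in the last item, here is a hedged brute-force sketch. The rate table is made up for illustration, and the letter itself resorts to learned approximations precisely because exhaustive search over one-to-one assignments scales factorially with the number of users.

```python
from itertools import permutations

# Hypothetical rate table: rates[u][c] = achievable rate of user u on channel c.
rates = [
    [3.0, 1.0, 0.5],
    [1.5, 2.5, 1.0],
    [0.5, 1.0, 2.0],
]

def best_assignment(rates):
    """Exhaustively search one-to-one (user -> channel) assignments
    and return the one maximizing the sum rate of all users."""
    n = len(rates)
    best, best_sum = None, float("-inf")
    for perm in permutations(range(n)):  # perm[u] is user u's channel
        s = sum(rates[u][perm[u]] for u in range(n))
        if s > best_sum:
            best, best_sum = perm, s
    return best, best_sum

assignment, total = best_assignment(rates)
# assignment == (0, 1, 2), total == 7.5
```

Here the diagonal assignment wins because each user's best channel happens to be distinct; in general the integer constraint can force some users off their individually best channel.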
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.