An Online Learning Approach to Optimizing Time-Varying Costs of AoI
- URL: http://arxiv.org/abs/2105.13383v1
- Date: Thu, 27 May 2021 18:10:56 GMT
- Title: An Online Learning Approach to Optimizing Time-Varying Costs of AoI
- Authors: Vishrant Tripathi, Eytan Modiano
- Abstract summary: We consider systems that require timely monitoring of sources over a communication network.
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader.
- Score: 26.661352924641285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider systems that require timely monitoring of sources over a
communication network, where the cost of delayed information is unknown,
time-varying and possibly adversarial. For the single source monitoring
problem, we design algorithms that achieve sublinear regret compared to the
best fixed policy in hindsight. For the multiple source scheduling problem, we
design a new online learning algorithm called
Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to
the best fixed scheduling policy in hindsight, while remaining computationally
feasible. The algorithm and its regret analysis are novel and of independent
interest to the study of online restless multi-armed bandit problems. We
further design algorithms that achieve sublinear regret compared to the best
dynamic policy when the environment is slowly varying. Finally, we apply our
algorithms to a mobility tracking problem. We consider non-stationary and
adversarial mobility models and illustrate the performance benefit of using our
online learning algorithms compared to an oblivious scheduling policy.
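As an illustration of the general recipe behind Follow-the-Perturbed-Whittle-Leader, the sketch below perturbs accumulated cost estimates and then schedules by a Whittle-style weight-times-age index. This is a minimal sketch only: the index formula, perturbation distribution, and tuning are simplified placeholders, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpwl_schedule(cum_costs, ages, eta=1.0):
    """Pick one source to transmit, FTPL-style.

    cum_costs[i]: accumulated observed cost weight for source i (a stand-in
                  for the paper's unknown time-varying AoI cost).
    ages[i]:      current Age of Information of source i.
    eta:          perturbation scale (a tuning knob, not the paper's value).
    """
    # Perturb the cumulative cost estimates (the "Perturbed" in FTPL).
    perturbed = cum_costs + eta * rng.exponential(size=len(cum_costs))
    # Whittle-style priority: weight times current age. The true Whittle
    # index in the paper differs; this is a simplified stand-in.
    return int(np.argmax(perturbed * ages))

# Toy run: 3 sources; ages grow by 1 each slot and reset on service.
ages = np.ones(3)
cum_costs = np.zeros(3)
for t in range(10):
    true_costs = rng.uniform(0.5, 1.5, size=3)  # unknown, time-varying
    i = ftpwl_schedule(cum_costs, ages)
    ages += 1
    ages[i] = 1                                  # served source resets
    cum_costs += true_costs                      # learner observes costs
```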
Related papers
- Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints [0.6466206145151128]
This paper studies an online optimal resource reservation problem in communication networks with job transfers.
We propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints.
arXiv Detail & Related papers (2024-05-03T10:12:40Z)
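The "randomized exponentially weighted method" above builds on the classic Hedge update. A minimal sketch, assuming generic per-action losses and ignoring the paper's long-term constraints, which its actual algorithm handles:

```python
import numpy as np

def hedge_step(weights, losses, eta=0.1):
    """One exponential-weights update: down-weight actions by observed loss."""
    weights = weights * np.exp(-eta * losses)
    return weights / weights.sum()

rng = np.random.default_rng(0)
p = np.ones(4) / 4                   # uniform over 4 reservation actions
for t in range(100):
    losses = rng.uniform(size=4)     # placeholder losses; the paper's would
                                     # also encode constraint penalties
    action = rng.choice(4, p=p)      # randomized play from current weights
    p = hedge_step(p, losses)
```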
- Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
arXiv Detail & Related papers (2023-01-02T01:12:29Z)
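Frank-Wolfe methods are what make such algorithms projection-free: each step calls a linear minimization oracle over the feasible set instead of projecting onto it. A generic step over an l1 ball (not the paper's memory-aware meta-base scheme):

```python
import numpy as np

def fw_step_l1(x, grad, radius=1.0, gamma=0.1):
    """One Frank-Wolfe step over the l1 ball of the given radius.

    The linear minimization oracle over an l1 ball just picks the signed
    vertex aligned against the gradient -- no projection needed.
    """
    s = np.zeros_like(x)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return x + gamma * (s - x)

# Toy use: minimize ||x - b||^2 over the unit l1 ball.
b = np.array([0.3, -0.2, 0.8])
x = np.zeros(3)
for k in range(50):
    x = fw_step_l1(x, 2 * (x - b), gamma=2 / (k + 2))
```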
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path (SSP) problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms that learn optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
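Contextual linear bandit learners of this kind are commonly instantiated as LinUCB; the sketch below is a generic version with an assumed context featurization, not the paper's exact algorithm:

```python
import numpy as np

class LinUCB:
    """Generic LinUCB: reward assumed linear in the context features."""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # regularized design matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Usage: score each candidate tilt's context, act on the best, then update.
bandit = LinUCB(dim=3)
contexts = [np.array([1.0, 0.0, 0.3]), np.array([0.0, 1.0, 0.7])]
best = max(range(len(contexts)), key=lambda i: bandit.score(contexts[i]))
bandit.update(contexts[best], reward=1.0)  # observed coverage/capacity gain
```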
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their ability to deal with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
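A minimal sketch of the runtime-censored bandit setup, with the cutoff handling and selection rule as assumptions rather than the paper's algorithm:

```python
import math

def censored_loss(runtime, cutoff):
    """Loss observed when a run is killed at the cutoff (censored data)."""
    return min(runtime, cutoff)

def lcb_pick(mean_losses, counts, t, c=1.0):
    """UCB-style rule for losses: optimism = lower confidence bound.
    Untried candidate algorithms are selected first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    lcb = [m - c * math.sqrt(2 * math.log(t) / n)
           for m, n in zip(mean_losses, counts)]
    return min(range(len(lcb)), key=lcb.__getitem__)
```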
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy-gradient-based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
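In the notation commonly used for this setting (assumed here; memory length m, comparator sequence u_t), dynamic policy regret with memory compares against an arbitrary comparator sequence rather than a fixed decision:

```latex
\text{D-Regret}_T
\;=\; \sum_{t=m}^{T} f_t(x_{t-m},\dots,x_t)
\;-\; \sum_{t=m}^{T} f_t(u_{t-m},\dots,u_t),
```

with bounds stated in terms of the horizon T, the non-stationarity measure \(\sum_{t} \lVert u_t - u_{t-1} \rVert\), and the memory length m.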
- Unsupervised Deep Learning for Optimizing Wireless Systems with Instantaneous and Statistic Constraints [29.823814915538463]
We establish a unified framework that uses unsupervised deep learning to solve wireless optimization problems with both instantaneous and statistic constraints.
We show that unsupervised learning outperforms supervised learning in terms of violation probability and approximation accuracy of the optimal policy.
arXiv Detail & Related papers (2020-05-30T13:37:14Z)