An Online Learning Approach to Optimizing Time-Varying Costs of AoI
- URL: http://arxiv.org/abs/2105.13383v1
- Date: Thu, 27 May 2021 18:10:56 GMT
- Title: An Online Learning Approach to Optimizing Time-Varying Costs of AoI
- Authors: Vishrant Tripathi, Eytan Modiano
- Abstract summary: We consider systems that require timely monitoring of sources over a communication network.
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader.
- Score: 26.661352924641285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider systems that require timely monitoring of sources over a
communication network, where the cost of delayed information is unknown,
time-varying and possibly adversarial. For the single source monitoring
problem, we design algorithms that achieve sublinear regret compared to the
best fixed policy in hindsight. For the multiple source scheduling problem, we
design a new online learning algorithm called
Follow-the-Perturbed-Whittle-Leader and show that it has low regret compared to
the best fixed scheduling policy in hindsight, while remaining computationally
feasible. The algorithm and its regret analysis are novel and of independent
interest to the study of online restless multi-armed bandit problems. We
further design algorithms that achieve sublinear regret compared to the best
dynamic policy when the environment is slowly varying. Finally, we apply our
algorithms to a mobility tracking problem. We consider non-stationary and
adversarial mobility models and illustrate the performance benefit of using our
online learning algorithms compared to an oblivious scheduling policy.
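As an illustration of the general recipe behind Follow-the-Perturbed-Whittle-Leader, the sketch below perturbs accumulated cost estimates and then schedules by a Whittle-style weight-times-age index. This is a minimal sketch only: the index formula, perturbation distribution, and tuning are simplified placeholders, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpwl_schedule(cum_costs, ages, eta=1.0):
    """Pick one source to transmit, FTPL-style.

    cum_costs[i]: accumulated observed cost weight for source i (a stand-in
                  for the paper's unknown time-varying AoI cost).
    ages[i]:      current Age of Information of source i.
    eta:          perturbation scale (a tuning knob, not the paper's value).
    """
    # Perturb the cumulative cost estimates (the "Perturbed" in FTPL).
    perturbed = cum_costs + eta * rng.exponential(size=len(cum_costs))
    # Whittle-style priority: weight times current age. The true Whittle
    # index in the paper differs; this is a simplified stand-in.
    return int(np.argmax(perturbed * ages))

# Toy run: 3 sources; ages grow by 1 each slot and reset on service.
ages = np.ones(3)
cum_costs = np.zeros(3)
for t in range(10):
    true_costs = rng.uniform(0.5, 1.5, size=3)  # unknown, time-varying
    i = ftpwl_schedule(cum_costs, ages)
    ages += 1
    ages[i] = 1                                  # served source resets
    cum_costs += true_costs                      # learner observes costs
```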
Related papers
- Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints [0.6466206145151128]
This paper studies an online optimal resource reservation problem in communication networks with job transfers.
We propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints.
arXiv Detail & Related papers (2024-05-03T10:12:40Z)
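The "randomized exponentially weighted method" above builds on the classic Hedge update. A minimal sketch, assuming generic per-action losses and ignoring the paper's long-term constraints, which its actual algorithm handles:

```python
import numpy as np

def hedge_step(weights, losses, eta=0.1):
    """One exponential-weights update: down-weight actions by observed loss."""
    weights = weights * np.exp(-eta * losses)
    return weights / weights.sum()

rng = np.random.default_rng(0)
p = np.ones(4) / 4                   # uniform over 4 reservation actions
for t in range(100):
    losses = rng.uniform(size=4)     # placeholder losses; the paper's would
                                     # also encode constraint penalties
    action = rng.choice(4, p=p)      # randomized play from current weights
    p = hedge_step(p, losses)
```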
- Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
arXiv Detail & Related papers (2023-01-02T01:12:29Z)
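Frank-Wolfe methods are what make such algorithms projection-free: each step calls a linear minimization oracle over the feasible set instead of projecting onto it. A generic step over an l1 ball (not the paper's memory-aware meta-base scheme):

```python
import numpy as np

def fw_step_l1(x, grad, radius=1.0, gamma=0.1):
    """One Frank-Wolfe step over the l1 ball of the given radius.

    The linear minimization oracle over an l1 ball just picks the signed
    vertex aligned against the gradient -- no projection needed.
    """
    s = np.zeros_like(x)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return x + gamma * (s - x)

# Toy use: minimize ||x - b||^2 over the unit l1 ball.
b = np.array([0.3, -0.2, 0.8])
x = np.zeros(3)
for k in range(50):
    x = fw_step_l1(x, 2 * (x - b), gamma=2 / (k + 2))
```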
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path (SSP) problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms that learn optimal tilt control policies from existing data.
We show that they can produce an optimal tilt update policy using far fewer data samples than naive or existing rule-based learning algorithms.
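Contextual linear bandit learners of this kind are commonly instantiated as LinUCB; the sketch below is a generic version with an assumed context featurization, not the paper's exact algorithm:

```python
import numpy as np

class LinUCB:
    """Generic LinUCB: reward assumed linear in the context features."""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # regularized design matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Usage: score each candidate tilt's context, act on the best, then update.
bandit = LinUCB(dim=3)
contexts = [np.array([1.0, 0.0, 0.3]), np.array([0.0, 1.0, 0.7])]
best = max(range(len(contexts)), key=lambda i: bandit.score(contexts[i]))
bandit.update(contexts[best], reward=1.0)  # observed coverage/capacity gain
```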
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their ability to deal with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
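A minimal sketch of the runtime-censored bandit setup, with the cutoff handling and selection rule as assumptions rather than the paper's algorithm:

```python
import math

def censored_loss(runtime, cutoff):
    """Loss observed when a run is killed at the cutoff (censored data)."""
    return min(runtime, cutoff)

def lcb_pick(mean_losses, counts, t, c=1.0):
    """UCB-style rule for losses: optimism = lower confidence bound.
    Untried candidate algorithms are selected first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    lcb = [m - c * math.sqrt(2 * math.log(t) / n)
           for m, n in zip(mean_losses, counts)]
    return min(range(len(lcb)), key=lcb.__getitem__)
```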
- Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy-gradient-based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
- Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z)
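In the notation commonly used for this setting (assumed here; memory length m, comparator sequence u_t), dynamic policy regret with memory compares against an arbitrary comparator sequence rather than a fixed decision:

```latex
\text{D-Regret}_T
\;=\; \sum_{t=m}^{T} f_t(x_{t-m},\dots,x_t)
\;-\; \sum_{t=m}^{T} f_t(u_{t-m},\dots,u_t),
```

with bounds stated in terms of the horizon T, the non-stationarity measure \(\sum_{t} \lVert u_t - u_{t-1} \rVert\), and the memory length m.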
- Unsupervised Deep Learning for Optimizing Wireless Systems with Instantaneous and Statistic Constraints [29.823814915538463]
We establish a unified framework that uses unsupervised deep learning to solve wireless optimization problems with both instantaneous and statistic constraints.
We show that unsupervised learning outperforms supervised learning in terms of violation probability and approximation accuracy of the optimal policy.
arXiv Detail & Related papers (2020-05-30T13:37:14Z)