Adversarial Online Learning with Temporal Feedback Graphs
- URL: http://arxiv.org/abs/2407.00571v1
- Date: Sun, 30 Jun 2024 03:14:17 GMT
- Title: Adversarial Online Learning with Temporal Feedback Graphs
- Authors: Khashayar Gatmiry, Jon Schneider
- Abstract summary: We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds.
We present a novel learning algorithm for this setting based on a strategy of partitioning the losses across sub-cliques of this graph.
- Score: 13.267203883254087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses across sub-cliques of this graph. We complement this with a lower bound that is tight in many practical settings, and which we conjecture to be within a constant factor of optimal. For the important class of transitive feedback graphs, we prove that this algorithm is efficiently implementable and obtains the optimal regret bound (up to a universal constant).
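As a concrete illustration of the feedback model only (not the paper's clique-partitioning algorithm), the sketch below runs plain multiplicative weights where the distribution played at round $t$ may depend only on the losses of rounds that are in-neighbors of $t$ in the feedback graph. The function name, the delay-2 example graph, and the learning rate are illustrative assumptions.

```python
import numpy as np

def hedge_with_temporal_feedback(losses, visible, eta=0.1):
    """Prediction with expert advice under a temporal feedback graph.

    losses:  (T, N) array; losses[t, i] is the loss of expert i at round t.
    visible: dict mapping round t to the set of earlier rounds whose losses
             are observable when choosing the action at round t (the
             in-neighborhood of t in the feedback graph).
    Returns the (T, N) array of distributions played by the learner.

    This is a plain multiplicative-weights baseline restricted to the visible
    losses -- an illustration of the setting, not the authors' algorithm.
    """
    T, N = losses.shape
    plays = np.zeros((T, N))
    for t in range(T):
        # Aggregate only the losses that the feedback graph reveals at round t.
        seen = sorted(visible.get(t, set()))
        cum = losses[seen].sum(axis=0) if seen else np.zeros(N)
        w = np.exp(-eta * cum)
        plays[t] = w / w.sum()
    return plays

# Example: the loss of round s becomes visible only after a delay of 2 rounds.
T, N = 8, 3
rng = np.random.default_rng(0)
losses = rng.random((T, N))
visible = {t: set(range(max(t - 2, 0))) for t in range(T)}
print(hedge_with_temporal_feedback(losses, visible)[-1])
```

The delayed-feedback graph above is just one transitive example; any directed graph over rounds known to the learner fits the same interface.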
Related papers
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number of rounds $k$, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - Efficient Contextual Bandits with Uninformed Feedback Graphs [48.77120088347271]
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems.
We show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees.
arXiv Detail & Related papers (2024-02-12T23:50:47Z) - Improved Regret for Efficient Online Reinforcement Learning with Linear
Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z) - Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove the proposed algorithm simultaneously achieves $\mathrm{poly}\log T$ regret in the stochastic setting and minimax-optimal regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling [12.933990572597583]
We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
arXiv Detail & Related papers (2022-02-22T22:27:25Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on \texttt{EXP3} (a minimal EXP3 sketch appears after this list).
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Online non-convex optimization with imperfect feedback [33.80530308979131]
We consider the problem of online learning with non-convex losses.
In terms of feedback, we assume that the learner observes - or otherwise constructs - an inexact model for the loss function at each stage.
We propose a mixed-strategy learning policy based on dual averaging.
arXiv Detail & Related papers (2020-10-16T16:53:13Z) - Reinforcement Learning with Feedback Graphs [69.1524391595912]
We study episodic reinforcement learning in decision processes when the agent receives additional feedback per step.
We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage the additional feedback for more sample-efficient learning.
arXiv Detail & Related papers (2020-05-07T22:35:37Z)
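For reference, here is a minimal sketch of the textbook EXP3 algorithm mentioned in the entry on adversarial linear contextual bandits above. It is the standard loss-based variant with importance-weighted estimates; the cited paper builds on it with contexts and graph-structured side observations, none of which are shown here.

```python
import numpy as np

def exp3(loss_fn, K, T, eta):
    """Textbook EXP3 for adversarial bandits with K actions over T rounds.

    loss_fn(t, a) returns the loss in [0, 1] of action a at round t; only the
    chosen action's loss is observed. Shown for reference only.
    """
    rng = np.random.default_rng(0)
    cum_est = np.zeros(K)          # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * cum_est)
        p = w / w.sum()            # sampling distribution over actions
        a = rng.choice(K, p=p)
        loss = loss_fn(t, a)
        total_loss += loss
        cum_est[a] += loss / p[a]  # unbiased estimate of action a's loss
    return total_loss

# Toy adversary: action 0 is slightly better on average.
print(exp3(lambda t, a: 0.4 if a == 0 else 0.6, K=3, T=2000, eta=0.05))
```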