Practical Contextual Bandits with Feedback Graphs
- URL: http://arxiv.org/abs/2302.08631v3
- Date: Fri, 27 Oct 2023 00:04:33 GMT
- Title: Practical Contextual Bandits with Feedback Graphs
- Authors: Mengxiao Zhang, Yuheng Zhang, Olga Vrousgou, Haipeng Luo, Paul Mineiro
- Abstract summary: We propose and analyze an approach to contextual bandits with feedback graphs based upon reduction to regression.
The resulting algorithms are computationally practical and achieve established minimax rates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While contextual bandits have a mature theory, how to effectively leverage different feedback patterns to accelerate learning remains unclear. Bandits with feedback graphs, which interpolate between the full-information and bandit regimes, provide a promising framework for mitigating the statistical complexity of learning. In this paper, we propose and analyze an approach to contextual bandits with feedback graphs based upon reduction to regression. The resulting algorithms are computationally practical and achieve established minimax rates, thereby reducing the statistical complexity in real-world applications.
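The reduction-to-regression idea can be illustrated with a minimal sketch, an illustrative simplification rather than the paper's exact algorithm: an online regressor estimates per-arm losses, actions are drawn from a SquareCB-style inverse-gap-weighting distribution, and the feedback graph determines which arms' losses are revealed and fed back to the regressor. The arm count, loss means, and graph below are hypothetical.

```python
import random

def inverse_gap_weighting(preds, gamma=10.0):
    """SquareCB-style action distribution from predicted losses."""
    k = len(preds)
    best = min(range(k), key=lambda a: preds[a])
    p = [0.0] * k
    for a in range(k):
        if a != best:
            p[a] = 1.0 / (k + gamma * (preds[a] - preds[best]))
    p[best] = 1.0 - sum(p)  # remaining mass goes to the predicted best arm
    return p

random.seed(0)
K = 3
true_loss = [0.6, 0.3, 0.5]                    # hypothetical Bernoulli loss means
graph = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # pulling arm a reveals losses of graph[a]
est, cnt = [0.5] * K, [1] * K                  # per-arm running-mean "regressors"

for t in range(2000):
    p = inverse_gap_weighting(est)
    a = random.choices(range(K), weights=p)[0]
    for b in graph[a]:                         # side observations from the feedback graph
        loss = 1.0 if random.random() < true_loss[b] else 0.0
        cnt[b] += 1
        est[b] += (loss - est[b]) / cnt[b]     # online least-squares update

best_arm = min(range(K), key=lambda a: est[a])
```

Because the graph supplies side observations, every round informs several arms at once, which is the mechanism by which feedback graphs reduce statistical complexity relative to pure bandit feedback.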
Related papers
- Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
We show that the gradient bandit algorithm converges almost surely to a globally optimal policy using any constant learning rate.
This result demonstrates that the gradient bandit algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise-control assumptions break down.
arXiv Detail & Related papers (2025-02-11T00:12:04Z)
- Mode Estimation with Partial Feedback
We formalize core aspects of weakly supervised and active learning with a simple problem.
We show how entropy coding allows for optimal information acquisition from partial feedback.
arXiv Detail & Related papers (2024-02-20T15:24:21Z)
- Efficient Contextual Bandits with Uninformed Feedback Graphs
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems.
We show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees.
arXiv Detail & Related papers (2024-02-12T23:50:47Z)
- Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition
The problem of bandits with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning-with-expert-advice problem.
We propose a new algorithmic framework for the problem based on a partition of the feedback graph.
arXiv Detail & Related papers (2022-05-30T13:07:42Z)
- Robust Contrastive Learning against Noisy Views
We propose a new contrastive loss function that is robust against noisy views.
We show that our approach provides consistent improvements over the state of the art on image, video, and graph contrastive learning benchmarks.
arXiv Detail & Related papers (2022-01-12T05:24:29Z)
- Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score
Scene graph parsing aims to detect objects in an image scene and recognize their relations.
Recent approaches have achieved high average scores on some popular benchmarks, but fail in detecting rare relations.
This paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem.
arXiv Detail & Related papers (2020-09-28T13:57:59Z)
- Reinforcement Learning with Feedback Graphs
We study episodic reinforcement learning in decision processes when the agent receives additional feedback per step.
We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage the additional feedback for more sample-efficient learning.
arXiv Detail & Related papers (2020-05-07T22:35:37Z)
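The log-loss versus squared-loss point from the uninformed-feedback-graphs entry above can be made concrete with a toy calculation (hypothetical probabilities, not taken from that paper): when the quantity being estimated is a small observation probability, log loss separates a good estimate from a badly underestimating one far more sharply than squared loss does, which is the intuition behind the more favorable regret guarantees.

```python
import math

def sq_loss(p_hat, y):
    return (p_hat - y) ** 2

def log_loss(p_hat, y):
    return -(y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))

def expected(loss, p_hat, p_true):
    # Expectation over y ~ Bernoulli(p_true).
    return p_true * loss(p_hat, 1) + (1 - p_true) * loss(p_hat, 0)

# True edge-observation probability is small; compare a correct estimate
# with one that underestimates the rare event by a factor of 10.
p_true = 0.01
good, bad = 0.01, 0.001

sq_gap = expected(sq_loss, bad, p_true) - expected(sq_loss, good, p_true)
log_gap = expected(log_loss, bad, p_true) - expected(log_loss, good, p_true)
# Squared loss barely distinguishes the two estimates; log loss does.
```

Under squared loss the excess risk of the 10x underestimate is on the order of 1e-4, while under log loss it is more than a hundred times larger, so a log-loss-trained graph estimator is pushed much harder to get small observation probabilities right.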
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.