Stochastic Graph Bandit Learning with Side-Observations
- URL: http://arxiv.org/abs/2308.15107v2
- Date: Sat, 6 Jan 2024 16:17:24 GMT
- Title: Stochastic Graph Bandit Learning with Side-Observations
- Authors: Xueping Gong and Jiheng Zhang
- Abstract summary: We propose an algorithm that adapts to both the underlying graph structures and reward gaps.
To the best of our knowledge, our algorithm is the first to provide a gap-dependent upper bound in this setting.
- Score: 4.910658441596583
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In this paper, we investigate the stochastic contextual bandit with general
function space and graph feedback. We propose an algorithm that addresses this
problem by adapting to both the underlying graph structures and reward gaps. To
the best of our knowledge, our algorithm is the first to provide a
gap-dependent upper bound in this stochastic setting, bridging the research gap
left by the work in [35]. In comparison to [31,33,35], our method offers
improved regret upper bounds and does not require knowledge of graphical
quantities. We conduct numerical experiments to demonstrate the computational
efficiency and effectiveness of our approach in terms of regret upper bounds.
These findings highlight the significance of our algorithm in advancing the
field of stochastic contextual bandits with graph feedback, opening up avenues
for practical applications in various domains.
Related papers
- Improved Graph-based semi-supervised learning Schemes [0.0]
In this work, we improve the accuracy of several known algorithms to address the classification of large datasets when few labels are available.
Our framework lies in the realm of graph-based semi-supervised learning.
arXiv Detail & Related papers (2024-06-30T16:50:08Z) - The graph alignment problem: fundamental limits and efficient algorithms [0.9246334723892301]
The noisy version of the graph isomorphism problem aims to find a matching between the nodes of two graphs which preserves most of the edges.
This thesis focuses on understanding the fundamental information-theoretical limits for this problem, as well as designing and analyzing algorithms that are able to recover the underlying alignment in the data.
arXiv Detail & Related papers (2024-04-18T15:31:13Z) - Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise for within the algorithmic challenge.
arXiv Detail & Related papers (2024-03-12T08:40:21Z) - Efficient Contextual Bandits with Uninformed Feedback Graphs [48.77120088347271]
Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems.
We show that it is critical to learn the graphs using log loss instead of squared loss to obtain favorable regret guarantees.
arXiv Detail & Related papers (2024-02-12T23:50:47Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z) - Joint graph learning from Gaussian observations in the presence of
hidden nodes [26.133725549667734]
We propose a joint graph learning method that takes into account the presence of hidden (latent) variables.
We exploit the structure resulting from the previous considerations to propose a convex optimization problem.
We compare the proposed algorithm with different baselines and evaluate its performance over synthetic and real-world graphs.
arXiv Detail & Related papers (2022-12-04T13:03:41Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective of graph contrastive learning methods showing random augmentations leads to encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Asymptotic study of stochastic adaptive algorithm in non-convex
landscape [2.1320960069210484]
This paper studies some assumption properties of adaptive algorithms widely used in optimization and machine learning.
Among them Adagrad and Rmsprop, which are involved in most of the blackbox deep learning algorithms.
arXiv Detail & Related papers (2020-12-10T12:54:45Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.