Provably Efficient Learning in Partially Observable Contextual Bandit
- URL: http://arxiv.org/abs/2308.03572v2
- Date: Mon, 4 Sep 2023 11:41:54 GMT
- Title: Provably Efficient Learning in Partially Observable Contextual Bandit
- Authors: Xueping Gong and Jiheng Zhang
- Abstract summary: We show how causal bounds can be applied to improve classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
- Score: 4.910658441596583
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In this paper, we investigate transfer learning in partially observable
contextual bandits, where agents have limited knowledge from other agents and
partial information about hidden confounders. We first convert the problem to
identifying or partially identifying causal effects between actions and rewards
through optimization problems. To solve these optimization problems, we
discretize the original functional constraints of unknown distributions into
linear constraints, and sample compatible causal models by sequentially
solving linear programs to obtain causal bounds that account for
estimation error. Our sampling algorithms provide desirable convergence results
for suitable sampling distributions. We then show how causal bounds can be
applied to improve classical bandit algorithms and how they affect the regret
with respect to the size of action sets and function spaces. Notably, in the task
with function approximation, which allows us to handle general context
distributions, our method improves the order of dependence on function space size
compared with previous literature. We formally prove that our causally
enhanced algorithms outperform classical bandit algorithms and achieve orders
of magnitude faster convergence rates. Finally, we perform simulations that
demonstrate the efficiency of our strategy compared to the current
state-of-the-art methods. This research has the potential to enhance the
performance of contextual bandit agents in real-world applications where data
is scarce and costly to obtain.
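As a rough illustration of how causal bounds might sharpen a classical bandit algorithm, the sketch below clips UCB1 indices to per-arm upper bounds and discards arms whose upper bound falls below the best lower bound. It is not the authors' algorithm: the function name `causal_ucb`, the `pull` callback, and the bound values are hypothetical, and the bounds are simply assumed to be given (in the paper they come from solving linear programs).

```python
import math
import random


def causal_ucb(pull, lower, upper, horizon, seed=0):
    """UCB1 with indices clipped to assumed causal bounds.

    pull(arm, rng) returns a reward in [0, 1]; lower[a] <= upper[a] are
    assumed bounds on the mean reward of arm a (hypothetical inputs).
    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(lower)
    best_lower = max(lower)
    # An arm whose upper bound is below the best lower bound cannot be optimal.
    active = [a for a in range(k) if upper[a] >= best_lower]
    counts = [0] * k
    sums = [0.0] * k

    for t in range(1, horizon + 1):
        def index(a):
            if counts[a] == 0:
                # Unpulled arm: the causal upper bound is the only information.
                return upper[a]
            mean = sums[a] / counts[a]
            bonus = math.sqrt(2.0 * math.log(t) / counts[a])
            # Clip the optimistic index so it never exceeds the causal bound.
            return min(mean + bonus, upper[a])

        arm = max(active, key=index)
        reward = pull(arm, rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # illustrative Bernoulli arms, not from the paper

    def bernoulli(arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0

    print(causal_ucb(bernoulli, [0.0, 0.3, 0.6], [0.4, 0.7, 1.0], 2000))
```

With informative bounds, arms that are provably suboptimal are never explored, which is one intuition for why the regret can depend on fewer actions than the full action set.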
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.
Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.
We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Efficient Differentiable Discovery of Causal Order [14.980926991441342]
Intersort is a score-based method to discover causal order of variables.
We reformulate Intersort using differentiable sorting and ranking techniques.
Our work opens the door to efficiently incorporating regularization for causal order into the training of differentiable models.
arXiv Detail & Related papers (2024-10-11T13:11:55Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by the cold start problem and data sparsity problem.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z) - Learning Prompt-Enhanced Context Features for Weakly-Supervised Video
Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - dugMatting: Decomposed-Uncertainty-Guided Matting [83.71273621169404]
We propose a decomposed-uncertainty-guided matting algorithm, which explores the explicitly decomposed uncertainties to efficiently and effectively improve the results.
The proposed matting framework relieves the requirement for users to determine the interaction areas by using simple and efficient labeling.
arXiv Detail & Related papers (2023-06-02T11:19:50Z) - Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-arm contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z) - Multivariate Systemic Risk Measures and Computation by Deep Learning
Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z) - On data-driven chance constraint learning for mixed-integer optimization
problems [0.0]
We develop a Chance Constraint Learning (CCL) methodology with a focus on mixed-integer linear optimization problems.
CCL makes use of linearizable machine learning models to estimate conditional quantiles of the learned variables.
An open-access software has been developed to be used by practitioners.
arXiv Detail & Related papers (2022-07-08T11:54:39Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in
Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Fusion and Orthogonal Projection for Improved Face-Voice Association [15.938463726577128]
We study the problem of learning association between face and voice.
We propose a light-weight, plug-and-play mechanism that exploits the complementary cues in both modalities to form enriched fused embeddings.
arXiv Detail & Related papers (2021-12-20T12:33:33Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via
Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
arXiv Detail & Related papers (2021-05-20T01:57:52Z) - Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
arXiv Detail & Related papers (2020-11-01T17:59:19Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z) - Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm.
Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.