Efficient Algorithms for Learning to Control Bandits with Unobserved
Contexts
- URL: http://arxiv.org/abs/2202.00867v1
- Date: Wed, 2 Feb 2022 04:03:19 GMT
- Title: Efficient Algorithms for Learning to Control Bandits with Unobserved
Contexts
- Authors: Hongju Park and Mohamad Kazem Shirani Faradonbeh
- Abstract summary: We present an implementable posterior sampling algorithm for bandits with imperfect context observations.
The proposed algorithm learns efficiently from the noisy, imperfect observations and takes actions accordingly.
- Score: 1.370633147306388
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Contextual bandits are widely-used in the study of learning-based control
policies for finite action spaces. While the problem is well-studied for
bandits with perfectly observed context vectors, little is known about the case
of imperfectly observed contexts. For this setting, existing approaches are
inapplicable and new conceptual and technical frameworks are required. We
present an implementable posterior sampling algorithm for bandits with
imperfect context observations and study its performance for learning optimal
decisions. The provided numerical results relate the performance of the
algorithm to different quantities of interest including the number of arms,
dimensions, observation matrices, posterior rescaling factors, and
signal-to-noise ratios. In general, the proposed algorithm demonstrates
efficiency in learning from the noisy, imperfect observations and in taking
actions accordingly. The insights the analyses provide, as well as the
interesting future directions they point to, are discussed as well.
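As a rough illustration of the idea (a minimal sketch, not the paper's exact algorithm), a Thompson-sampling loop for this setting first estimates the latent context from a noisy linear observation, then samples arm parameters from Gaussian posteriors. The dimensions, observation matrix `A`, priors, and noise levels below are all hypothetical assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: d-dimensional context, K arms, horizon T.
d, K, T = 3, 4, 500
A = rng.normal(size=(d, d))           # observation matrix (assumed known)
theta_true = rng.normal(size=(K, d))  # true per-arm reward parameters
noise_var = 0.1                       # reward noise variance (assumed)
obs_var = 0.01                        # observation noise variance (assumed)

# Gaussian posterior (mean, covariance) for each arm's parameter.
mu = np.zeros((K, d))
Sigma = np.array([np.eye(d) for _ in range(K)])

for t in range(T):
    x = rng.normal(size=d)                            # latent context (never observed)
    y = A @ x + rng.normal(scale=obs_var**0.5, size=d)  # noisy observation of it

    # MMSE estimate of the context from y, assuming x ~ N(0, I):
    S = A @ A.T + obs_var * np.eye(d)
    x_hat = A.T @ np.linalg.solve(S, y)

    # Thompson sampling: draw a parameter from each arm's posterior,
    # act greedily with respect to the estimated context.
    samples = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in range(K)])
    a = int(np.argmax(samples @ x_hat))
    r = theta_true[a] @ x + rng.normal(scale=noise_var**0.5)

    # Bayesian linear-regression update of the chosen arm, using x_hat.
    P = np.linalg.inv(Sigma[a])
    P_new = P + np.outer(x_hat, x_hat) / noise_var
    Sigma_new = np.linalg.inv(P_new)
    mu[a] = Sigma_new @ (P @ mu[a] + x_hat * r / noise_var)
    Sigma[a] = Sigma_new
```

Acting on the estimated context `x_hat` rather than the raw observation `y` is what distinguishes this setting from the perfectly observed case; the posterior covariances shrink as observations accumulate.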
Related papers
- Optimal cross-learning for contextual bandits with unknown context
distributions [28.087360479901978]
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al.
We provide an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts.
At the core of our algorithm is a novel technique for coordinating the execution of an algorithm over multiple epochs.
arXiv Detail & Related papers (2024-01-03T18:02:13Z) - Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning [74.67655210734338]
In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption.
We develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations.
We empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks.
arXiv Detail & Related papers (2023-11-20T23:56:58Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Dynamic deep-reinforcement-learning algorithm in Partially Observed
Markov Decision Processes [6.729108277517129]
This study shows the benefit of including action sequences in order to solve Partially Observable Markov Decision Processes.
The developed algorithms showed enhanced robustness of controller performance against different types of external disturbances.
arXiv Detail & Related papers (2023-07-29T08:52:35Z) - Multivariate Systemic Risk Measures and Computation by Deep Learning
Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Contextual Model Aggregation for Fast and Robust Federated Learning in
Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z) - Performance Analysis of Fractional Learning Algorithms [32.21539962359158]
Whether the proclaimed superiority of fractional learning algorithms over conventional ones is well-grounded or a myth is unclear, as their performance has never been extensively analyzed.
In this article, a rigorous analysis of fractional variants of the least mean squares and steepest descent algorithms is performed.
The origins of their shortcomings and the consequences for learning performance are discussed, and effective remedies are proposed.
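For orientation (a hedged sketch, not the paper's analysis), the standard LMS update and one commonly cited fractional-LMS form are contrasted below on a toy system-identification task. The step sizes, fractional order, and the `|w|**(1-alpha)` form of the extra term are illustrative assumptions; taking the absolute value sidesteps the complex-valued updates that arise for negative weights, one of the pitfalls such analyses point out:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(2)

# Toy system identification: estimate w_true from noisy input/output pairs.
w_true = np.array([0.5, -0.3, 0.8])
mu = 0.01      # LMS step size (assumed)
alpha = 0.5    # fractional order (assumed)
mu_f = 0.01    # step size of the fractional term (assumed)

w_lms = np.zeros(3)
w_frac = np.zeros(3)
for _ in range(5000):
    x = rng.normal(size=3)
    d = w_true @ x + rng.normal(scale=0.05)

    # Standard LMS: w <- w + mu * e * x
    e = d - w_lms @ x
    w_lms += mu * e * x

    # A common fractional-LMS form adds a term scaled by |w|^(1-alpha);
    # the Gamma factor comes from the fractional-derivative definition.
    ef = d - w_frac @ x
    w_frac += mu * ef * x + (mu_f / gamma(2 - alpha)) * ef * x * np.abs(w_frac) ** (1 - alpha)
```

Note that the fractional term vanishes wherever a weight is zero, which is one source of the convergence peculiarities a rigorous analysis has to address.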
arXiv Detail & Related papers (2021-10-11T12:06:44Z) - Safe Learning and Optimization Techniques: Towards a Survey of the State
of the Art [3.6954802719347413]
Safe learning and optimization deals with learning and optimization problems that avoid, as much as possible, the evaluation of non-safe input points.
A comprehensive survey of safe reinforcement learning algorithms was published in 2015, but related works in active learning and in optimization were not considered.
This paper reviews those algorithms from a number of domains including reinforcement learning, Gaussian process regression and classification, evolutionary algorithms, and active learning.
arXiv Detail & Related papers (2021-01-23T13:58:09Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
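The gap dependence can be illustrated with the classical UCB1 algorithm (a simple stand-in, not the oracle-efficient algorithms of the paper): on an "easy" instance with a large gap between the best and second-best arm, pseudo-regret is far smaller than on a small-gap instance at the same horizon. The arm means, horizon, and unit-variance Gaussian rewards below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def ucb1(means, T):
    """Run UCB1 with Gaussian rewards; return the total pseudo-regret."""
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    best = max(means)
    regret = 0.0
    for t in range(T):
        if t < K:
            a = t  # pull each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(ucb))
        r = rng.normal(means[a], 1.0)
        counts[a] += 1
        sums[a] += r
        regret += best - means[a]
    return regret

# Larger gap -> "easier" instance -> smaller regret, matching the
# gap-dependent O(log T / gap) flavor of instance-dependent bounds.
easy = ucb1([0.9, 0.1], 2000)    # gap 0.8
hard = ucb1([0.55, 0.45], 2000)  # gap 0.1
```

The contrast between `easy` and `hard` is exactly what instance-dependent complexity measures formalize beyond the worst-case minimax rate.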
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - Efficient Contextual Bandits with Continuous Actions [102.64518426624535]
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
Our reduction-style algorithm composes with most supervised learning representations.
arXiv Detail & Related papers (2020-06-10T19:38:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.