Latent Preference Bandits
- URL: http://arxiv.org/abs/2508.05367v1
- Date: Thu, 07 Aug 2025 13:13:28 GMT
- Title: Latent Preference Bandits
- Authors: Newton Mwai, Emil Carlsson, Fredrik D. Johansson
- Abstract summary: Bandit algorithms are guaranteed to solve diverse sequential decision-making problems. Latent bandits offer substantially reduced exploration times for such problems, given that the joint distribution of a latent state and the rewards of actions is known and accurate. In practice, finding such a model is non-trivial, and there may not exist a small number of latent states that explain the responses of all individuals.
- Score: 7.731569068280131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bandit algorithms are guaranteed to solve diverse sequential decision-making problems, provided that a sufficient exploration budget is available. However, learning from scratch is often too costly for personalization tasks where a single individual faces only a small number of decision points. Latent bandits offer substantially reduced exploration times for such problems, given that the joint distribution of a latent state and the rewards of actions is known and accurate. In practice, finding such a model is non-trivial, and there may not exist a small number of latent states that explain the responses of all individuals. For example, patients with similar latent conditions may have the same preference in treatments but rate their symptoms on different scales. With this in mind, we propose relaxing the assumptions of latent bandits to require only a model of the \emph{preference ordering} of actions in each latent state. This allows problem instances with the same latent state to vary in their reward distributions, as long as their preference orderings are equal. We give a posterior-sampling algorithm for this problem and demonstrate that its empirical performance is competitive with latent bandits that have full knowledge of the reward distribution when this is well-specified, and outperforms them when reward scales differ between instances with the same latent state.
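The abstract states that a posterior-sampling algorithm is used over latent states defined only by preference orderings, but it does not spell out the posterior update. The sketch below is therefore an illustration under stated assumptions: the pairwise-concordance pseudo-likelihood stands in for the paper's actual update, and the orderings, Gaussian rewards, and forced-exploration phase are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 latent states, each specified only by a preference
# ordering over 4 arms (best arm first); reward scales are instance-specific.
ORDERINGS = {
    0: [0, 1, 2, 3],
    1: [2, 3, 1, 0],
    2: [3, 0, 2, 1],
}
N_ARMS = 4

def concordance_weight(ordering, means, counts, temp=5.0):
    """Illustrative pseudo-likelihood of a latent state: pairwise concordance
    between the state's preference ordering and the empirical arm means."""
    rank = {a: r for r, a in enumerate(ordering)}
    score = 0.0
    for a in range(N_ARMS):
        for b in range(a + 1, N_ARMS):
            if counts[a] == 0 or counts[b] == 0:
                continue  # no evidence about this pair yet
            sign = 1.0 if rank[a] < rank[b] else -1.0
            score += sign * (means[a] - means[b])
    return np.exp(temp * score)

def run(true_means, horizon=200):
    counts = np.zeros(N_ARMS)
    sums = np.zeros(N_ARMS)
    for t in range(horizon):
        if t < 2 * N_ARMS:                 # light forced exploration at the start
            arm = t % N_ARMS
        else:
            means = sums / np.maximum(counts, 1)
            weights = np.array([concordance_weight(ORDERINGS[z], means, counts)
                                for z in sorted(ORDERINGS)])
            post = weights / weights.sum()       # posterior over latent states
            z = rng.choice(len(post), p=post)    # posterior sampling
            arm = ORDERINGS[z][0]                # play that state's preferred arm
        reward = rng.normal(true_means[arm], 0.5)
        counts[arm] += 1
        sums[arm] += reward
    return sums.sum() / horizon

# Instance whose ordering matches latent state 1, but on its own reward scale.
print(run(true_means=np.array([0.1, 0.4, 1.0, 0.7])))
```

The point of the sketch is that the sampled state only pins down which arm is preferred, so two instances with different reward scales but the same ordering are treated identically.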
Related papers
- Counterfactual Realizability [52.85109506684737]
We introduce a formal definition of realizability, the ability to draw samples from a distribution, and then develop a complete algorithm to determine whether an arbitrary counterfactual distribution is realizable. We illustrate the implications of this new framework for counterfactual data collection using motivating examples from causal fairness and causal reinforcement learning.
arXiv Detail & Related papers (2025-03-14T20:54:27Z)
- The Minimal Search Space for Conditional Causal Bandits [0.18124328823188351]
Causal knowledge can be used to support decision-making. This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention. We then propose an efficient algorithm with a time complexity of $O(|V| + |E|)$ to identify this minimal set of nodes.
arXiv Detail & Related papers (2025-02-10T15:45:18Z)
- Identifying General Mechanism Shifts in Linear Causal Representations [58.6238439611389]
We consider the linear causal representation learning setting where we observe a linear mixing of $d$ unknown latent factors.
Recent work has shown that it is possible to recover the latent factors as well as the underlying structural causal model over them.
We provide a surprising identifiability result: under some very mild standard assumptions, it is indeed possible to identify the set of shifted nodes.
arXiv Detail & Related papers (2024-10-31T15:56:50Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
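As a rough illustration of the fixed-budget setting described above (the precise sred rejection rule is not reproduced here), one can eliminate an arm as soon as its empirical gap to the current leader exceeds a confidence-style threshold; the threshold constant and Gaussian rewards below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptive_reject(means_true, budget=2000, sigma=1.0):
    """Illustrative fixed-budget best-arm identification: pull surviving arms
    round-robin and reject an arm whenever its empirical gap to the current
    leader exceeds a (hypothetical) confidence-style threshold."""
    k = len(means_true)
    alive = list(range(k))
    counts, sums, t = np.zeros(k), np.zeros(k), 0
    while t < budget and len(alive) > 1:
        for a in list(alive):                       # one pull per surviving arm
            if t >= budget:
                break
            counts[a] += 1
            sums[a] += rng.normal(means_true[a], sigma)
            t += 1
        emp = sums[alive] / counts[alive]
        leader = alive[int(np.argmax(emp))]
        for a in list(alive):                       # adaptive rejection check
            if a == leader:
                continue
            gap = sums[leader] / counts[leader] - sums[a] / counts[a]
            if gap > 2 * sigma * np.sqrt(2 * np.log(budget) / counts[a]):
                alive.remove(a)
    emp = sums[alive] / np.maximum(counts[alive], 1)
    return alive[int(np.argmax(emp))]               # recommended arm

print(adaptive_reject(np.array([0.2, 0.5, 0.45, 0.9])))
```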
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems [17.707594255626216]
An adversarial attack perturbs an image with imperceptible noise, leading to an incorrect model prediction.
We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives.
We demonstrate that by reliably estimating adversarial vulnerability at the sample level, it is possible to develop a trustworthy system.
arXiv Detail & Related papers (2022-05-05T12:36:17Z)
- From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
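A minimal sketch of the re-sampling idea described above, assuming Gaussian arms: each challenger's index is a Dirichlet-weighted bootstrap mean of its observation history, compared pairwise against the current leader's empirical mean. The leader rule and the omission of the boundary/exploration terms used in the paper are simplifications.

```python
import numpy as np

rng = np.random.default_rng(2)

def dirichlet_index(obs):
    """Re-sample an arm's observation history with Dirichlet weights and
    return the re-weighted mean (a bootstrap-style index)."""
    w = rng.dirichlet(np.ones(len(obs)))
    return float(np.dot(w, obs))

def dirichlet_sampling(reward_fns, horizon=1000):
    k = len(reward_fns)
    history = [[reward_fns[a]()] for a in range(k)]   # one initial pull each
    total = k
    while total < horizon:
        means = [np.mean(h) for h in history]
        leader = int(np.argmax(means))                # simplified leader rule
        # pairwise comparison: a challenger wins if its Dirichlet index
        # beats the leader's empirical mean
        challengers = [a for a in range(k)
                       if a != leader and dirichlet_index(history[a]) >= means[leader]]
        arm = rng.choice(challengers) if challengers else leader
        history[arm].append(reward_fns[arm]())
        total += 1
    return [len(h) for h in history]                  # pull counts per arm

arms = [lambda: rng.normal(0.3, 1.0),
        lambda: rng.normal(0.5, 1.0),
        lambda: rng.normal(0.7, 1.0)]
print(dirichlet_sampling(arms))
```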
arXiv Detail & Related papers (2021-11-18T14:34:21Z)
- Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling [1.6114012813668934]
Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference.
Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit (trading off regret) and require large sample sizes to ensure guarantees.
We introduce a novel hypothesis test, uniquely based on the allocation probabilities of the bandit algorithm, and without constraining its exploitative nature or requiring a minimum experimental size.
We demonstrate the regret and inferential advantages of our approach, particularly in small samples, in both extensive simulations and a real-world experiment on mental health aspects.
arXiv Detail & Related papers (2021-10-30T01:47:14Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
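For contrast with the preference-only model of the main paper, here is a minimal Thompson-sampling sketch for this classic latent-bandit setting, where the conditional mean-reward table is assumed known; the table values, Gaussian noise, and uniform prior are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Known model: mean reward of each arm conditioned on each latent state
# (values made up for illustration); rewards assumed Gaussian with known sigma.
MU = np.array([[0.9, 0.2, 0.1],    # state 0
               [0.1, 0.8, 0.3],    # state 1
               [0.2, 0.3, 0.7]])   # state 2
SIGMA = 0.5

def latent_ts(true_state, horizon=300):
    log_post = np.zeros(MU.shape[0])             # uniform prior over states
    total = 0.0
    for _ in range(horizon):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        z = rng.choice(len(post), p=post)        # Thompson-sample a latent state
        arm = int(np.argmax(MU[z]))              # best arm under that state
        r = rng.normal(MU[true_state, arm], SIGMA)
        # Bayesian update: Gaussian log-likelihood of the observed reward
        log_post += -0.5 * ((r - MU[:, arm]) / SIGMA) ** 2
        total += r
    return total / horizon

print(latent_ts(true_state=1))
```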
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
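A hedged sketch of the mode-based objective, assuming discrete rewards so that the empirical mode is simply the most frequent observed value; the UCB-style exploration bonus is a stand-in, not the paper's actual confidence bound for mode estimation.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(4)

def sample_mode(obs):
    """Empirical mode of a (discrete-valued) observation history."""
    return Counter(obs).most_common(1)[0][0]

def mode_bandit(arm_dists, horizon=500, c=1.0):
    """Illustrative UCB-style bandit that targets the mode of each arm's
    reward distribution rather than its mean."""
    k = len(arm_dists)
    history = [[arm_dists[a]()] for a in range(k)]   # one initial pull per arm
    for t in range(k, horizon):
        ucb = [sample_mode(h) + c * np.sqrt(np.log(t) / len(h)) for h in history]
        arm = int(np.argmax(ucb))
        history[arm].append(arm_dists[arm]())
    return [sample_mode(h) for h in history]

# Two discrete-reward arms with equal means but different modes.
arms = [lambda: rng.choice([0, 1, 2], p=[0.1, 0.7, 0.2]),
        lambda: rng.choice([0, 1, 2], p=[0.45, 0.0, 0.55])]
print(mode_bandit(arms))
```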
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.