Identifiable Latent Bandits: Leveraging observational data for personalized decision-making
- URL: http://arxiv.org/abs/2407.16239v4
- Date: Wed, 11 Jun 2025 09:30:45 GMT
- Title: Identifiable Latent Bandits: Leveraging observational data for personalized decision-making
- Authors: Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson
- Abstract summary: We propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits. Our method is based on nonlinear independent component analysis that provably identifies representations from observational data sufficient to infer the optimal action in new bandit instances.
- Score: 7.0774164818430565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For many decision-making tasks, such as precision medicine, historical data alone are insufficient to determine the right choice for a new problem instance or patient. Online algorithms like multi-armed bandits can find optimal personalized decisions but are notoriously sample-hungry. In practice, training a bandit for a new individual from scratch is often infeasible, as the number of trials required is larger than the practical number of decision points. Latent bandits offer rapid exploration and personalization beyond what context variables can reveal, provided that a latent variable model can be learned consistently. In this work, we propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits by learning from historical records of decisions and outcomes. Our method is based on nonlinear independent component analysis that provably identifies representations from observational data sufficient to infer the optimal action in new bandit instances. We verify this strategy in simulated and semi-synthetic environments, showing substantial improvement over online and offline learning baselines when identifying conditions are satisfied.
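A minimal sketch of the warm-start idea (illustrative only, not the authors' code): assume the offline step has already recovered a finite set of candidate latent states with known per-arm mean rewards `mu[z, a]`; the online phase then only has to infer which state the new instance is in, which takes far fewer pulls than learning all arm means from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K arms and Z latent states recovered offline
# (standing in for the paper's nonlinear-ICA step); mu[z, a] is the
# mean reward of arm a under latent state z.
K, Z, sigma = 5, 3, 0.1
mu = rng.uniform(0, 1, size=(Z, K))   # assumed-known reward model
true_z = rng.integers(Z)              # unknown state of the new instance

log_post = np.zeros(Z)                # uniform prior over latent states
for t in range(50):
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    a = int(np.argmax(post @ mu))     # greedy w.r.t. posterior-mean reward
    r = rng.normal(mu[true_z, a], sigma)
    # Gaussian-likelihood update of the posterior over latent states.
    log_post += -0.5 * ((r - mu[:, a]) / sigma) ** 2

print("inferred state:", int(np.argmax(log_post)), "true state:", true_z)
```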
Related papers
- Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding [2.9457161327910693]
We introduce a multiple criteria Bayesian preference learning framework incorporating behavioral cues for decision aiding. The framework integrates pairwise comparisons, response time, and attention duration to deepen insights into decision-making processes.
arXiv Detail & Related papers (2025-04-21T08:01:44Z) - Stochastic Linear Bandits with Latent Heterogeneity [8.981251210938787]
We propose a novel latent heterogeneous bandit framework that explicitly models this unobserved heterogeneity in customer responses. Our methodology introduces an innovative algorithm that simultaneously learns latent group memberships and group-specific reward functions.
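To make the "simultaneous" learning concrete, here is a hypothetical EM-style toy with hard group assignments and least-squares refits; the paper's actual algorithm and its guarantees differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# n customers, each from one of G latent groups with a group-specific
# linear reward vector; alternate assignment and refitting (illustrative).
n, d, G, T = 60, 4, 2, 30
theta_true = rng.normal(size=(G, d))
group_true = rng.integers(G, size=n)
X = rng.normal(size=(n, T, d))
Y = np.einsum('ntd,nd->nt', X, theta_true[group_true]) \
    + 0.1 * rng.normal(size=(n, T))

theta = rng.normal(size=(G, d))
for _ in range(20):
    # E-step: assign each customer to the group that fits best.
    resid = np.stack([((Y - X @ th) ** 2).sum(axis=1) for th in theta], axis=1)
    z = resid.argmin(axis=1)
    # M-step: refit each group's reward vector by least squares.
    for g in range(G):
        if (z == g).any():
            Xg, Yg = X[z == g].reshape(-1, d), Y[z == g].reshape(-1)
            theta[g] = np.linalg.lstsq(Xg, Yg, rcond=None)[0]
```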
arXiv Detail & Related papers (2025-02-01T13:02:21Z) - Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
However, vanilla reference-model-free methods score and select data independently in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
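One way to picture reference-model-free diverse selection (purely illustrative; this greedy orthogonal-residual rule is not DivBS itself): repeatedly pick the sample whose feature direction is least covered by what has already been selected.

```python
import numpy as np

def select_diverse(feats, k):
    """Greedy pick of k rows with large mutually-orthogonal components."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    residual = feats.copy()
    chosen = []
    for _ in range(k):
        i = int(np.argmax(np.linalg.norm(residual, axis=1)))
        chosen.append(i)
        v = residual[i] / np.linalg.norm(residual[i])
        residual -= np.outer(residual @ v, v)  # remove chosen direction
    return chosen

rng = np.random.default_rng(2)
print(select_diverse(rng.normal(size=(100, 16)), k=8))
```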
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Is Offline Decision Making Possible with Only Few Samples? Reliable
Decisions in Data-Starved Bandits via Trust Region Enhancement [25.68354404229254]
We show that even in a data-starved setting it may still be possible to find a policy competitive with the optimal one.
This paves the way to reliable decision-making in settings where critical decisions must be made by relying only on a handful of samples.
arXiv Detail & Related papers (2024-02-24T03:41:09Z) - Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study [1.5936659933030128]
Mobile health interventions aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions.
Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts.
The current work addresses the challenge of zero-inflated count outcomes by incorporating count data models into online decision-making approaches.
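A hedged sketch of what count data models in online decision-making can look like: per arm, a Beta posterior for the probability of a non-zero outcome and a Gamma posterior for the rate of non-zero counts, combined by Thompson sampling. This hurdle-style approximation ignores zero-truncation and is not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(3)

K = 3
p_true = np.array([0.2, 0.5, 0.4])       # P(non-zero) per arm
lam_true = np.array([2.0, 1.0, 3.0])     # Poisson rate per arm
ab = np.ones((K, 2))                     # Beta(a, b) for the zero event
shape_rate = np.ones((K, 2))             # Gamma(shape, rate) for the rate

for t in range(500):
    p = rng.beta(ab[:, 0], ab[:, 1])
    lam = rng.gamma(shape_rate[:, 0], 1.0 / shape_rate[:, 1])
    a = int(np.argmax(p * lam))          # sampled mean reward per arm
    y = rng.poisson(lam_true[a]) if rng.random() < p_true[a] else 0
    ab[a, 0 if y > 0 else 1] += 1        # Beta update on zero vs. non-zero
    if y > 0:
        shape_rate[a] += (y, 1)          # conjugate-style Gamma update
```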
arXiv Detail & Related papers (2023-11-24T09:02:24Z) - Robust Best-arm Identification in Linear Bandits [25.91361349646875]
We study the robust best-arm identification problem (RBAI) in the case of linear rewards.
We present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards.
Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.
arXiv Detail & Related papers (2023-11-08T14:58:11Z) - Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
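The underlying observation can be sketched as follows; the correctness records are simulated and `first_consistent_epoch` is an illustrative helper, not the authors' implementation.

```python
import numpy as np

def first_consistent_epoch(correct):
    """correct: (n_examples, n_epochs) bool; epoch after the last mistake."""
    n, E = correct.shape
    fce = np.full(n, E)                  # E means never consistently correct
    for i in range(n):
        mistakes = np.where(~correct[i])[0]
        fce[i] = 0 if mistakes.size == 0 else mistakes[-1] + 1
    return fce

rng = np.random.default_rng(4)
# Simulated training dynamics: clean examples are learned early,
# mislabeled ones late (or never).
clean = np.arange(50) >= rng.integers(0, 10, size=(80, 1))
noisy = np.arange(50) >= rng.integers(30, 51, size=(20, 1))
fce = first_consistent_epoch(np.vstack([clean, noisy]))
flagged = np.argsort(fce)[-20:]          # latest learners ~ likely mislabeled
```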
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Towards Accelerated Model Training via Bayesian Data Selection [45.62338106716745]
Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss.
This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-21T07:58:15Z) - Leveraging Unlabelled Data in Multiple-Instance Learning Problems for
Improved Detection of Parkinsonian Tremor in Free-Living Conditions [80.88681952022479]
We introduce a new method for combining semi-supervised with multiple-instance learning.
We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains in per-subject tremor detection.
arXiv Detail & Related papers (2023-04-29T12:25:10Z) - ORF-Net: Deep Omni-supervised Rib Fracture Detection from Chest CT Scans [47.7670302148812]
Radiologists need to investigate and annotate rib fractures on a slice-by-slice basis.
We propose a novel omni-supervised object detection network, which can exploit multiple different forms of annotated data.
Our proposed method outperforms other state-of-the-art approaches consistently.
arXiv Detail & Related papers (2022-07-05T07:06:57Z) - MURAL: An Unsupervised Random Forest-Based Embedding for Electronic
Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types.
MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random.
We show that using our approach, we can visualize and classify data more accurately than competing approaches.
arXiv Detail & Related papers (2021-11-19T22:02:21Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
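The core duel can be sketched in a few lines; the boundary terms and exploration refinements from the paper are omitted, and `challenges` is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(5)

def challenges(obs, leader_mean, rng):
    """Re-weight one arm's observations with Dirichlet weights and
    compare the re-weighted mean to the leader's empirical mean."""
    w = rng.dirichlet(np.ones(len(obs)))
    return float(w @ np.asarray(obs)) >= leader_mean

means = [0.5, 0.6]
history = [list(rng.normal(m, 1.0, size=3)) for m in means]
for t in range(200):
    emp = [np.mean(h) for h in history]
    leader = int(np.argmax(emp))
    arm = leader                         # pull the leader by default...
    for k in range(len(history)):
        if k != leader and challenges(history[k], emp[leader], rng):
            arm = k                      # ...unless a challenger wins its duel
            break
    history[arm].append(rng.normal(means[arm], 1.0))
```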
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Reconciling Risk Allocation and Prevalence Estimation in Public Health
Using Batched Bandits [0.0]
In many public health settings, there is a perceived tension between allocating resources to known vulnerable areas and learning about the overall prevalence of the problem.
Inspired by a door-to-door Covid-19 testing program we helped design, we combine multi-armed bandit strategies and insights from sampling theory to demonstrate how to recover accurate prevalence estimates while continuing to allocate resources to at-risk areas.
arXiv Detail & Related papers (2021-10-25T22:33:46Z) - Analysis of Thompson Sampling for Partially Observable Contextual
Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits.
We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
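A minimal Thompson-sampling-over-latent-states sketch for this setting, assuming the conditional arm means `mu[z, a]` are known and rewards are Gaussian (the paper also handles learned and misspecified models): sample a state from the posterior, then play that state's best arm.

```python
import numpy as np

rng = np.random.default_rng(6)

Z, K, sigma = 4, 6, 0.2
mu = rng.uniform(size=(Z, K))            # assumed-known means per state
true_z = rng.integers(Z)
log_post = np.zeros(Z)

for t in range(100):
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    z = rng.choice(Z, p=post)            # Thompson sample a latent state
    a = int(np.argmax(mu[z]))            # best arm for the sampled state
    r = rng.normal(mu[true_z, a], sigma)
    log_post += -0.5 * ((r - mu[:, a]) / sigma) ** 2
```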
arXiv Detail & Related papers (2020-06-15T19:24:02Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
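For discrete rewards the mode objective can be illustrated with a simple count-based exploration bonus; the paper's estimator and confidence bounds are more refined than this sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

# Arm 0 has the highest mean (4.0) but mode 0; arm 1 has mode 2,
# so a mode-optimizing learner should settle on arm 1.
supports = [np.array([0, 10]), np.array([1, 2]), np.array([0, 1])]
probs = [np.array([0.6, 0.4]), np.array([0.2, 0.8]), np.array([0.5, 0.5])]

K = len(supports)
counts = np.zeros(K)
history = [[] for _ in range(K)]
for t in range(1, 300):
    modes = np.array([max(set(h), key=h.count) if h else np.inf
                      for h in history])
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    a = int(np.argmax(modes + bonus))
    history[a].append(int(rng.choice(supports[a], p=probs[a])))
    counts[a] += 1
```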
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)