Offline Clustering of Linear Bandits: The Power of Clusters under Limited Data
- URL: http://arxiv.org/abs/2505.19043v2
- Date: Sat, 25 Oct 2025 08:29:46 GMT
- Title: Offline Clustering of Linear Bandits: The Power of Clusters under Limited Data
- Authors: Jingyuan Liu, Zeyu Zhang, Xuchuang Wang, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili, Carlee Joe-Wong
- Abstract summary: We study the offline clustering of bandits (Off-ClusBand) problem: how to use an offline dataset to learn cluster properties and improve decision-making. We propose two algorithms: Off-C2LUB, which we show analytically and experimentally outperforms existing methods under limited offline user data, and Off-CLUB, which may incur bias when data is sparse but performs well and nearly matches the lower bound when data is sufficient.
- Score: 60.91600085523719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual multi-armed bandits are a fundamental learning framework for making a sequence of decisions, e.g., advertising recommendations for a sequence of arriving users. Recent works have shown that clustering these users based on the similarity of their learned preferences can accelerate learning. However, prior work has primarily focused on the online setting, which requires continually collecting user data and ignores the offline data widely available in many applications. To tackle these limitations, we study the offline clustering of bandits (Off-ClusBand) problem: how to use an offline dataset to learn cluster properties and improve decision-making. The key challenge in Off-ClusBand arises from data insufficiency: unlike the online case, where we continually learn from incoming data, in the offline case we have a fixed, limited dataset to work from and thus must determine whether we have enough data to confidently cluster users together. To address this challenge, we propose two algorithms: Off-C2LUB, which we show analytically and experimentally outperforms existing methods under limited offline user data, and Off-CLUB, which may incur bias when data is sparse but performs well and nearly matches the lower bound when data is sufficient. We validate these results experimentally on both real and synthetic datasets.
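To make the clustering idea concrete, here is a minimal sketch of the general approach described in the abstract, not the authors' Off-CLUB or Off-C2LUB algorithms: the dimensions, sample counts, the `beta` threshold, and the confidence-gap rule below are all illustrative assumptions. Each user's preference vector is estimated by ridge regression from offline logs, and two users are merged only when their estimates lie within a data-dependent gap that shrinks as their sample counts grow, reflecting the "do we have enough data to confidently cluster?" test.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_users, n_samples = 5, 6, 40

# Hypothetical ground truth: users 0-2 share one preference vector, users 3-5 another.
true_thetas = [np.array([1.0, 0, 0, 0, 0])] * 3 + [np.array([0, 1.0, 0, 0, 0])] * 3

def estimate(theta_true, n, lam=1.0):
    """Ridge-regression estimate of a user's preference from n offline (context, reward) pairs."""
    X = rng.normal(size=(n, d))
    y = X @ theta_true + 0.1 * rng.normal(size=n)
    V = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(V, X.T @ y), n

estimates = [estimate(t, n_samples) for t in true_thetas]

def same_cluster(i, j, beta=0.5):
    """Merge users only if their estimates are within a data-dependent confidence gap."""
    (ti, ni), (tj, nj) = estimates[i], estimates[j]
    gap = beta * (1 / np.sqrt(ni) + 1 / np.sqrt(nj))
    return np.linalg.norm(ti - tj) <= gap

# Greedy clustering: attach each user to the first compatible cluster, else start a new one.
clusters = []
for u in range(n_users):
    for c in clusters:
        if same_cluster(u, c[0]):
            c.append(u)
            break
    else:
        clusters.append([u])

print(clusters)  # users with matching preference vectors should end up grouped together
```

With sparse data (small `n_samples`) the gap widens and dissimilar users can be wrongly merged, which is exactly the bias regime the abstract attributes to Off-CLUB; a more conservative threshold trades that bias for less data sharing.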
Related papers
- Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms [10.146314852311638]
We propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches.
arXiv Detail & Related papers (2025-12-26T08:42:12Z) - Offline Behavioral Data Selection [58.116300485427764]
We show that policy performance rapidly saturates when trained on a small fraction of the dataset. We propose a simple yet effective method, Stepwise Dual Ranking (SDR), which extracts a compact yet informative subset from large-scale offline behavioral datasets. Extensive experiments and ablation studies on D4RL benchmarks demonstrate that SDR significantly enhances data selection for offline behavioral data.
arXiv Detail & Related papers (2025-12-20T07:10:58Z) - A Unified Understanding of Offline Data Selection and Online Self-refining Generation for Post-training LLMs [55.931369468485464]
We tackle offline data selection and online self-refining generation from an optimization perspective. For the first time, we theoretically demonstrate the effectiveness of the bilevel data selection framework.
arXiv Detail & Related papers (2025-11-26T04:48:33Z) - Offline Clustering of Preference Learning with Active-data Augmentation [32.93090135413931]
Real-world preference learning often involves users with different preferences. This setting presents two central challenges: identifying similarity across users to effectively aggregate data, and handling imbalanced offline data. We propose Off-C$^2$PL for the pure offline setting, where the learner relies solely on offline data. We extend our framework to a setting with active-data augmentation, where the learner is allowed to select a limited number of additional active samples for the test user.
arXiv Detail & Related papers (2025-10-30T09:39:05Z) - Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity [40.998188469865184]
We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. "Sparse-Reg" is a regularization technique based on sparsity that mitigates this overfitting in offline reinforcement learning.
arXiv Detail & Related papers (2025-06-20T16:57:59Z) - Best Arm Identification with Possibly Biased Offline Data [56.965938201853625]
We study the best arm identification problem with potentially biased offline data in the fixed-confidence setting. We propose the LUCB-H algorithm, which introduces adaptive confidence bounds by incorporating an auxiliary bias correction.
arXiv Detail & Related papers (2025-05-29T06:58:49Z) - Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for CMAB. Off-CMAB combines pessimistic reward estimations with solvers. Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
arXiv Detail & Related papers (2025-01-31T16:56:18Z) - Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
arXiv Detail & Related papers (2024-04-30T08:16:52Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Adaptive Policy Learning for Offline-to-Online Reinforcement Learning [27.80266207283246]
We consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online.
We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
arXiv Detail & Related papers (2023-03-14T08:13:21Z) - CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? [86.43517734716606]
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction. Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms with expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)