Offline Clustering of Linear Bandits: Unlocking the Power of Clusters in Data-Limited Environments
- URL: http://arxiv.org/abs/2505.19043v1
- Date: Sun, 25 May 2025 08:43:40 GMT
- Title: Offline Clustering of Linear Bandits: Unlocking the Power of Clusters in Data-Limited Environments
- Authors: Jingyuan Liu, Zeyu Zhang, Xuchuang Wang, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili, Carlee Joe-Wong
- Abstract summary: We study how to use an offline dataset to learn cluster properties and improve decision-making across multiple users. The key challenge in Off-ClusBand arises from data insufficiency for users. We propose two algorithms: Off-C$^2$LUB, which performs well for arbitrary amounts of user data, and Off-CLUB, which is prone to bias when data is limited but matches a derived lower bound given sufficient data.
- Score: 39.92799383936439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual linear multi-armed bandits are a learning framework for making a sequence of decisions, e.g., advertising recommendations for a sequence of arriving users. Recent works have shown that clustering these users based on the similarity of their learned preferences can significantly accelerate learning. However, prior work has primarily focused on the online setting, which requires continually collecting user data and ignores the offline data widely available in many applications. To address this limitation, we study the offline clustering of bandits (Off-ClusBand) problem: how to use an offline dataset to learn cluster properties and improve decision-making across multiple users. The key challenge in Off-ClusBand arises from data insufficiency: unlike the online case, in the offline case we have a fixed, limited dataset to work from and thus must determine whether we have enough data to confidently cluster users together. To address this challenge, we propose two algorithms: Off-C$^2$LUB, which we analytically show performs well for arbitrary amounts of user data, and Off-CLUB, which is prone to bias when data is limited but, given sufficient data, matches a theoretical lower bound that we derive for the offline clustered MAB problem. We experimentally validate these results on both real and synthetic datasets.
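To make the cluster-then-estimate idea above concrete, below is a minimal sketch, assuming standard ridge estimation and a generic confidence radius; it is not the paper's Off-C$^2$LUB or Off-CLUB algorithm. Each user gets a ridge estimate from their own offline data, users whose estimates the data cannot statistically separate are linked into clusters, and recommendations use the pooled cluster-level estimate. The radius formula and the helper names (`ridge_estimate`, `cluster_users`, `recommend`) are illustrative assumptions.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Per-user ridge estimate: theta_hat = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def conf_radius(n, d, delta=0.05, lam=1.0):
    """Illustrative confidence radius that shrinks with the user's sample size n.
    A generic O(sqrt(d * log(n / delta) / n)) form, not the paper's exact bound."""
    return np.sqrt(d * np.log((1.0 + n / lam) / delta) / max(n, 1))


def cluster_users(data, delta=0.05, lam=1.0):
    """Link users whose offline estimates the data cannot tell apart, then
    return the connected components as clusters.

    data: dict user_id -> (X, y), X of shape (n_u, d), y of shape (n_u,)."""
    users = list(data)
    theta = {u: ridge_estimate(*data[u], lam=lam) for u in users}
    rad = {u: conf_radius(len(data[u][1]), data[u][0].shape[1], delta, lam) for u in users}
    assigned, clusters = set(), []
    for u in users:
        if u in assigned:
            continue
        comp, stack = [], [u]
        assigned.add(u)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in users:
                if w not in assigned and np.linalg.norm(theta[v] - theta[w]) <= rad[v] + rad[w]:
                    assigned.add(w)
                    stack.append(w)
        clusters.append(comp)
    return clusters


def recommend(data, clusters, user, arms, lam=1.0):
    """Pool the offline data of the user's cluster and pick the arm whose
    features score highest under the pooled (cluster-level) estimate."""
    members = next(c for c in clusters if user in c)
    X = np.vstack([data[u][0] for u in members])
    y = np.concatenate([data[u][1] for u in members])
    theta_c = ridge_estimate(X, y, lam=lam)
    return int(np.argmax(arms @ theta_c))       # arms: (K, d) feature matrix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, true_theta = 5, np.array([1.0, 0.5, -0.3, 0.2, 0.0])
    data = {}
    for u, n in enumerate([30, 5, 50]):          # users with unequal offline data
        X = rng.normal(size=(n, d))
        y = X @ true_theta + 0.1 * rng.normal(size=n)
        data[u] = (X, y)
    clusters = cluster_users(data)
    arms = rng.normal(size=(10, d))
    print("clusters:", clusters)
    print("arm recommended for user 1:", recommend(data, clusters, 1, arms))
```

The offline-specific step is the radius check: users with little data have wide radii, so the sketch refuses to separate them, which mirrors the data-insufficiency concern raised in the abstract.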
Related papers
- Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity [40.998188469865184]
"Sparse-Reg" is a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning.<n>We show that offline RL algorithms can overfit on small datasets, resulting in poor performance.
arXiv Detail & Related papers (2025-06-20T16:57:59Z)
- Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for combinatorial multi-armed bandits (CMAB). It combines pessimistic reward estimations with combinatorial solvers (a minimal sketch of this pessimism-plus-solver pattern appears after this list). Experiments on synthetic and real-world datasets highlight the superior performance of the resulting CLCB algorithm.
arXiv Detail & Related papers (2025-01-31T16:56:18Z)
- Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning [116.87367592920171]
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset.
In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks.
We propose an uncertainty-based Multi-Task Data Sharing (MTDS) approach that shares the entire dataset without data selection.
arXiv Detail & Related papers (2024-04-30T08:16:52Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach a collapsed solution in which the encoder maps all inputs to the same point and every sample is assigned to a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of non-IID data, called cluster-skewed non-IID, found in real-world datasets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? [86.43517734716606]
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms with expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)
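As referenced in the Off-CMAB entry above, here is a minimal sketch of the generic pessimism-plus-solver pattern for offline combinatorial bandits, assuming a Hoeffding-style lower confidence bound and a simple top-k oracle; it is not the paper's CLCB algorithm, and the names `pessimistic_lcb` and `top_k_oracle` are illustrative.

```python
import numpy as np


def pessimistic_lcb(counts, means, delta=0.05):
    """Lower confidence bound on each base arm's mean reward from offline data.
    Generic Hoeffding-style pessimism; the exact bonus used by CLCB may differ."""
    bonus = np.sqrt(np.log(2 * len(means) / delta) / (2 * np.maximum(counts, 1)))
    return means - bonus


def top_k_oracle(scores, k):
    """Combinatorial solver stand-in: pick the k highest-scoring base arms."""
    return np.argsort(scores)[-k:]


# Usage: estimate per-arm means from an offline log, pessimize, then let the
# oracle choose the super-arm under the pessimistic scores.
rng = np.random.default_rng(1)
true_means = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
counts = np.array([200, 5, 50, 300, 10])        # uneven offline coverage per arm
means = np.array([rng.binomial(n, p) / n for n, p in zip(counts, true_means)])
super_arm = top_k_oracle(pessimistic_lcb(counts, means), k=2)
print("selected base arms:", sorted(super_arm.tolist()))
```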