Clustered Linear Contextual Bandits with Knapsacks
- URL: http://arxiv.org/abs/2308.10722v1
- Date: Mon, 21 Aug 2023 13:47:13 GMT
- Title: Clustered Linear Contextual Bandits with Knapsacks
- Authors: Yichuan Deng, Michalis Mamakos, Zhao Song
- Abstract summary: We study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models.
Pulling an arm in a time period results in a reward and in consumption of each of multiple resources.
We show that it suffices to perform clustering only once to a randomly selected subset of the arms.
- Score: 9.668078830796999
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we study clustered contextual bandits where rewards and
resource consumption are the outcomes of cluster-specific linear models. The
arms are divided into clusters, with the cluster memberships unknown to the
algorithm. Pulling an arm in a time period results in a reward and in
consumption of each of multiple resources; if the total consumption of any
resource exceeds its constraint, the algorithm terminates. Thus, maximizing
the total reward requires learning not only models
about the reward and the resource consumption, but also cluster memberships. We
provide an algorithm that achieves regret sublinear in the number of time
periods, without requiring access to all of the arms. In particular, we show
that it suffices to perform clustering only once to a randomly selected subset
of the arms. To achieve this result, we provide a sophisticated combination of
techniques from the literature of econometrics and of bandits with constraints.
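
To make the interaction protocol concrete, below is a minimal, illustrative sketch (not the paper's algorithm): each pull yields a reward and a vector of resource consumptions, the run stops once any budget would be exceeded, and a fixed cluster assignment, assumed here to have been estimated once from a random subset of arms, selects which per-cluster linear model is updated. All variable names, the ridge/UCB estimates, and the synthetic data-generating model are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch of a linear contextual bandit loop with knapsack constraints.
# NOT the paper's algorithm: clustering, confidence widths, and pacing are simplified.

rng = np.random.default_rng(0)
K, d, n_resources, T = 8, 5, 2, 2000      # arms, context dim, resources, horizon
budget = np.full(n_resources, 150.0)       # per-resource budgets (knapsacks)

# Hypothetical ground truth: two clusters, each with its own linear reward and
# consumption models (all of this is synthetic, for illustration).
true_cluster = rng.integers(0, 2, size=K)
theta_reward = rng.normal(size=(2, d))
theta_cost = rng.uniform(0.0, 0.5, size=(2, n_resources, d))

# One-shot clustering surrogate: we pretend cluster labels were already estimated
# from a random subset of arms; the paper's clustering step is far more involved.
est_cluster = true_cluster.copy()

# Per-cluster ridge statistics (A = X^T X + I, b = X^T y) for reward and resources.
A = [np.eye(d) for _ in range(2)]
b_r = [np.zeros(d) for _ in range(2)]
b_c = [np.zeros((n_resources, d)) for _ in range(2)]

spent = np.zeros(n_resources)
total_reward = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))     # fresh context per arm each round
    scores = np.empty(K)
    for a in range(K):
        c = est_cluster[a]
        A_inv = np.linalg.inv(A[c])
        mu_r = contexts[a] @ (A_inv @ b_r[c])              # ridge reward estimate
        bonus = np.sqrt(contexts[a] @ A_inv @ contexts[a]) # UCB-style exploration bonus
        scores[a] = mu_r + 0.5 * bonus
    a = int(np.argmax(scores))
    c = est_cluster[a]

    reward = contexts[a] @ theta_reward[true_cluster[a]] + 0.1 * rng.normal()
    cost = theta_cost[true_cluster[a]] @ contexts[a] ** 2  # nonnegative consumption
    if np.any(spent + cost > budget):
        break                               # knapsack constraint: stop before overspending
    spent += cost
    total_reward += reward

    # Update the chosen arm's cluster model; consumption estimates are tracked here
    # but not used for arm selection in this stripped-down sketch.
    A[c] += np.outer(contexts[a], contexts[a])
    b_r[c] += reward * contexts[a]
    b_c[c] += np.outer(cost, contexts[a])

print(f"stopped at round {t}, total reward {total_reward:.1f}, spent {spent}")
```

A fuller bandits-with-knapsacks method would also use the estimated consumption models to pace spending against the remaining budget, rather than only stopping once a constraint would be violated.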
Related papers
- Large Scale Constrained Clustering With Reinforcement Learning [1.3597551064547502]
Given a network, allocating resources at the cluster level, rather than at each node, enhances efficiency in resource allocation and usage.
We propose an approach to solve this constrained clustering problem via reinforcement learning.
In the results section, we show that our algorithm finds near optimal solutions, even for large scale instances.
arXiv Detail & Related papers (2024-02-15T18:27:18Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Optimal Algorithms for Latent Bandits with Cluster Structure [50.44722775727619]
We consider the problem of latent bandits with cluster structure where there are multiple users, each with an associated multi-armed bandit problem.
We propose LATTICE which allows exploitation of the latent cluster structure to provide the minimax optimal regret of $\widetilde{O}(\sqrt{(\mathsf{M}+\mathsf{N})\mathsf{T}})$.
arXiv Detail & Related papers (2023-01-17T17:49:04Z)
- Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm [3.7491936479803054]
We propose a new hierarchical clustering linkage criterion called Genie.
Our algorithm links two clusters in such a way that a chosen economic inequity measure does not drastically increase above a given threshold.
A reference implementation of the algorithm has been included in the open source 'genie' package for R.
arXiv Detail & Related papers (2022-09-13T06:42:53Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning.
Recent work has established lower bounds against the class of low-degree methods.
We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
- An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering [0.5801044612920815]
We present a new branch-and-bound algorithm for semi-supervised MSSC.
Background knowledge is incorporated as pairwise must-link and cannot-link constraints.
For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points.
arXiv Detail & Related papers (2021-11-30T17:08:53Z)
- Censored Semi-Bandits for Resource Allocation [12.450488894967773]
We consider the problem of sequentially allocating resources in a censored semi-bandits setup.
The loss depends on two hidden parameters, one specific to the arm but independent of the resource allocation, and the other depends on the allocated resource.
We derive optimal algorithms for our problem setting using known algorithms for MP-MAB and Combinatorial Semi-Bandits.
arXiv Detail & Related papers (2021-04-12T19:15:32Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.