Incentivized Exploration via Filtered Posterior Sampling
- URL: http://arxiv.org/abs/2402.13338v1
- Date: Tue, 20 Feb 2024 19:30:55 GMT
- Title: Incentivized Exploration via Filtered Posterior Sampling
- Authors: Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur
- Abstract summary: We study "incentivized exploration" (IE) in social learning problems where the principal can leverage information asymmetry to incentivize agents to take exploratory actions.
We identify posterior sampling, an algorithmic approach that is well known in the multi-armed bandits literature, as a general-purpose solution for IE.
- Score: 51.32577788466152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study "incentivized exploration" (IE) in social learning problems where
the principal (a recommendation algorithm) can leverage information asymmetry
to incentivize sequentially-arriving agents to take exploratory actions. We
identify posterior sampling, an algorithmic approach that is well known in the
multi-armed bandits literature, as a general-purpose solution for IE. In
particular, we expand the existing scope of IE in several practically-relevant
dimensions, from private agent types to informative recommendations to
correlated Bayesian priors. We obtain a general analysis of posterior sampling
in IE which allows us to subsume these extended settings as corollaries, while
also recovering existing results as special cases.
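For readers unfamiliar with posterior sampling, the following is a minimal sketch of the standard bandit algorithm (Thompson sampling) for Bernoulli rewards with independent uniform Beta(1, 1) priors. It is not the filtered variant analyzed in the paper and does not model agents, incentives, or recommendations. The function name `thompson_sampling`, the arm means, and the horizon are illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli posterior sampling; returns the pull count per arm.

    Assumption: independent Beta(1, 1) priors and Bernoulli rewards.
    This is the textbook bandit algorithm, not the paper's method.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample per arm from its current Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

Each arm is chosen with probability equal to its posterior probability of being the best, so exploration tapers off as the posteriors concentrate.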
Related papers
- A Comprehensive Survey of Datasets, Theories, Variants, and Applications in Direct Preference Optimization [52.42860559005861]
Direct Preference Optimization (DPO) has emerged as a promising approach for alignment.
Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature.
arXiv Detail & Related papers (2024-10-21T02:27:24Z) - Unified Domain Adaptive Semantic Segmentation [96.74199626935294]
Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS) aims to transfer the supervision from a labeled source domain to an unlabeled target domain.
We propose a Quad-directional Mixup (QuadMix) method, characterized by tackling distinct point attributes and feature inconsistencies.
Our method outperforms the state-of-the-art works by large margins on four challenging UDA-SS benchmarks.
arXiv Detail & Related papers (2023-11-22T09:18:49Z) - On the Importance of Exploration for Generalization in Reinforcement Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models [33.43062232461652]
Video Anomaly Detection (VAD) serves as a pivotal technology in intelligent surveillance systems.
This survey extends the conventional scope of VAD beyond unsupervised methods, encompassing a broader spectrum termed Generalized Video Anomaly Event Detection (GVAED).
arXiv Detail & Related papers (2023-02-10T07:11:37Z) - Back-to-Bones: Rediscovering the Role of Backbones in Domain Generalization [1.6799377888527687]
Domain Generalization studies the capability of a deep learning model to generalize to out-of-training distributions.
Recent research has provided a reproducible benchmark for DG, pointing out the effectiveness of naive empirical risk minimization (ERM) over existing algorithms.
In this paper, we evaluate backbones, proposing a comprehensive analysis of their intrinsic generalization capabilities.
arXiv Detail & Related papers (2022-09-02T15:30:17Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z) - Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z) - Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [4.845576821204241]
We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through-rates.
We test our proposed deep Bayesian bandits algorithm in offline simulation and online A/B settings.
arXiv Detail & Related papers (2020-08-03T08:58:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.