Rebounding Bandits for Modeling Satiation Effects
- URL: http://arxiv.org/abs/2011.06741v3
- Date: Wed, 27 Oct 2021 13:39:58 GMT
- Title: Rebounding Bandits for Modeling Satiation Effects
- Authors: Liu Leqi, Fatma Kilinc-Karzan, Zachary C. Lipton, Alan L. Montgomery
- Abstract summary: We introduce rebounding bandits, a multi-armed bandit setup, where satiation dynamics are modeled as time-invariant linear dynamical systems.
We characterize the planning problem, showing that the greedy policy is optimal when the arms exhibit identical deterministic dynamics.
- Score: 22.92512152419544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Psychological research shows that enjoyment of many goods is subject to
satiation, with short-term satisfaction declining after repeated exposures to
the same item. Nevertheless, proposed algorithms for powering recommender
systems seldom model these dynamics, instead proceeding as though user
preferences were fixed in time. In this work, we introduce rebounding bandits,
a multi-armed bandit setup, where satiation dynamics are modeled as
time-invariant linear dynamical systems. Expected rewards for each arm decline
monotonically with consecutive exposures to it and rebound towards the initial
reward whenever that arm is not pulled. Unlike classical bandit settings,
methods for tackling rebounding bandits must plan ahead, and model-based methods
rely on estimating the parameters of the satiation dynamics. We characterize
the planning problem, showing that the greedy policy is optimal when the arms
exhibit identical deterministic dynamics. To address stochastic satiation
dynamics with unknown parameters, we propose Explore-Estimate-Plan (EEP), an
algorithm that pulls arms methodically, estimates the system dynamics, and then
plans accordingly.
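To make the setup concrete, below is a minimal Python sketch of reward dynamics with the qualitative behavior the abstract describes: a per-arm satiation state follows a time-invariant linear update, building up under consecutive pulls and decaying when the arm rests, so expected rewards decline with exposure and rebound toward the base reward. The class name `ReboundingBandit` and the parameters (`base_reward`, `gamma`, `boost`, `noise_std`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch (assumed form, not the paper's exact model): each arm k
# carries a satiation state s_k that is incremented when the arm is pulled and
# decays geometrically (a time-invariant linear update) otherwise; the expected
# reward is the arm's base reward minus its current satiation.

class ReboundingBandit:
    def __init__(self, base_reward, gamma, boost, noise_std=0.05, seed=0):
        self.base = np.asarray(base_reward, dtype=float)  # reward with no satiation
        self.gamma = np.asarray(gamma, dtype=float)       # satiation retention factor
        self.boost = np.asarray(boost, dtype=float)       # satiation added per pull
        self.noise_std = noise_std                        # observation noise on rewards
        self.s = np.zeros_like(self.base)                 # satiation states, s_k(0) = 0
        self.rng = np.random.default_rng(seed)

    def expected_if_pulled(self):
        # Expected reward each arm would yield if pulled at the current state.
        return self.base - (self.gamma * self.s + self.boost)

    def pull(self, arm):
        # Linear satiation update: decay everywhere, extra increment on the pulled arm.
        self.s = self.gamma * self.s
        self.s[arm] += self.boost[arm]
        return self.base[arm] - self.s[arm] + self.noise_std * self.rng.standard_normal()

# Greedy play with known dynamics. All arms share the same (deterministic)
# dynamics here, which is the regime where the abstract says greedy is optimal.
env = ReboundingBandit(base_reward=[1.0, 0.8, 0.6], gamma=[0.7] * 3, boost=[0.5] * 3)
for t in range(10):
    arm = int(np.argmax(env.expected_if_pulled()))
    r = env.pull(arm)
    print(f"t={t}  arm={arm}  reward={r:+.3f}  satiation={np.round(env.s, 3)}")
```

In this spirit, the paper's Explore-Estimate-Plan (EEP) algorithm would first pull arms in a prescribed pattern to estimate the unknown satiation dynamics from observed rewards and only then plan with the fitted model; the greedy loop above is just a known-dynamics baseline.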
Related papers
- Dynamic Obstacle Avoidance through Uncertainty-Based Adaptive Planning with Diffusion [40.76697924496143]
We propose an adaptive generative planning approach that adjusts replanning frequency based on the uncertainty of action predictions.
Our method minimizes the need for frequent, computationally expensive, and redundant replanning while maintaining robust collision avoidance performance.
arXiv Detail & Related papers (2024-09-25T14:03:58Z) - Tractable Joint Prediction and Planning over Discrete Behavior Modes for
Urban Driving [15.671811785579118]
We show that we can parameterize autoregressive closed-loop models without retraining.
We propose fully reactive closed-loop planning over discrete latent modes.
Our approach also outperforms the previous state-of-the-art in CARLA on challenging dense traffic scenarios.
arXiv Detail & Related papers (2024-03-12T01:00:52Z) - Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model in which consumers' preferences and price sensitivity vary over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z) - Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z) - Non-Stationary Bandits with Auto-Regressive Temporal Dependency [14.093856726745662]
This paper introduces a novel non-stationary MAB framework that captures the temporal structure of real-world dynamics through an auto-regressive (AR) reward structure.
We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information.
arXiv Detail & Related papers (2022-10-28T20:02:21Z) - Maximum entropy exploration in contextual bandits with neural networks
and energy based models [63.872634680339644]
We present two classes of models, one with neural networks as reward estimators, and the other with energy based models.
We show that both techniques outperform well-known standard algorithms, where energy based models have the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
arXiv Detail & Related papers (2022-10-12T15:09:45Z) - Understanding the stochastic dynamics of sequential decision-making
processes: A path-integral analysis of multi-armed bandits [7.05949591248206]
The multi-armed bandit (MAB) model is one of the most popular models to study decision-making in an uncertain environment.
In this paper, we employ techniques in statistical physics to analyze the MAB model.
arXiv Detail & Related papers (2022-08-11T09:32:03Z) - Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by
a Linear Dynamical System [2.0460959603642004]
We propose a variant of the multi-armed bandit where the rewards are sampled from a linear dynamical system.
The proposed strategy for this multi-armed bandit variant is to learn a model of the dynamical system while choosing the optimal action based on the learned model.
This strategy is applied to quantitative finance as a high-frequency trading strategy, where the goal is to maximize returns within a time period.
arXiv Detail & Related papers (2022-04-06T19:22:33Z) - Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable, the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
arXiv Detail & Related papers (2021-12-29T23:37:06Z) - A Regret Minimization Approach to Iterative Learning Control [61.37088759497583]
We propose a new performance metric, planning regret, which replaces the standard uncertainty assumptions with worst case regret.
We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.
arXiv Detail & Related papers (2021-02-26T13:48:49Z) - Stochastically forced ensemble dynamic mode decomposition for
forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.