On Pareto Optimality for the Multinomial Logistic Bandit
- URL: http://arxiv.org/abs/2501.19277v1
- Date: Fri, 31 Jan 2025 16:42:29 GMT
- Title: On Pareto Optimality for the Multinomial Logistic Bandit
- Authors: Jierui Zuo, Hanzhang Qin,
- Abstract summary: We provide a new online learning algorithm for tackling the Multinomial Logit Bandit problem.
Despite the challenges posed by the MNL model, we develop a novel Upper Confidence Bound (UCB)-based method.
We develop theoretical guarantees characterizing the tradeoff between regret and estimation error for the MNL-Bandit problem.
- Score: 0.0
- License:
- Abstract: We provide a new online learning algorithm for tackling the Multinomial Logit Bandit (MNL-Bandit) problem. Despite the challenges posed by the combinatorial nature of the MNL model, we develop a novel Upper Confidence Bound (UCB)-based method that achieves Pareto optimality by balancing regret minimization and estimation error of the assortment revenues and the MNL parameters. We develop theoretical guarantees characterizing the tradeoff between regret and estimation error for the MNL-Bandit problem through information-theoretic bounds, and propose a modified UCB algorithm that incorporates forced exploration to improve parameter estimation accuracy while maintaining low regret. Our analysis sheds critical insights into how to optimally balance the collected revenues and the treatment estimation in dynamic assortment optimization.
Related papers
- Solving Finite-Horizon MDPs via Low-Rank Tensors [9.072279909866845]
We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs)
In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not stationary.
We propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable.
arXiv Detail & Related papers (2025-01-17T23:10:50Z) - Logarithmic Neyman Regret for Adaptive Estimation of the Average Treatment Effect [36.25361703897723]
Estimation of the Average Treatment Effect (ATE) is a core problem in causal inference with strong connections to Off-Policy Evaluation in Reinforcement Learning.
This paper considers the problem of adaptively selecting the treatment allocation probability in order to improve estimation of the ATE.
arXiv Detail & Related papers (2024-11-21T17:38:49Z) - Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks [4.202961704179733]
In many real-world settings, some of these parameters are unknown or uncertain.
Recent research focuses on predicting the value of unknown parameters using available contextual features.
We propose a novel framework that models uncertainty Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver.
arXiv Detail & Related papers (2024-06-05T09:11:46Z) - End-to-End Learning for Fair Multiobjective Optimization Under
Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Stochastic Methods for AUC Optimization subject to AUC-based Fairness
Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z) - Bayesian Optimisation for Constrained Problems [0.0]
We propose a novel variant of the well-known Knowledge Gradient acquisition function that allows it to handle constraints.
We empirically compare the new algorithm with four other state-of-the-art constrained Bayesian optimisation algorithms and demonstrate its superior performance.
arXiv Detail & Related papers (2021-05-27T15:43:09Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a re parameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.