PASTA: Pessimistic Assortment Optimization
- URL: http://arxiv.org/abs/2302.03821v1
- Date: Wed, 8 Feb 2023 01:11:51 GMT
- Title: PASTA: Pessimistic Assortment Optimization
- Authors: Juncheng Dong, Weibin Mo, Zhengling Qi, Cong Shi, Ethan X. Fang, Vahid
Tarokh
- Abstract summary: We consider a class of assortment optimization problems in an offline data-driven setting.
We propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA) based on the principle of pessimism.
- Score: 25.51792135903357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a class of assortment optimization problems in an offline
data-driven setting. A firm does not know the underlying customer choice model
but has access to an offline dataset consisting of the historically offered
assortment set, customer choice, and revenue. The objective is to use the
offline dataset to find an optimal assortment. Due to the combinatorial nature
of assortment optimization, the problem of insufficient data coverage is likely
to occur in the offline dataset. Therefore, designing a provably efficient
offline learning algorithm becomes a significant challenge. To this end, we
propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA
for short), designed on the principle of pessimism, which can correctly
identify the optimal assortment while requiring the offline data to cover only
the optimal assortment under general settings. In particular, we establish a regret
bound for the offline assortment optimization problem under the celebrated
multinomial logit model. We also propose an efficient computational procedure
to solve our pessimistic assortment optimization problem. Numerical studies
demonstrate the superiority of the proposed method over the existing baseline
method.
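To make the principle of pessimism concrete, below is a minimal Python sketch; it is not PASTA itself. It scores assortments by MNL expected revenue computed from lower-confidence-bound utility estimates, so items with little offline coverage are penalized. The 1/sqrt(count) penalty, the brute-force search, and all numbers are illustrative assumptions; the paper specifies PASTA's actual penalty construction and an efficient computational procedure.
```python
# Illustrative sketch only: brute-force "pessimistic" assortment search under
# the MNL choice model. The penalty form and exhaustive search are assumptions
# for exposition, not the paper's method.
import itertools
import numpy as np

def mnl_expected_revenue(assortment, utilities, revenues):
    """Under MNL, a customer offered assortment S buys item i in S with
    probability v_i / (1 + sum_{j in S} v_j); the "1" is the no-purchase
    option. Expected revenue: sum_{i in S} r_i v_i / (1 + sum_{j in S} v_j)."""
    idx = list(assortment)
    v, r = utilities[idx], revenues[idx]
    return float(np.sum(r * v) / (1.0 + np.sum(v)))

def pessimistic_assortment(v_hat, penalty, revenues, max_size):
    """Score every assortment with lower-confidence-bound utilities
    max(v_hat - penalty, 0): items poorly covered by the offline data get
    large penalties and are effectively excluded from consideration."""
    v_lcb = np.maximum(v_hat - penalty, 0.0)
    n = len(v_hat)
    best_set, best_val = (), 0.0
    for k in range(1, max_size + 1):
        for S in itertools.combinations(range(n), k):
            val = mnl_expected_revenue(S, v_lcb, revenues)
            if val > best_val:
                best_set, best_val = S, val
    return best_set, best_val

# Hypothetical numbers: estimated utilities, per-item offer counts in the
# offline data, and a 1/sqrt(count)-style penalty (an assumed functional form).
rng = np.random.default_rng(0)
v_hat = rng.uniform(0.2, 1.0, size=6)      # estimated MNL utilities
counts = rng.integers(5, 200, size=6)      # how often each item was offered
penalty = 1.0 / np.sqrt(counts)            # wider penalty = less coverage
revenues = rng.uniform(1.0, 10.0, size=6)  # per-item revenue
print(pessimistic_assortment(v_hat, penalty, revenues, max_size=3))
```
Shrinking a poorly covered item's utility toward zero drives its choice probability, and hence its revenue contribution, toward zero, which is how this sketch biases the search toward assortments the offline data can actually vouch for.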
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to represent potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Offline Model-Based Optimization via Policy-Guided Gradient Search [30.87992788876113]
We introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem.
Our proposed policy-guided search approach explicitly learns the best policy for a given surrogate model created from the offline data.
arXiv Detail & Related papers (2024-05-08T18:27:37Z)
- Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization [38.57171985309975]
We develop efficient algorithms for regret minimization in assortment selection with Plackett-Luce (PL) based user choices.
Our methods are practical, provably optimal, and devoid of the aforementioned limitations of the existing methods.
arXiv Detail & Related papers (2024-02-29T07:17:04Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, when derived from the optimal solution of the problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Online Joint Assortment-Inventory Optimization under MNL Choices [14.530542487845732]
We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model.
We propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory.
arXiv Detail & Related papers (2023-04-04T09:25:34Z)
- Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
arXiv Detail & Related papers (2022-11-21T11:01:37Z)
- Optimal Data Selection: An Online Distributed View [61.31708750038692]
We develop algorithms for the online and distributed version of the problem.
In learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by 5-20%.
arXiv Detail & Related papers (2022-01-25T18:56:16Z)
- Is Pessimism Provably Efficient for Offline RL? [104.00628430454479]
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.
We propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function.
arXiv Detail & Related papers (2020-12-30T09:06:57Z)
- DeepCO: Offline Combinatorial Optimization Framework Utilizing Deep Learning [1.2183405753834562]
We propose DeepCO, an offline optimization framework utilizing deep learning.
We also design an offline variation of Travelling Salesman Problem (TSP) to model warehouse operation sequence optimization problem.
With only limited historical data, the proposed distribution-regularized optimization outperforms the existing baseline method in offline experiments, reducing route length by 5.7% on average.
arXiv Detail & Related papers (2020-07-20T04:17:30Z)
- Optimizing Revenue while showing Relevant Assortments at Scale [1.0200170217746136]
Real-time assortment optimization has become essential in e-commerce operations.
We design fast and flexible algorithms that find the optimal assortment in difficult regimes.
Empirical validations using a real-world dataset show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied).
arXiv Detail & Related papers (2020-03-06T20:16:49Z)