Balanced Off-Policy Evaluation for Personalized Pricing
- URL: http://arxiv.org/abs/2302.12736v1
- Date: Fri, 24 Feb 2023 16:44:46 GMT
- Title: Balanced Off-Policy Evaluation for Personalized Pricing
- Authors: Adam N. Elmachtoub, Vishal Gupta and Yunfan Zhao
- Abstract summary: We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand.
The goal is to perform off-policy evaluation for a new personalized pricing policy that maps features to prices.
Building on the balanced policy evaluation framework of Kallus, we propose a new approach tailored to pricing applications.
- Score: 3.296526804364952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a personalized pricing problem in which we have data consisting
of feature information, historical pricing decisions, and binary realized
demand. The goal is to perform off-policy evaluation for a new personalized
pricing policy that maps features to prices. Methods based on inverse
propensity weighting (including doubly robust methods) for off-policy
evaluation may perform poorly when the logging policy has little exploration or
is deterministic, which is common in pricing applications. Building on the
balanced policy evaluation framework of Kallus (2018), we propose a new
approach tailored to pricing applications. The key idea is to compute an
estimate that minimizes the worst-case mean squared error or maximizes a
worst-case lower bound on policy performance, where in both cases the
worst-case is taken with respect to a set of possible revenue functions. We
establish theoretical convergence guarantees and empirically demonstrate the
advantage of our approach using a real-world pricing dataset.
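A minimal sketch of the balanced-weighting idea, assuming a discrete price grid, a Gaussian kernel, and a ridge-style closed form for the weights; the names and simplifications below are illustrative, not the authors' implementation:

```python
# Illustrative sketch of a balanced off-policy estimate for pricing,
# simplified to a discrete price grid, a Gaussian kernel, and a ridge-style
# closed form for the weights; NOT the authors' implementation.
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def balanced_weights(X, logged_prices, target_policy, price_grid, lam=1e-2):
    """Weights that minimize a worst-case-MSE surrogate over an RKHS ball.

    The worst-case squared bias is a quadratic form in the weights, so the
    regularized minimizer has the closed form w = (K + lam * I)^{-1} c, where
    c embeds the (feature, price) distribution induced by the new policy.
    """
    n = X.shape[0]
    Z_logged = np.hstack([X, logged_prices[:, None]])          # logged (feature, price) pairs
    K = rbf_kernel(Z_logged, Z_logged)
    c = np.zeros(n)
    for a_idx, price in enumerate(price_grid):
        Z_target = np.hstack([X, np.full((n, 1), price)])
        # target_policy[i, a_idx] = probability the new policy charges `price` to customer i
        c += rbf_kernel(Z_logged, Z_target) @ target_policy[:, a_idx] / n
    return np.linalg.solve(K + lam * np.eye(n), c)

# Toy usage: 200 customers, two candidate prices, deterministic new policy.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
price_grid = np.array([10.0, 15.0])
logged_prices = rng.choice(price_grid, size=200)
demand = rng.binomial(1, 0.4, size=200)                        # binary realized demand
revenue = logged_prices * demand                               # observed revenue
target_policy = np.zeros((200, 2))
target_policy[np.arange(200), (X[:, 0] > 0).astype(int)] = 1.0 # high price iff first feature > 0
w = balanced_weights(X, logged_prices, target_policy, price_grid)
print("estimated per-customer revenue under the new policy:", float(w @ revenue))
```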
Related papers
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning [5.372349090093469]
This work studies offline personalized pricing under endogeneity using an instrumental variable approach.
We propose a new policy learning method for Personalized pRicing using Invalid iNsTrumental variables.
arXiv Detail & Related papers (2023-02-24T14:50:47Z)
- Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model [83.83064559894989]
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production.
We develop a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings.
In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model.
arXiv Detail & Related papers (2022-10-15T17:22:30Z)
- COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation [73.17078343706909]
We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.
We present an offline constrained RL algorithm that optimizes the policy in the space of stationary distributions.
Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction.
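For intuition, the sketch below solves the occupancy-measure linear program that DICE-style constrained RL methods build on, in a hypothetical tabular toy with known dynamics; it is not the COptiDICE estimator itself, which works from offline data and regularizes toward the dataset distribution:

```python
# Tabular toy of the occupancy-measure LP that DICE-style constrained RL
# builds on (known dynamics, exact solve); NOT the COptiDICE estimator.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
reward = rng.uniform(size=(n_states, n_actions))
cost = np.zeros((n_states, n_actions))
cost[:, 1] = 1.0                                   # action 1 incurs unit cost
mu0 = np.full(n_states, 1.0 / n_states)            # initial state distribution
budget = 0.3                                       # discounted cost limit

# Variable d[s, a]: normalized discounted occupancy measure, flattened to 1-D.
# Flow constraint for each state s:
#   sum_a d(s, a) - gamma * sum_{s', a'} P(s | s', a') d(s', a') = (1 - gamma) mu0(s)
A_eq = np.zeros((n_states, n_states * n_actions))
for s in range(n_states):
    for sp in range(n_states):
        for a in range(n_actions):
            A_eq[s, sp * n_actions + a] = float(s == sp) - gamma * P[sp, a, s]
b_eq = (1 - gamma) * mu0

res = linprog(
    c=-reward.flatten(),                           # maximize return = minimize its negative
    A_ub=cost.flatten()[None, :], b_ub=[budget],   # expected discounted cost <= budget
    A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs",
)
d = res.x.reshape(n_states, n_actions)
policy = d / d.sum(axis=1, keepdims=True)          # recover the policy from occupancy ratios
print("constrained-optimal policy (rows = states):\n", policy)
```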
arXiv Detail & Related papers (2022-04-19T15:55:47Z)
- Convex Loss Functions for Contextual Pricing with Observational Posted-Price Data [2.538209532048867]
We study an off-policy contextual pricing problem where the seller has access to samples of prices which customers were previously offered.
This is in contrast to the well-studied setting in which samples of the customer's valuation (willingness to pay) are observed.
In our setting, the observed data is influenced by the historic pricing policy, and we do not know how customers would have responded to alternative prices.
arXiv Detail & Related papers (2022-02-16T22:35:39Z)
- Loss Functions for Discrete Contextual Pricing with Observational Data [8.661128420558349]
We study a pricing setting where each customer is offered a contextualized price based on customer and/or product features.
We observe whether or not each customer purchased the product at the prescribed price, rather than observing the customer's true valuation.
arXiv Detail & Related papers (2021-11-18T20:12:57Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting [15.985182419152197]
We propose a new method to compute a lower bound on the value of an arbitrary target policy.
The new approach is evaluated on a number of synthetic and real datasets and is found to be superior to its main competitors.
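A minimal sketch of the standard self-normalized (weighted) importance sampling estimator that such lower bounds are typically built on; the names are illustrative, and the paper's actual confidence-bound construction is not reproduced here:

```python
# Minimal sketch of self-normalized (weighted) importance sampling for
# off-policy evaluation; the paper's high-confidence lower-bound
# construction is not reproduced here.
import numpy as np

def snips_estimate(rewards, logging_probs, target_probs):
    """Self-normalized importance sampling estimate of the target policy's value.

    rewards[i]       : reward observed for the logged action on context i
    logging_probs[i] : probability the logging policy gave to that action
    target_probs[i]  : probability the target policy gives to the same action
    """
    w = target_probs / logging_probs
    return float(np.sum(w * rewards) / np.sum(w))  # normalization tames weight variance

# Toy usage with a mildly exploratory logging policy.
rng = np.random.default_rng(0)
n = 1000
logging_probs = rng.uniform(0.2, 0.8, size=n)
target_probs = np.clip(logging_probs + rng.normal(0.0, 0.1, size=n), 0.05, 0.95)
rewards = rng.binomial(1, 0.5, size=n).astype(float)
print("SNIPS value estimate:", snips_estimate(rewards, logging_probs, target_probs))
```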
arXiv Detail & Related papers (2020-06-18T12:15:37Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
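A minimal sketch of the kernel-smoothing idea for deterministic continuous-action targets, which replaces the nonexistent density ratio with a kernel centered at the target policy's action; this is illustrative only and not the paper's doubly robust estimators:

```python
# Minimal sketch of kernel-smoothed importance weighting for a deterministic
# continuous-action target policy; illustrative only, not the doubly robust
# estimators proposed in the paper.
import numpy as np

def kernel_is_estimate(rewards, logged_actions, target_actions, logging_density, bandwidth=0.1):
    """Weight each sample by K_h(a_i - pi(x_i)) / mu(a_i | x_i)."""
    u = (logged_actions - target_actions) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)  # Gaussian kernel
    return float(np.mean(kernel / logging_density * rewards))

# Toy usage: logging policy draws a ~ N(0, 1); target policy plays pi(x) = 0.5 * x.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
logged_actions = rng.normal(size=n)
logging_density = np.exp(-0.5 * logged_actions ** 2) / np.sqrt(2.0 * np.pi)
rewards = -(logged_actions - 0.5 * x) ** 2          # reward peaks at the target action
print("kernel-IS value estimate:",
      kernel_is_estimate(rewards, logged_actions, 0.5 * x, logging_density))
```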
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)