Interpretable Personalization via Policy Learning with Linear Decision
Boundaries
- URL: http://arxiv.org/abs/2003.07545v4
- Date: Wed, 2 Nov 2022 22:02:12 GMT
- Title: Interpretable Personalization via Policy Learning with Linear Decision
Boundaries
- Authors: Zhaonan Qu, Isabella Qian, Zhengyuan Zhou
- Abstract summary: Effective personalization of goods and services has become a core business focus for companies seeking to improve revenues and maintain a competitive edge.
This paper studies the personalization problem through the lens of policy learning.
We study a class of policies with linear decision boundaries and propose learning algorithms using tools from causal inference.
- Score: 14.817218449140338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of the digital economy and an explosion of available
information about consumers, effective personalization of goods and services
has become a core business focus for companies to improve revenues and maintain
a competitive edge. This paper studies the personalization problem through the
lens of policy learning, where the goal is to learn a decision-making rule (a
policy) that maps from consumer and product characteristics (features) to
recommendations (actions) in order to optimize outcomes (rewards). We focus on
using available historical data for offline learning with unknown data
collection procedures, where a key challenge is the non-random assignment of
recommendations. Moreover, in many business and medical applications,
interpretability of a policy is essential. We study the class of policies with
linear decision boundaries to ensure interpretability, and propose learning
algorithms using tools from causal inference to address unbalanced treatments.
We study several optimization schemes to solve the associated non-convex,
non-smooth optimization problem, and find that a Bayesian optimization
algorithm is effective. We test our algorithm with extensive simulation studies
and apply it to an anonymized online marketplace customer purchase dataset,
where the learned policy outputs a personalized discount recommendation based
on customer and product features in order to maximize gross merchandise value
(GMV) for sellers. Our learned policy improves upon the platform's baseline by
88.2% in net sales revenue, while also providing informative insights on which
features are important for the decision-making process. Our findings suggest
that our proposed policy learning framework using tools from causal inference
and Bayesian optimization provides a promising practical approach to
interpretable personalization across a wide range of applications.
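The recipe described in the abstract (an inverse-propensity-weighted value estimate of a linear-boundary policy, maximized with Bayesian optimization) can be illustrated in a few lines. The snippet below is a minimal sketch on synthetic data, not the authors' implementation: it assumes binary actions and known logging propensities, and uses scikit-optimize's gp_minimize as a stand-in Bayesian optimizer, none of which are specified by the paper.

```python
import numpy as np
from skopt import gp_minimize  # assumed stand-in Bayesian optimizer

# Synthetic logged data: features X, binary action A (e.g., discount or not),
# reward R (e.g., GMV), and logging propensities e(x) = P(A=1 | x).
rng = np.random.default_rng(0)
n, d = 5000, 4
X = rng.normal(size=(n, d))
A = rng.binomial(1, 0.3, size=n)
R = X[:, 0] * A + rng.normal(size=n)   # treating helps exactly when x_0 > 0
prop = np.full(n, 0.3)                 # in practice, estimated from data

def ipw_value(theta):
    """Inverse-propensity-weighted value of the linear-boundary policy
    pi(x) = 1{x @ theta > 0}: reweight logged rewards by how likely the
    logging process was to agree with the policy's recommendation."""
    pi = (X @ theta > 0).astype(float)
    w = np.where(A == 1, pi / prop, (1.0 - pi) / (1.0 - prop))
    return float(np.mean(w * R))

# The objective is piecewise constant in theta (non-convex, non-smooth),
# so a derivative-free Bayesian search is a natural fit.
res = gp_minimize(lambda t: -ipw_value(np.asarray(t)),
                  dimensions=[(-1.0, 1.0)] * d, n_calls=60, random_state=0)
print("learned boundary:", np.round(res.x, 2), "estimated value:", -res.fun)
```

Because the indicator policy is piecewise constant in its parameters, gradient-based methods have nothing to follow, which is consistent with the abstract's finding that Bayesian optimization is effective here.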
Related papers
- Learning Joint Models of Prediction and Optimization [56.04498536842065]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by joint predictive models.
arXiv Detail & Related papers (2024-09-07T19:52:14Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of common baseline-correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
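As a rough illustration of the baseline idea (not the paper's variance-optimal estimator), the sketch below shows how subtracting a constant baseline from rewards and adding it back leaves an inverse-propensity-scoring estimate unbiased while changing its variance; all data are synthetic.

```python
import numpy as np

# Synthetic bandit logs: rewards r, logging propensities p0 of the taken
# actions, and target-policy probabilities p1 of those same actions.
rng = np.random.default_rng(1)
n = 10_000
r = rng.uniform(size=n)
p0 = rng.uniform(0.2, 0.8, size=n)
p1 = rng.uniform(0.2, 0.8, size=n)
w = p1 / p0  # importance weights, E[w] = 1 under the logging policy

def ips_with_baseline(b):
    """Subtracting a baseline b and adding it back keeps the IPS estimate
    unbiased (since E[w] = 1) while changing its variance."""
    return float(np.mean(w * (r - b)) + b)

c = np.cov(w * r, w)
b_star = c[0, 1] / c[1, 1]  # textbook control-variate coefficient, not the paper's
print("vanilla IPS: ", ips_with_baseline(0.0))
print("baseline IPS:", ips_with_baseline(b_star))
```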
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Non-linear Welfare-Aware Strategic Learning [10.448052192725168]
This paper studies algorithmic decision-making in the presence of strategic individual behaviors.
We first generalize the agent best response model in previous works to the non-linear setting.
We show that the three welfare objectives can attain their optima simultaneously only under restrictive conditions.
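A minimal sketch of an agent best-response model in a non-linear setting, with a made-up score function and quadratic effort cost (the paper's actual response model may differ):

```python
import numpy as np
from scipy.optimize import minimize

# Invented non-linear decision rule: a logistic score over two features.
def score(x):
    return 1.0 / (1.0 + np.exp(-(x[0] ** 2 + 0.5 * x[1] - 1.0)))

def best_response(x0, cost=1.0):
    """Agent's best response: manipulate reported features to maximize the
    score received, minus a quadratic effort cost relative to true x0."""
    obj = lambda x: -(score(x) - cost * np.sum((x - x0) ** 2))
    return minimize(obj, x0, method="Nelder-Mead").x

x0 = np.array([0.2, -0.3])
print("true:", x0, "-> reported:", np.round(best_response(x0), 3))
```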
arXiv Detail & Related papers (2024-05-03T01:50:03Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
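For concreteness, an OWA objective applies a fixed weight vector to the sorted utilities; the sort is what makes the aggregation fairness-aware and also where the non-differentiability comes from. A minimal sketch:

```python
import numpy as np

def owa(utilities, weights):
    """Ordered Weighted Averaging: weights are applied to the *sorted*
    utilities (worst first), so decreasing weights emphasize the worst-off
    objective."""
    return float(np.dot(weights, np.sort(utilities)))

# Decreasing weights favor the minimum: 0.5*1 + 0.3*2 + 0.2*3 = 1.7
print(owa([3.0, 1.0, 2.0], [0.5, 0.3, 0.2]))
```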
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- An explainable machine learning-based approach for analyzing customers' online data to identify the importance of product attributes [0.6437284704257459]
We propose a game-theoretic machine learning (ML) method that extracts comprehensive design implications for product development.
We apply our method to a real-world dataset of laptops from Kaggle, and derive design implications based on the results.
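The game-theoretic method is presumably Shapley-value-based attribution; a hedged sketch using the shap library on a tree model, with a hypothetical file path and column names (the paper's exact pipeline is not reproduced here):

```python
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: laptop attribute columns and a customer-rating target.
df = pd.read_csv("laptops.csv")  # placeholder path
X = df[["ram_gb", "storage_gb", "screen_in", "weight_kg", "price"]]
y = df["rating"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Shapley values attribute each prediction to the attributes; averaging
# their magnitudes ranks attribute importance for design decisions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
importance = pd.Series(abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))
```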
arXiv Detail & Related papers (2024-02-03T20:50:48Z)
- Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
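Both this entry and the "Learning Joint Models of Prediction and Optimization" entry above describe the same proxy idea. In the toy sketch below (problem and architecture invented for illustration), a proxy network outputs a feasible allocation directly and is trained on decision quality alone, so no solver or handcrafted backpropagation rule sits in the training loop:

```python
import torch
import torch.nn as nn

# Toy problem: allocate a unit budget over k items to maximize w @ z, where
# item values w are unknown at decision time but correlated with features.
# Softmax output keeps the allocation on the simplex (feasible by construction).
torch.manual_seed(0)
n, d, k = 2048, 10, 5
X = torch.randn(n, d)
w = X @ torch.randn(d, k) + 0.1 * torch.randn(n, k)  # realized item values

proxy = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, k))
opt = torch.optim.Adam(proxy.parameters(), lr=1e-2)

for _ in range(200):
    z = torch.softmax(proxy(X), dim=1)      # feasible allocation
    loss = -(w * z).sum(dim=1).mean()       # negative decision quality
    opt.zero_grad(); loss.backward(); opt.step()

print("average decision value:", -loss.item())
```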
arXiv Detail & Related papers (2023-11-22T01:32:06Z)
- Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning [42.303733194571905]
We seek to find and automate an optimal credit card limit adjustment policy by employing reinforcement learning techniques.
Our research establishes a conceptual structure for applying reinforcement learning framework to credit limit adjustment.
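A tabular Q-learning sketch on a made-up credit-limit MDP (states: discretized utilization; actions: keep or raise the limit); the dynamics and rewards below are purely illustrative, not from the paper:

```python
import numpy as np

# Invented MDP: reward trades extra revenue from a raised limit against a
# default-risk penalty in the highest-utilization state.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    s2 = int(np.clip(s + a - rng.integers(0, 2), 0, n_states - 1))
    r = 1.0 * a - 2.0 * (s2 == n_states - 1)  # revenue vs. risk penalty
    return s2, r

s = 2
for _ in range(20_000):
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # Q-learning update
    s = s2

print("greedy limit action per state:", np.argmax(Q, axis=1))
```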
arXiv Detail & Related papers (2023-06-27T16:10:36Z)
- PASTA: Pessimistic Assortment Optimization [25.51792135903357]
We consider a class of assortment optimization problems in an offline data-driven setting.
We propose an algorithm referred to as Pessimistic ASsortment opTimizAtion (PASTA) based on the principle of pessimism.
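The principle of pessimism amounts to ranking candidate assortments by a lower confidence bound rather than the empirical mean, so options that look good only because data are scarce get passed over. A synthetic example (PASTA's actual penalty is derived for its model class):

```python
import numpy as np

# Hypothetical offline logs: observed revenues per candidate assortment.
logs = {
    ("A", "B"): [4.1, 3.8, 4.4, 4.0],
    ("A", "C"): [5.2, 1.0, 6.1],             # high mean, little data
    ("B", "C"): [3.9, 4.0, 4.1, 3.8, 4.2],
}

def pessimistic_value(rs, beta=1.0):
    """Lower confidence bound: empirical mean minus an uncertainty penalty
    that shrinks with sample size -- the principle of pessimism."""
    return float(np.mean(rs) - beta / np.sqrt(len(rs)))

for a, rs in logs.items():
    print(a, "mean=%.2f" % np.mean(rs), "pessimistic=%.2f" % pessimistic_value(rs))
print("chosen:", max(logs, key=lambda a: pessimistic_value(logs[a])))
```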
arXiv Detail & Related papers (2023-02-08T01:11:51Z)
- Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
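The distributional-shift failure mode described here is easy to reproduce: fit a surrogate on a narrow input range, then optimize its predictions over a wider one. A self-contained toy example with an invented reward function:

```python
import numpy as np

# True reward peaks at x = 2, but logged data only cover x in [0, 2],
# where the reward happens to look perfectly linear.
def true_reward(x):
    return np.minimum(x, 4.0 - x)

x_train = np.linspace(0.0, 2.0, 50)              # offline data support
slope, intercept = np.polyfit(x_train, true_reward(x_train), deg=1)

x_grid = np.linspace(0.0, 10.0, 1001)            # optimizer's search space
pred = slope * x_grid + intercept
x_star = x_grid[np.argmax(pred)]                 # lands at x = 10, far OOD

print("optimized input:", x_star)
print("predicted value:", pred.max())            # ~10, erroneously good
print("true value:", true_reward(x_star))        # -6, actually terrible
```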
arXiv Detail & Related papers (2022-11-21T11:01:37Z)
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
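For context, the basic per-trajectory importance sampling estimator and its self-normalized variant look as follows on synthetic logs; the paper's per-state-neighborhood normalization constraint is not reproduced here:

```python
import numpy as np

# Synthetic logs: per-step probability ratios pi_target / pi_behavior for
# each logged trajectory, and the trajectory returns.
rng = np.random.default_rng(0)
n_traj, horizon = 1000, 5
ratios = rng.uniform(0.5, 1.5, size=(n_traj, horizon))
returns = rng.normal(1.0, 1.0, size=n_traj)

w = ratios.prod(axis=1)                     # trajectory importance weights
is_est = np.mean(w * returns)               # ordinary IS: unbiased, high variance
wis_est = np.sum(w * returns) / np.sum(w)   # self-normalized IS: biased, lower variance
print("IS: %.3f   weighted IS: %.3f" % (is_est, wis_est))
```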
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)