Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data
- URL: http://arxiv.org/abs/2410.04568v1
- Date: Sun, 6 Oct 2024 17:53:44 GMT
- Title: Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data
- Authors: Ehsan Ebrahimzadeh, Nikhil Monga, Hang Gao, Alex Cozzi, Abraham Bagherjeiran
- Abstract summary: We study the problem of learning a ranking policy for search or recommendation engines in a two-sided e-commerce marketplace.
As a value allocation mechanism, the ranking policy allocates retrieved items to the designated slots.
We build empirical estimates for the expected reward of the marketplace from observational data.
- Score: 8.985446716914515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a decision making framework to cast the problem of learning a ranking policy for search or recommendation engines in a two-sided e-commerce marketplace as an expected reward optimization problem using observational data. As a value allocation mechanism, the ranking policy allocates retrieved items to the designated slots so as to maximize the user utility from the slotted items, at any given stage of the shopping journey. The objective of this allocation can in turn be defined with respect to the underlying probabilistic user browsing model as the expected number of interaction events on presented items matching the user intent, given the ranking context. Through recognizing the effect of ranking as an intervention action to inform users' interactions with slotted items and the corresponding economic value of the interaction events for the marketplace, we formulate the expected reward of the marketplace as the collective value from all presented ranking actions. The key element in this formulation is a notion of context value distribution, which signifies not only the attribution of value to ranking interventions within a session but also the distribution of marketplace reward across user sessions. We build empirical estimates for the expected reward of the marketplace from observational data that account for the heterogeneity of economic value across session contexts as well as the distribution shifts in learning from observational user activity data. The ranking policy can then be trained by optimizing the empirical expected reward estimates via standard Bayesian inference techniques. We report empirical results for a product search ranking task in a major e-commerce platform demonstrating the fundamental trade-offs governed by ranking policies trained on empirical reward estimates with respect to extreme choices of the context value distribution.
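The abstract does not spell out the estimator itself. As a rough illustration only, the sketch below shows one common way to form a value-weighted reward estimate from logged sessions: self-normalized importance weighting with clipping. This is an assumed stand-in, not the paper's method, and every name here (`LoggedSession`, `value_weight`, `weighted_reward_estimate`, `clip`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LoggedSession:
    # All fields are hypothetical names chosen for this illustration.
    value_weight: float   # weight the context value distribution gives this session
    reward: float         # observed interaction reward, e.g. 1.0 for a purchase
    logging_prob: float   # probability of the shown ranking under the logging policy
    target_prob: float    # probability of the same ranking under the candidate policy


def weighted_reward_estimate(sessions: Sequence[LoggedSession], clip: float = 10.0) -> float:
    """Self-normalized, value-weighted importance sampling estimate (assumed form)."""
    numerator = 0.0
    denominator = 0.0
    for s in sessions:
        # Importance ratio corrects for the shift between logging and candidate policy;
        # clipping bounds the variance contributed by rare logged rankings.
        ratio = min(s.target_prob / s.logging_prob, clip)
        weight = s.value_weight * ratio
        numerator += weight * s.reward
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0


# Two sessions with identical policies (ratio 1): one converts, one does not.
uniform = [LoggedSession(1.0, 1.0, 0.5, 0.5), LoggedSession(1.0, 0.0, 0.5, 0.5)]
skewed = [LoggedSession(1.0, 1.0, 0.5, 0.5), LoggedSession(3.0, 0.0, 0.5, 0.5)]
print(weighted_reward_estimate(uniform))  # 0.5
print(weighted_reward_estimate(skewed))   # 0.25
```

Changing only the per-session value weights moves the estimate from 0.5 to 0.25 on the same logged outcomes, which hints at why the choice of context value distribution can steer what a trained policy favors.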
Related papers
- Data Distribution Valuation [56.71023681599737]
Existing data valuation methods define a value for a discrete dataset.
In many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled.
We propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies.
(2024-10-06T07:56:53Z)
- Uniting contrastive and generative learning for event sequences models [51.547576949425604]
This study investigates the integration of two self-supervised learning techniques - instance-wise contrastive learning and a generative approach based on restoring masked events in latent space.
Experiments conducted on several public datasets, focusing on sequence classification and next-event type prediction, show that the integrated method achieves superior performance compared to individual approaches.
(2024-08-19T13:47:17Z)
- Maximizing the Success Probability of Policy Allocations in Online Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
(2023-12-26T10:55:33Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
(2023-10-24T01:00:01Z)
- A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction [3.6002910014361857]
Customer Lifetime Value (CLV) is the total monetary value of transactions/purchases made by a customer with the business over an intended period of time.
CLV finds application in a number of distinct business domains such as Banking, Insurance, Online-entertainment, Gaming, and E-Commerce.
We propose a system that is effective and comprehensive, yet simple and interpretable.
(2023-08-07T14:22:02Z)
- Click-Conversion Multi-Task Model with Position Bias Mitigation for Sponsored Search in eCommerce [51.211924408864355]
We propose two position-bias-free prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE).
Experiments on the E-commerce sponsored product search dataset show that our proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction.
(2023-07-29T19:41:16Z)
- The Role of Relevance in Fair Ranking [1.5469452301122177]
We argue that relevance scores should satisfy a set of desired criteria in order to guide fairness interventions.
We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data.
Our analyses and results surface the pressing need for new approaches to relevance collection and generation.
(2023-05-09T16:58:23Z)
- Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model [11.101369123145588]
Off-policy evaluation for ranking policies enables performance estimation of new ranking policies using only logged data.
Previous studies introduce some assumptions on user behavior to make the item space tractable.
We propose the Cascade Doubly Robust estimator, which assumes that a user interacts with items sequentially from the top position in a ranking.
(2022-02-03T12:42:33Z)
- Loss Functions for Discrete Contextual Pricing with Observational Data [8.661128420558349]
We study a pricing setting where each customer is offered a contextualized price based on customer and/or product features.
We observe whether each customer purchased a product at the price prescribed rather than the customer's true valuation.
(2021-11-18T20:12:57Z)
- Fairness, Welfare, and Equity in Personalized Pricing [88.9134799076718]
We study the interplay of fairness, welfare, and equity considerations in personalized pricing based on customer features.
We show the potential benefits of personalized pricing in two settings: pricing subsidies for an elective vaccine, and the effects of personalized interest rates on downstream outcomes in microcredit.
(2020-12-21T01:01:56Z)
- Combining Task Predictors via Enhancing Joint Predictability [53.46348489300652]
We present a new predictor combination algorithm that improves the target by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance.
Our algorithm jointly assesses the relevance of all references by adopting a Bayesian framework.
Based on experiments on seven real-world datasets from visual attribute ranking and multi-class classification scenarios, we demonstrate that our algorithm offers a significant performance gain and broadens the application range of existing predictor combination approaches.
(2020-07-15T21:58:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.