Related papers: Pareto optimal proxy metrics

URL: http://arxiv.org/abs/2307.01000v1
Date: Mon, 3 Jul 2023 13:29:14 GMT
Title: Pareto optimal proxy metrics
Authors: Lee Richardson, Alessandro Zito, Dylan Greaves and Jacopo Soriano
Abstract summary: We propose the Pareto optimal proxy metrics method, which simultaneously optimizes prediction accuracy and sensitivity. Applied to experiments from a large industrial recommendation system, it found proxy metrics that are eight times more sensitive than the north star and consistently moved in the same direction.
Score: 62.997667081978825
License: http://creativecommons.org/licenses/by/4.0/
Abstract: North star metrics and online experimentation play a central role in how technology companies improve their products. In many practical settings, however, evaluating experiments based on the north star metric directly can be difficult. The two most significant issues are 1) low sensitivity of the north star metric and 2) differences between the short-term and long-term impact on the north star metric. A common solution is to rely on proxy metrics rather than the north star in experiment evaluation and launch decisions. Existing literature on proxy metrics concentrates mainly on the estimation of the long-term impact from short-term experimental data. In this paper, instead, we focus on the trade-off between the estimation of the long-term impact and the sensitivity in the short term. In particular, we propose the Pareto optimal proxy metrics method, which simultaneously optimizes prediction accuracy and sensitivity. In addition, we give an efficient multi-objective optimization algorithm that outperforms standard methods. We applied our methodology to experiments from a large industrial recommendation system, and found proxy metrics that are eight times more sensitive than the north star and consistently moved in the same direction, increasing the velocity and the quality of the decisions to launch new features.
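The trade-off named in the abstract, prediction accuracy versus sensitivity, can be illustrated with a minimal sketch of Pareto (non-dominated) selection. This is not the paper's optimization algorithm, which the abstract does not spell out. The function name `pareto_front`, the score pairs, and the assumption that both objectives are maximized are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of points that no other point dominates.

    Each point is a hypothetical (sensitivity, accuracy) pair for one
    candidate proxy metric; both objectives are assumed to be maximized.
    """
    front = []
    for i, (s_i, a_i) in enumerate(points):
        # Point j dominates point i if it is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            (s_j >= s_i and a_j >= a_i) and (s_j > s_i or a_j > a_i)
            for j, (s_j, a_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Made-up scores for four candidate proxies (not data from the paper).
candidates = [(0.9, 0.2), (0.6, 0.7), (0.3, 0.9), (0.5, 0.5)]
print(pareto_front(candidates))  # prints [0, 1, 2]
```

The last candidate is dropped because the second one beats it on both objectives. The remaining three each represent a different balance between sensitivity and accuracy, so choosing among them is a judgment call rather than something the scores alone decide.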

Related papers

Metric Learning for Tag Recommendation: Tackling Data Sparsity and Cold Start Issues [4.315795907799471]
This paper proposes a new label recommendation algorithm based on metric learning. It aims to overcome the challenges of traditional recommendation systems by learning effective distance or similarity metrics. It is particularly accurate on the first few recommended items.
arXiv Detail & Related papers (2024-11-10T06:46:44Z)
Are We Really Achieving Better Beyond-Accuracy Performance in Next Basket Recommendation? [57.91114305844153]
Next basket recommendation (NBR) is a special type of sequential recommendation that is increasingly receiving attention. Recent studies into NBR have found a substantial performance difference between recommending repeat items and explore items. We propose a plug-and-play two-step repetition-exploration framework that treats repeat items and explore items separately.
arXiv Detail & Related papers (2024-05-02T09:59:35Z)
Learning Metrics that Maximise Power for Accelerated A/B-Tests [13.528097424046823]
North Star metrics are typically delayed and insensitive. Experiments need to run for a long time, and even then, type-II errors are prevalent. We propose to tackle this by learning metrics from short-term signals.
arXiv Detail & Related papers (2024-02-06T11:31:04Z)
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies [24.26653413077486]
Ten years ago a single metric, BLEU, governed progress in machine translation research. This paper investigates the "dynamic range" of modern metrics.
arXiv Detail & Related papers (2024-01-12T18:47:40Z)
Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
arXiv Detail & Related papers (2023-09-14T17:43:02Z)
Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction [82.72686460985297]
We tackle the problem of estimating a Manhattan frame. We derive two new 2-line solvers, one of which does not suffer from singularities affecting existing solvers. We also design a new non-minimal method, running on an arbitrary number of lines, to boost the performance in local optimization.
arXiv Detail & Related papers (2023-08-21T13:03:25Z)
A New Super-Resolution Measurement of Perceptual Quality and Fidelity [2.901173495131855]
Super-resolution results are usually measured by full-reference image quality metrics or human rating scores. In this work, we analyze the evaluation problem based on the one-to-many mapping nature of super-resolution. We show that the proposed metric is highly correlated with the human perceptual quality, and better than most existing metrics.
arXiv Detail & Related papers (2023-03-10T21:08:24Z)
Reenvisioning Collaborative Filtering vs Matrix Factorization [65.74881520196762]
Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years. The introduction of ANNs within the recommendation ecosystem has recently been questioned, raising several comparisons in terms of efficiency and effectiveness. We show the potential these techniques may have on beyond-accuracy evaluation while analyzing their effect on complementary evaluation dimensions.
arXiv Detail & Related papers (2021-07-28T16:29:38Z)
SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE). Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline. In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other in a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.