Pareto optimal proxy metrics
- URL: http://arxiv.org/abs/2307.01000v1
- Date: Mon, 3 Jul 2023 13:29:14 GMT
- Title: Pareto optimal proxy metrics
- Authors: Lee Richardson, Alessandro Zito, Dylan Greaves and Jacopo Soriano
- Abstract summary: We find proxy metrics that are eight times more sensitive than the north star and that consistently move in the same direction.
We apply our methodology to experiments from a large industrial recommendation system.
- Score: 62.997667081978825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: North star metrics and online experimentation play a central role in how
technology companies improve their products. In many practical settings,
however, evaluating experiments based on the north star metric directly can be
difficult. The two most significant issues are 1) low sensitivity of the north
star metric and 2) differences between the short-term and long-term impact on
the north star metric. A common solution is to rely on proxy metrics rather
than the north star in experiment evaluation and launch decisions. Existing
literature on proxy metrics concentrates mainly on the estimation of the
long-term impact from short-term experimental data. In this paper, instead, we
focus on the trade-off between the estimation of the long-term impact and the
sensitivity in the short term. In particular, we propose the Pareto optimal
proxy metrics method, which simultaneously optimizes prediction accuracy and
sensitivity. In addition, we give an efficient multi-objective optimization
algorithm that outperforms standard methods. We applied our methodology to
experiments from a large industrial recommendation system, and found proxy
metrics that are eight times more sensitive than the north star and
consistently moved in the same direction, increasing the velocity and the
quality of the decisions to launch new features.
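The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's optimization algorithm; it is a brute-force Pareto filter over a handful of hypothetical candidate proxies, each scored on two objectives to be maximized (prediction accuracy and sensitivity). The candidate names and scores are made up for illustration.

```python
from typing import List, Tuple

# (name, prediction_accuracy, sensitivity); all values are hypothetical.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates (O(n^2) scan)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates = [
    ("proxy_a", 0.90, 1.0),
    ("proxy_b", 0.80, 4.0),
    ("proxy_c", 0.70, 3.0),  # dominated by proxy_b
    ("proxy_d", 0.60, 8.0),
    ("proxy_e", 0.85, 0.5),  # dominated by proxy_a
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

Every proxy left on the front represents a different balance between accuracy and sensitivity. Choosing among them is a separate decision that this sketch does not address.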
Related papers
- Learning Metrics that Maximise Power for Accelerated A/B-Tests [13.528097424046823]
North Star metrics are typically delayed and insensitive.
Experiments need to run for a long time, and even then, type-II errors are prevalent.
We propose to tackle this by learning metrics from short-term signals.
arXiv Detail & Related papers (2024-02-06T11:31:04Z)
- Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies [24.26653413077486]
Ten years ago a single metric, BLEU, governed progress in machine translation research.
This paper investigates the "dynamic range" of modern metrics.
arXiv Detail & Related papers (2024-01-12T18:47:40Z)
- What is the Best Automated Metric for Text to Motion Generation? [19.71712698183703]
There is growing interest in generating skeleton-based human motions from natural language descriptions.
Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments.
This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better.
arXiv Detail & Related papers (2023-09-19T01:59:54Z)
- Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
arXiv Detail & Related papers (2023-09-14T17:43:02Z)
- Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction [82.72686460985297]
We tackle the problem of estimating a Manhattan frame.
We derive two new 2-line solvers, one of which does not suffer from singularities affecting existing solvers.
We also design a new non-minimal method, running on an arbitrary number of lines, to boost the performance in local optimization.
arXiv Detail & Related papers (2023-08-21T13:03:25Z)
- An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z)
- A New Super-Resolution Measurement of Perceptual Quality and Fidelity [2.901173495131855]
Super-resolution results are usually measured by full-reference image quality metrics or human rating scores.
In this work, we analyze the evaluation problem based on the one-to-many mapping nature of super-resolution.
We show that the proposed metric is highly correlated with the human perceptual quality, and better than most existing metrics.
arXiv Detail & Related papers (2023-03-10T21:08:24Z)
- Reenvisioning Collaborative Filtering vs Matrix Factorization [65.74881520196762]
Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years.
The introduction of ANNs within the recommendation ecosystem has been recently questioned, raising several comparisons in terms of efficiency and effectiveness.
We show the potential these techniques may have on beyond-accuracy evaluation while analyzing their effect on complementary evaluation dimensions.
arXiv Detail & Related papers (2021-07-28T16:29:38Z)
- SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
In addition, SIMPLE formulates human detection and pose estimation as a unified point learning framework so that the two tasks complement each other within a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)