Measuring Disparate Outcomes of Content Recommendation Algorithms with
Distributional Inequality Metrics
- URL: http://arxiv.org/abs/2202.01615v1
- Date: Thu, 3 Feb 2022 14:41:39 GMT
- Title: Measuring Disparate Outcomes of Content Recommendation Algorithms with
Distributional Inequality Metrics
- Authors: Tomo Lazovich, Luca Belli, Aaron Gonzales, Amanda Bower, Uthaipon
Tantipongpipat, Kristian Lum, Ferenc Huszar, Rumman Chowdhury
- Abstract summary: We evaluate distributional inequality metrics, a family of metrics originating from economics, and their ability to measure disparities in content exposure in the Twitter algorithmic timeline.
We show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users.
- Score: 5.74271110290378
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The harmful impacts of algorithmic decision systems have recently come into
focus, with many examples of systems such as machine learning (ML) models
amplifying existing societal biases. Most metrics attempting to quantify
disparities resulting from ML algorithms focus on differences between groups,
dividing users based on demographic identities and comparing model performance
or overall outcomes between these groups. However, in industry settings, such
information is often not available, and inferring these characteristics carries
its own risks and biases. Moreover, typical metrics that focus on a single
classifier's output ignore the complex network of systems that produce outcomes
in real-world settings. In this paper, we evaluate distributional inequality
metrics, a set of metrics originating from economics, for their ability to
measure disparities in content exposure in a production recommendation system,
the Twitter algorithmic timeline. We define desirable criteria for metrics to be
used in an operational setting, specifically by ML practitioners. We
characterize different types of engagement with content on Twitter using these
metrics, and use these results to evaluate the metrics with respect to the
desired criteria. We show that we can use these metrics to identify content
suggestion algorithms that contribute more strongly to skewed outcomes between
users. Overall, we conclude that these metrics can be useful tools for
understanding disparate outcomes in online social networks.
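Distributional inequality metrics of the kind evaluated in the paper include measures such as the Gini coefficient. As an illustration only (the standard sorted-index formula, not the paper's implementation), a minimal sketch in Python, treating per-user content exposure as the quantity of interest:

```python
def gini(values):
    """Gini coefficient of non-negative quantities (e.g. per-user exposure).

    0.0 means perfect equality; the value approaches 1.0 as exposure
    concentrates on a single user.
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Equal exposure across users vs. exposure concentrated on one user:
print(gini([10, 10, 10, 10]))  # 0.0
print(gini([0, 0, 0, 40]))     # 0.75
```

Comparing such a statistic across candidate content-suggestion algorithms is one way a metric of this family could flag which algorithm contributes most to skewed exposure, in the spirit of the analysis described above.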
Related papers
- Machine Translation Meta Evaluation through Translation Accuracy Challenge Sets [92.38654521870444]
We introduce ACES, a contrastive challenge set spanning 146 language pairs.
This dataset aims to discover whether metrics can identify 68 categories of translation accuracy errors.
We conduct a large-scale study by benchmarking ACES on 50 metrics submitted to the WMT 2022 and 2023 metrics shared tasks.
arXiv Detail & Related papers (2024-01-29T17:17:42Z)
- Truthful Meta-Explanations for Local Interpretability of Machine Learning Models [10.342433824178825]
We present a local meta-explanation technique which builds on top of the truthfulness metric, which is a faithfulness-based metric.
We demonstrate the effectiveness of both the technique and the metric by concretely defining all the concepts and through experimentation.
arXiv Detail & Related papers (2022-12-07T08:32:04Z)
- Analysis and Comparison of Classification Metrics [12.092755413404245]
Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk.
We show how to use these metrics to compute a system's calibration loss and compare this metric with the widely-used expected calibration error (ECE).
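The expected calibration error (ECE) mentioned above is commonly computed by binning predictions by confidence. As a hedged sketch (a generic equal-width-bin implementation, not necessarily the formulation compared in the paper):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width confidence bins: the weighted average absolute
    gap between mean predicted probability and empirical accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        conf = sum(p for p, _ in bucket) / len(bucket)
        acc = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(conf - acc)
    return ece

# Mean confidence 0.9 but empirical accuracy 1.0, so ECE is about 0.1:
print(expected_calibration_error([0.9, 0.9], [1, 1]))
```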
arXiv Detail & Related papers (2022-09-12T16:06:10Z)
- Classification Performance Metric Elicitation and its Applications [5.5637552942511155]
Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications.
This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences.
arXiv Detail & Related papers (2022-08-19T03:57:17Z)
- Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
arXiv Detail & Related papers (2022-04-21T15:52:14Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Estimation of Fair Ranking Metrics with Incomplete Judgments [70.37717864975387]
We propose a sampling strategy and estimation technique for four fair ranking metrics.
We formulate a robust and unbiased estimator which can operate even with very limited number of labeled items.
arXiv Detail & Related papers (2021-08-11T10:57:00Z)
- The Benchmark Lottery [114.43978017484893]
"A benchmark lottery" describes the overall fragility of the machine learning benchmarking process.
We show that the relative performance of algorithms may be altered significantly simply by choosing different benchmark tasks.
arXiv Detail & Related papers (2021-07-14T21:08:30Z)
- Online Learning Demands in Max-min Fairness [91.37280766977923]
We describe mechanisms for the allocation of a scarce resource among multiple users in a way that is efficient, fair, and strategy-proof.
The mechanism is repeated for multiple rounds and a user's requirements can change on each round.
At the end of each round, users provide feedback about the allocation they received, enabling the mechanism to learn user preferences over time.
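The classic procedure for computing a max-min fair allocation with known demands is progressive filling. As an illustrative sketch only (the generic textbook algorithm for a single round, not the learning mechanism proposed in the paper):

```python
def max_min_allocate(capacity, demands):
    """Max-min fair allocation via progressive filling: repeatedly split the
    remaining capacity equally among users whose demand is not yet met."""
    alloc = {u: 0.0 for u in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-12:
        share = remaining / len(active)
        for u in active:
            grant = min(share, demands[u] - alloc[u])
            alloc[u] += grant
            remaining -= grant
        # Users whose demand is met drop out; leftover capacity is re-split.
        active = {u for u in active if demands[u] - alloc[u] > 1e-12}
    return alloc

# Small demands are fully satisfied; the rest share the remainder equally:
print(max_min_allocate(10.0, {"a": 2.0, "b": 4.0, "c": 10.0}))
```

Each unsatisfied user's allocation grows at the same rate until either the capacity is exhausted or the user's demand is met, which is what makes the resulting allocation max-min fair.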
arXiv Detail & Related papers (2020-12-15T22:15:20Z)
- Interpretable Assessment of Fairness During Model Evaluation [1.2183405753834562]
We introduce a novel hierarchical clustering algorithm to detect heterogeneity among users in given sets of sub-populations.
We demonstrate the performance of the algorithm on real data from LinkedIn.
arXiv Detail & Related papers (2020-10-26T02:31:17Z)
- Fairness Metrics: A Comparative Analysis [1.7188280334580195]
We describe some of the most widely used fairness metrics using a common mathematical framework and present new results on the relationships among them.
Results presented herein can help place both specialists and non-specialists in a better position to identify the metric best suited for their application and goals.
arXiv Detail & Related papers (2020-01-22T03:27:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.