Conformalizing Machine Translation Evaluation
- URL: http://arxiv.org/abs/2306.06221v1
- Date: Fri, 9 Jun 2023 19:36:18 GMT
- Title: Conformalizing Machine Translation Evaluation
- Authors: Chrysoula Zerva, André F. T. Martins
- Abstract summary: Several uncertainty estimation methods have been recently proposed for machine translation evaluation.
We show that the majority of them tend to underestimate model uncertainty, and as a result they often produce misleading confidence intervals that do not cover the ground truth.
We propose as an alternative the use of conformal prediction, a distribution-free method to obtain confidence intervals with a theoretically established guarantee on coverage.
- Score: 9.89901717499058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several uncertainty estimation methods have been recently proposed for
machine translation evaluation. While these methods can provide a useful
indication of when not to trust model predictions, we show in this paper that
the majority of them tend to underestimate model uncertainty, and as a result
they often produce misleading confidence intervals that do not cover the ground
truth. We propose as an alternative the use of conformal prediction, a
distribution-free method to obtain confidence intervals with a theoretically
established guarantee on coverage. First, we demonstrate that split conformal
prediction can "correct" the confidence intervals of previous methods to
yield a desired coverage level. Then, we highlight biases in estimated
confidence intervals, both in terms of the translation language pairs and the
quality of translations. We apply conditional conformal prediction techniques
to obtain calibration subsets for each data subgroup, leading to equalized
coverage.
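The split conformal procedure the abstract refers to can be illustrated with a minimal sketch. The data here are synthetic and the absolute-residual nonconformity score is one common choice, not necessarily the paper's exact setup:

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1):
    """Split conformal prediction for regression.

    Calibrates absolute-residual nonconformity scores on a held-out
    calibration set, then returns intervals with marginal coverage
    >= 1 - alpha under exchangeability.
    """
    n = len(cal_labels)
    scores = np.abs(cal_labels - cal_preds)  # nonconformity scores
    # Finite-sample-corrected quantile level, capped at 1.0
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    return test_preds - q_hat, test_preds + q_hat

# Toy example: quality-estimation scores vs. synthetic "human" labels.
rng = np.random.default_rng(0)
cal_preds = rng.uniform(0, 1, 500)
cal_labels = cal_preds + rng.normal(0, 0.05, 500)
test_preds = np.array([0.3, 0.7])
lo, hi = split_conformal_interval(cal_preds, cal_labels, test_preds, alpha=0.1)
```

The same correction can wrap the intervals of any underlying uncertainty method: one simply recalibrates its interval widths on the held-out set.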
Related papers
- Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification.
Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data.
We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
arXiv Detail & Related papers (2024-10-13T15:37:11Z)
- Conformalized Interval Arithmetic with Symmetric Calibration [9.559062601251464]
We extend conformal prediction intervals from a single target to prediction intervals for the sum of multiple targets.
We show that our method outperforms existing conformalized approaches as well as non-conformal approaches.
arXiv Detail & Related papers (2024-08-20T15:27:18Z)
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Robust Conformal Prediction Using Privileged Information [17.886554223172517]
We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data.
Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption.
arXiv Detail & Related papers (2024-06-08T08:56:47Z)
- Non-Exchangeable Conformal Language Generation with Nearest Neighbors [12.790082627386482]
Non-exchangeable conformal nucleus sampling is a novel extension of the conformal prediction framework to generation based on nearest neighbors.
Our method can be used post-hoc for an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees.
arXiv Detail & Related papers (2024-02-01T16:04:04Z)
- Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making.
We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Evaluating Machine Translation Quality with Conformal Predictive Distributions [0.0]
We present a new approach for assessing uncertainty in machine translation by simultaneously evaluating translation quality and providing a reliable confidence score.
Our method outperforms a simple, but effective baseline on six different language pairs in terms of coverage and sharpness.
arXiv Detail & Related papers (2023-06-02T13:56:30Z)
- Post-selection Inference for Conformal Prediction: Trading off Coverage for Precision [0.0]
Traditionally, conformal prediction inference requires a data-independent specification of miscoverage level.
We develop simultaneous conformal inference to account for data-dependent miscoverage levels.
arXiv Detail & Related papers (2023-04-12T20:56:43Z)
- Reliability-Aware Prediction via Uncertainty Learning for Person Image Retrieval [51.83967175585896]
UAL aims at providing reliability-aware predictions by considering data uncertainty and model uncertainty simultaneously.
Data uncertainty captures the "noise" inherent in the sample, while model uncertainty depicts the model's confidence in the sample's prediction.
arXiv Detail & Related papers (2022-10-24T17:53:20Z)
- Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z)
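The conditional conformal idea from the main abstract, calibrating on a separate subset per data subgroup (e.g. per language pair) to equalize coverage, can be sketched with a basic group-wise (Mondrian) calibration. The language pairs, noise levels, and data below are synthetic illustrations, not the paper's experimental setup:

```python
import numpy as np

def groupwise_conformal(cal_preds, cal_labels, cal_groups,
                        test_preds, test_groups, alpha=0.1):
    """Group-conditional (Mondrian) split conformal prediction.

    Calibrates a separate residual quantile per group, so each group
    attains coverage >= 1 - alpha on its own, rather than only the
    pooled, marginal guarantee.
    """
    q_hat = {}
    for g in np.unique(cal_groups):
        scores = np.abs(cal_labels[cal_groups == g] - cal_preds[cal_groups == g])
        n = len(scores)
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        q_hat[g] = np.quantile(scores, level, method="higher")
    widths = np.array([q_hat[g] for g in test_groups])
    return test_preds - widths, test_preds + widths

# Toy setup: two "language pairs" with different noise levels, so a
# single pooled interval would over-cover one group and under-cover
# the other; per-group calibration yields a wider interval where the
# residuals are noisier.
rng = np.random.default_rng(1)
groups = np.repeat(np.array(["en-de", "en-fi"]), 400)
preds = rng.uniform(0, 1, 800)
noise = np.where(groups == "en-de", 0.02, 0.10)
labels = preds + rng.normal(0, 1, 800) * noise
lo, hi = groupwise_conformal(preds, labels, groups,
                             np.array([0.5, 0.5]),
                             np.array(["en-de", "en-fi"]))
```

With enough calibration data per group, each subgroup's empirical coverage lands near the target level, which is the "equalized coverage" property the paper pursues.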
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.