On Measuring Context Utilization in Document-Level MT Systems
- URL: http://arxiv.org/abs/2402.01404v1
- Date: Fri, 2 Feb 2024 13:37:07 GMT
- Title: On Measuring Context Utilization in Document-Level MT Systems
- Authors: Wafaa Mohammed, Vlad Niculae
- Abstract summary: We propose to complement accuracy-based evaluation with measures of context utilization.
We show that automatically-annotated supporting context gives similar conclusions to human-annotated context.
- Score: 12.02023514105999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document-level translation models are usually evaluated using general metrics
such as BLEU, which are not informative about the benefits of context. Current
work on context-aware evaluation, such as contrastive methods, only measures
translation accuracy on words that need context for disambiguation. Such
measures cannot reveal whether the translation model uses the correct
supporting context. We propose to complement accuracy-based evaluation with
measures of context utilization. We find that perturbation-based analysis
(comparing models' performance when provided with correct versus random
context) is an effective measure of overall context utilization. For a
finer-grained phenomenon-specific evaluation, we propose to measure how much
the supporting context contributes to handling context-dependent discourse
phenomena. We show that automatically-annotated supporting context gives
similar conclusions to human-annotated context and can be used as an alternative
in cases where human annotations are not available. Finally, we highlight the
importance of using discourse-rich datasets when assessing context utilization.
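As a concrete illustration of the perturbation-based analysis above, here is a minimal sketch: translate each sentence once with its true preceding context and once with a randomly sampled context, then compare corpus-level BLEU. The `translate(source, context)` wrapper and the document structure are hypothetical assumptions; the metric uses the sacrebleu package.

```python
# Perturbation-based context-utilization analysis (sketch): a large drop in
# BLEU under random context indicates the model actually uses context.
# `translate(source, context)` is a hypothetical wrapper around any
# context-aware MT model; `docs` is a list of dicts with parallel
# "sources" and "references" sentence lists.
import random

import sacrebleu

def context_utilization_gap(docs, translate, seed=0):
    rng = random.Random(seed)
    pool = [s for doc in docs for s in doc["sources"]]  # pool for random context
    refs, hyp_true, hyp_rand = [], [], []
    for doc in docs:
        for i, src in enumerate(doc["sources"]):
            true_ctx = doc["sources"][max(0, i - 1):i]  # previous source sentence
            rand_ctx = [rng.choice(pool)]               # perturbed (random) context
            hyp_true.append(translate(src, true_ctx))
            hyp_rand.append(translate(src, rand_ctx))
            refs.append(doc["references"][i])
    bleu_true = sacrebleu.corpus_bleu(hyp_true, [refs]).score
    bleu_rand = sacrebleu.corpus_bleu(hyp_rand, [refs]).score
    # Positive gap: the model degrades under wrong context, i.e. it uses context.
    return bleu_true - bleu_rand
```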
Related papers
- Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics [22.041561519672456]
Large language models (LLMs) often produce unsupported or unverifiable content, known as "hallucinations".
We propose a comparative evaluation framework that assesses how effectively metrics distinguish citations across three levels of support.
Our results show no single metric consistently excels across all evaluations, revealing the complexity of assessing fine-grained support.
arXiv Detail & Related papers (2024-06-21T15:57:24Z)
- Fine-grained Controllable Text Generation through In-context Learning with Feedback [57.396980277089135]
We present a method for rewriting an input sentence to match specific values of nontrivial linguistic features, such as dependency depth.
In contrast to earlier work, our method uses in-context learning rather than finetuning, making it applicable in use cases where data is sparse.
arXiv Detail & Related papers (2024-06-17T08:55:48Z)
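As a concrete illustration of the in-context learning setup in the paper above, the sketch below builds a few-shot rewriting prompt. The exemplar format, feature name, and prompt wording are illustrative placeholders, not the paper's actual prompts.

```python
# Few-shot prompt construction for feature-controlled rewriting (sketch):
# exemplars pair a sentence and a target feature value with a rewrite, and
# the model is prompted to continue the pattern on a new input.
def build_rewrite_prompt(exemplars, sentence, feature, target_value):
    lines = [f"Rewrite the sentence so that its {feature} matches the target."]
    for ex_sentence, ex_value, ex_rewrite in exemplars:
        lines += [f"Sentence: {ex_sentence}",
                  f"Target {feature}: {ex_value}",
                  f"Rewrite: {ex_rewrite}"]
    lines += [f"Sentence: {sentence}",
              f"Target {feature}: {target_value}",
              "Rewrite:"]
    return "\n".join(lines)

# Example: build_rewrite_prompt(exemplars, "The cat sat.", "dependency depth", 4)
```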
- Quantifying the Plausibility of Context Reliance in Neural Machine Translation [25.29330352252055]
We introduce Plausibility Evaluation of Context Reliance (PECoRe).
PECoRe is an end-to-end interpretability framework designed to quantify context usage in language models' generations.
We use PECoRe to quantify the plausibility of context-aware machine translation models.
arXiv Detail & Related papers (2023-10-02T13:26:43Z)
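The contrastive idea behind such frameworks can be sketched as follows: score each generated token by how much its probability falls when the context is withheld. The inputs and threshold below are illustrative assumptions, not PECoRe's actual API.

```python
# Contrastive context-sensitivity scoring (sketch). `tokens`, `p_with_ctx`,
# and `p_without_ctx` hold the same generated sequence's tokens and their
# probabilities under the two conditions, obtained from any context-aware model.
import math

def context_sensitive_tokens(tokens, p_with_ctx, p_without_ctx, threshold=1.0):
    flagged = []
    for tok, p_c, p_nc in zip(tokens, p_with_ctx, p_without_ctx):
        # Log-ratio of the two probabilities; a large positive value means
        # the token depends heavily on the context being present.
        delta = math.log(max(p_c, 1e-12)) - math.log(max(p_nc, 1e-12))
        if delta > threshold:
            flagged.append((tok, delta))
    return flagged
```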
- On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations [74.70957445600936]
Multiple metrics have been introduced to measure fairness in various natural language processing tasks.
These metrics can be roughly grouped into two categories: 1) extrinsic metrics for evaluating fairness in downstream applications and 2) intrinsic metrics for estimating fairness in upstream language representation models.
arXiv Detail & Related papers (2022-03-25T22:17:43Z)
- Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets [18.754562380068815]
State-of-the-art contextualized models such as BERT use tasks such as WiC and WSD to evaluate their word-in-context representations.
This study presents the first quantitative analysis (using probing baselines) on the context-word interaction being tested in major contextual lexical semantic tasks.
arXiv Detail & Related papers (2021-12-13T15:37:05Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify discourse phenomena and evaluate model performance on them.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
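A toy example of what one such phenomenon tagger might look like: flag tokens whose translation typically requires context, here ambiguous English pronouns. The pronoun list is illustrative, not the benchmark's actual tag set.

```python
# Minimal pronoun-anaphora tagger (sketch): return indices of tokens that
# usually need discourse context to translate correctly.
AMBIGUOUS_PRONOUNS = {"it", "they", "them", "this", "that"}

def tag_pronoun_anaphora(tokens):
    return [i for i, tok in enumerate(tokens) if tok.lower() in AMBIGUOUS_PRONOUNS]
```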
- Measuring and Increasing Context Usage in Context-Aware Machine Translation [64.5726087590283]
We introduce a new metric, conditional cross-mutual information, to quantify the usage of context by machine translation models.
We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models.
arXiv Detail & Related papers (2021-05-07T19:55:35Z)
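Both ideas from the paper above admit short sketches. Below, `logprob(y, x, context)` is a hypothetical scoring function: with context it queries a context-aware model, with `context=None` a context-agnostic one; the `<mask>` token and dropout rate are illustrative assumptions.

```python
# Conditional cross-mutual information (CXMI) and word dropout (sketches).
import random

def cxmi(examples, logprob):
    # CXMI(C; Y | X) ~= mean over (x, y, ctx) of
    #   log p(y | x, ctx) - log p(y | x)
    gains = [logprob(y, x, ctx) - logprob(y, x, None) for x, y, ctx in examples]
    return sum(gains) / len(gains)

def coword_dropout(source_tokens, p=0.1, rng=random):
    # Context-aware word dropout: mask words in the current source sentence
    # during training, which makes the surrounding context more informative
    # and encourages the model to use it.
    return [tok if rng.random() > p else "<mask>" for tok in source_tokens]
```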
- On the Use of Context for Predicting Citation Worthiness of Sentences in Scholarly Articles [10.28696219236292]
We formulate this problem as a sequence labeling task solved using a hierarchical BiLSTM model.
We contribute a new benchmark dataset containing over two million sentences and their corresponding labels.
Our results quantify the benefits of using context and contextual embeddings for citation worthiness.
arXiv Detail & Related papers (2021-04-18T21:47:30Z)
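A hierarchical BiLSTM for this kind of sentence-level sequence labeling can be sketched in PyTorch as below; layer sizes and the binary label space are illustrative assumptions, not the paper's exact configuration.

```python
# Hierarchical BiLSTM sequence labeler (sketch): a word-level BiLSTM encodes
# each sentence, and a sentence-level BiLSTM labels the sentence sequence.
import torch
import torch.nn as nn

class HierarchicalBiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.sent_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                 bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, doc_tokens):
        # doc_tokens: (num_sentences, max_words) token ids for one document.
        words = self.embed(doc_tokens)               # (S, W, E)
        _, (h, _) = self.word_lstm(words)            # h: (2, S, H) final states
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)  # (S, 2H) sentence encodings
        doc_out, _ = self.sent_lstm(sent_vecs.unsqueeze(0))  # (1, S, 2H)
        return self.classifier(doc_out.squeeze(0))   # (S, num_labels) logits
```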
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
- Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias [113.44471186752018]
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy.
This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations.
arXiv Detail & Related papers (2020-01-09T18:31:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.