Towards Fine-Grained Information: Identifying the Type and Location of
Translation Errors
- URL: http://arxiv.org/abs/2302.08975v1
- Date: Fri, 17 Feb 2023 16:20:33 GMT
- Title: Towards Fine-Grained Information: Identifying the Type and Location of
Translation Errors
- Authors: Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan
He, Derek F. Wong, Jun Xie
- Abstract summary: Existing approaches cannot simultaneously consider error position and type.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
- Score: 80.22825549235556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained information on translation errors is helpful for the
translation evaluation community. Existing approaches cannot simultaneously
consider error position and type, and so fail to integrate both kinds of
error information. In this paper, we propose the Fine-Grained Translation
Error Detection (FG-TED) task, which aims to identify both the position and
the type of translation errors in given source-hypothesis sentence pairs. We
also build an FG-TED model to predict addition and omission errors -- two
typical translation accuracy errors. First, we cast the task as word-level
classification and apply shortcut-learning reduction to relieve the
influence of monolingual features. Second, we construct synthetic datasets
for model training and resolve the labeling disagreements in authoritative
datasets, making the experimental benchmark consistent. Experiments show
that our model identifies error type and position concurrently and achieves
state-of-the-art results on the restored dataset. Our model also delivers
more reliable predictions in low-resource and transfer scenarios than
existing baselines. The related datasets and the source code will be
released in the future.
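To make the word-level classification paradigm concrete, here is a minimal sketch, assuming a cross-lingual encoder (XLM-R) fine-tuned for token classification; the label set (OK/ADD/OMI) and the pair packing are illustrative assumptions, not the paper's published recipe.

```python
# Minimal sketch of word-level translation-error tagging. Assumptions:
# a cross-lingual encoder (XLM-R) with a token-classification head, and
# an illustrative label set OK / ADD / OMI (correct / addition error /
# omission error). The head is randomly initialized here and would need
# fine-tuning on (synthetic) FG-TED data before its predictions mean anything.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["OK", "ADD", "OMI"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)

def tag_errors(source: str, hypothesis: str) -> list[tuple[str, str]]:
    """Predict a per-token error label for the hypothesis, given the source."""
    # Pack the pair as <s> source </s></s> hypothesis </s> so the encoder
    # attends across both languages (standard RoBERTa-style pair encoding).
    enc = tokenizer(source, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits[0]  # (seq_len, num_labels)
    preds = logits.argmax(dim=-1).tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    seq_ids = enc.sequence_ids(0)  # None / 0 / 1 per token
    # Report labels only for hypothesis tokens (sequence id 1).
    return [(t, LABELS[p]) for t, p, s in zip(tokens, preds, seq_ids) if s == 1]

print(tag_errors("Er hat zwei Katzen.", "He has two cats and a dog."))
```

The shortcut-learning reduction and the synthetic training data described in the abstract would sit on top of this scaffold; they are not shown here.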
Related papers
- Context-Aware Machine Translation with Source Coreference Explanation [26.336947440529713]
We propose a model that explains the decisions made for translation by predicting coreference features in the input.
We evaluate our method on the English-German and English-Russian datasets of the WMT document-level translation task, and on the multilingual TED talk dataset.
arXiv Detail & Related papers (2024-04-30T12:41:00Z)
- xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection [21.116517555282314]
xCOMET is an open-source learned metric designed to bridge the gap between sentence-level and error-span approaches to machine translation evaluation.
It integrates both sentence-level evaluation and error span detection capabilities, exhibiting state-of-the-art performance across all types of evaluation.
We also provide a robustness analysis with stress tests, and show that xCOMET is largely capable of identifying localized critical errors and hallucinations.
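As a usage note, xCOMET is released through the open-source unbabel-comet package; below is a hedged sketch of sentence scoring plus error-span extraction. Field names follow the project's published examples and may change between releases, and the checkpoint is gated on Hugging Face, so downloading it requires accepting its license.

```python
# Hedged usage sketch for xCOMET via the unbabel-comet package.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/XCOMET-XL")  # gated checkpoint
model = load_from_checkpoint(model_path)

data = [{
    "src": "Elle a deux chats.",
    "mt": "She has two dogs.",
    "ref": "She has two cats.",
}]
# Set gpus=0 to run on CPU (slow for a model this size).
output = model.predict(data, batch_size=8, gpus=1)
print(output.scores)                # sentence-level quality scores
print(output.metadata.error_spans)  # detected error spans with severities
```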
arXiv Detail & Related papers (2023-10-16T15:03:14Z)
- Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization [34.85353544844499]
We present DIASUMFACT, the first dataset with fine-grained factual error annotations for dialogue summarization.
We define fine-grained factual error detection as a sentence-level multi-label classification problem.
We propose ENDERANKER, an unsupervised model that ranks candidates using pretrained encoder-decoder models.
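A minimal sketch of the sentence-level multi-label formulation follows; it illustrates the task setup, not ENDERANKER itself, and the error-type names are placeholders rather than DIASUMFACT's exact taxonomy.

```python
# Sketch of sentence-level multi-label factual-error classification: a
# summary sentence can carry several error types at once, so each label
# gets an independent sigmoid rather than a shared softmax. Error-type
# names are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ERROR_TYPES = ["entity", "predicate", "circumstance", "coreference", "other"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(ERROR_TYPES),
    problem_type="multi_label_classification",  # BCE loss during training
)

# Pair each summary sentence with its source dialogue for context.
enc = tokenizer("The meeting was moved to Friday.",
                "A: Can we meet Monday? B: Sure.",
                return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**enc).logits[0])
predicted = [t for t, p in zip(ERROR_TYPES, probs) if p > 0.5]
print(predicted)  # meaningless until the classifier is fine-tuned
```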
arXiv Detail & Related papers (2023-05-26T00:18:33Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare the performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark, and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with translated sources, yet it translates natural source sentences at inference time.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo parallel data (natural source, translated target) to mimic the inference scenario.
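A schematic of that online self-training loop, assuming a bidirectional model object with translate/train_step methods (placeholders for illustration, not a real API):

```python
# Schematic of online self-training for UNMT. Back-translation alone
# trains on (translated source, natural target); the added direction
# trains on (natural source, translated target), matching the natural
# source inputs the model sees at inference.
from typing import Iterable, Protocol

class BidirectionalNMT(Protocol):
    def translate(self, batch: list[str], direction: str) -> list[str]: ...
    def train_step(self, inputs: list[str], targets: list[str]) -> None: ...

def online_self_training_epoch(
    model: BidirectionalNMT,
    src_mono: Iterable[list[str]],
    tgt_mono: Iterable[list[str]],
) -> None:
    for src_batch, tgt_batch in zip(src_mono, tgt_mono):
        # Back-translation: pseudo source -> natural target.
        pseudo_src = model.translate(tgt_batch, direction="tgt->src")
        model.train_step(inputs=pseudo_src, targets=tgt_batch)
        # Online self-training: natural source -> pseudo target,
        # generated with the current model parameters.
        pseudo_tgt = model.translate(src_batch, direction="src->tgt")
        model.train_step(inputs=src_batch, targets=pseudo_tgt)
```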
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
arXiv Detail & Related papers (2021-06-03T05:56:57Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- Correcting the Autocorrect: Context-Aware Typographical Error Correction via Training Data Augmentation [38.10429793534442]
We first draw on a small set of annotated data to compute spelling error statistics.
These statistics are then used to introduce errors into substantially larger corpora.
We apply this procedure to create a set of English-language error detection and correction datasets.
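A simplified sketch of that statistics-driven error injection, assuming equal-length annotated pairs and a substitution-only error model (the paper's pipeline is richer, covering context and other edit types):

```python
# Simplified sketch of statistics-driven typo injection: estimate
# per-character substitution counts from a small annotated corpus, then
# sample errors into larger clean text at a chosen rate.
import random
from collections import Counter, defaultdict

def error_stats(pairs):
    """pairs: (misspelled, corrected) strings of equal length."""
    subs = defaultdict(Counter)
    for noisy, clean in pairs:
        for n_ch, c_ch in zip(noisy, clean):
            if n_ch != c_ch:
                subs[c_ch][n_ch] += 1
    return subs

def corrupt(text, subs, rate=0.05, rng=random.Random(0)):
    """Introduce substitution errors into clean text, weighted by the stats."""
    out = []
    for ch in text:
        cands = subs.get(ch)
        if cands and rng.random() < rate:
            chars, weights = zip(*cands.items())
            out.append(rng.choices(chars, weights=weights)[0])
        else:
            out.append(ch)
    return "".join(out)

stats = error_stats([("teh cat", "the cat"), ("wierd", "weird")])
print(corrupt("the weird weather held", stats, rate=0.3))
```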
arXiv Detail & Related papers (2020-05-03T18:08:17Z)
- Towards Minimal Supervision BERT-based Grammar Error Correction [81.90356787324481]
We incorporate contextual information from a pre-trained language model to leverage annotations and benefit multilingual scenarios.
Results show the strong potential of Bidirectional Encoder Representations from Transformers (BERT) in the grammatical error correction task.
arXiv Detail & Related papers (2020-01-10T15:45:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.