The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation
- URL: http://arxiv.org/abs/2503.03308v1
- Date: Wed, 05 Mar 2025 09:41:03 GMT
- Title: The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation
- Authors: Jie He, Tao Wang, Deyi Xiong, Qun Liu
- Abstract summary: We present a test suite to evaluate the commonsense reasoning capability of neural machine translation. We manually create 1,200 triples, each of which contains a source sentence and two contrastive translations. Our experiments and analyses demonstrate that neural machine translation performs poorly on commonsense reasoning across the three ambiguity types.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Does neural machine translation yield translations that are congenial with common sense? In this paper, we present a test suite to evaluate the commonsense reasoning capability of neural machine translation. The test suite consists of three test sets, covering lexical and contextless/contextual syntactic ambiguity that requires commonsense knowledge to resolve. We manually create 1,200 triples, each of which contains a source sentence and two contrastive translations, involving 7 different common sense types. Language models pretrained on large-scale corpora, such as BERT and GPT-2, achieve a commonsense reasoning accuracy below 72% on target translations of this test suite. We conduct extensive experiments on the test suite to evaluate commonsense reasoning in neural machine translation and investigate factors that affect this capability. Our experiments and analyses demonstrate that neural machine translation performs poorly on commonsense reasoning across the three ambiguity types in terms of both reasoning accuracy (60.1%) and reasoning consistency (31%). The built commonsense test suite is available at https://github.com/tjunlp-lab/CommonMT.
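The contrastive-pair protocol described in the abstract can be sketched as follows. The triples and scores below are made-up placeholders, and `score` stands in for an NMT model's (e.g. length-normalized) log-probability of a translation given the source; this is an illustrative sketch, not the paper's actual evaluation code:

```python
# Contrastive evaluation sketch: for each triple (source, right, wrong),
# the model "reasons correctly" if it assigns the correct translation
# a higher score than the contrastive one.

def evaluate_contrastive(triples, score):
    """triples: list of (source, right_translation, wrong_translation).
    score(source, translation) -> float, e.g. a model log-probability.
    Returns reasoning accuracy in [0, 1]."""
    correct = sum(
        1 for src, right, wrong in triples
        if score(src, right) > score(src, wrong)
    )
    return correct / len(triples)

# Toy scores standing in for a real NMT model's log-probabilities.
toy_scores = {
    ("s1", "t1-right"): -1.0, ("s1", "t1-wrong"): -2.0,
    ("s2", "t2-right"): -3.0, ("s2", "t2-wrong"): -1.5,
}
triples = [("s1", "t1-right", "t1-wrong"), ("s2", "t2-right", "t2-wrong")]
acc = evaluate_contrastive(triples, lambda s, t: toy_scores[(s, t)])
print(acc)  # 0.5: the model prefers the right translation in 1 of 2 triples
```

Reasoning accuracy in the paper is exactly this fraction; reasoning consistency additionally requires correct choices across related variants of the same ambiguity.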
Related papers
- UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Automatic Evaluation and Analysis of Idioms in Neural Machine Translation [12.227312923011986]
We present a novel metric for measuring the frequency of literal translation errors without human involvement.
We explore the role of monolingual pretraining and find that it yields substantial targeted improvements.
We find that the models translate idioms in a local or "myopic" way, as they are relatively unaffected by variations of the context.
arXiv Detail & Related papers (2022-10-10T10:30:09Z) - Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus [0.0]
The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator.
The results confirm the observations on the news corpus, but the differences are less strong.
They suggest that the use of text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations.
arXiv Detail & Related papers (2022-06-22T08:59:10Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
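The entity-denoising idea can be sketched as follows: entity mentions found via a knowledge base are corrupted, and the model is pre-trained to reconstruct the original sentence. The toy knowledge base, the `<mask>` token, and the example sentence are illustrative assumptions, not DEEP's actual implementation:

```python
# Sketch of entity-noising for denoising pre-training: entity mentions
# (looked up in a toy knowledge base) are replaced with a mask token;
# each (noised, original) pair would serve as a training example.

KB_ENTITIES = {"Marie Curie", "Warsaw"}  # stand-in for a real knowledge base

def noise_entities(sentence, entities=KB_ENTITIES, mask="<mask>"):
    """Replace every known entity mention in `sentence` with `mask`."""
    noised = sentence
    for ent in entities:
        noised = noised.replace(ent, mask)
    return noised

src = "Marie Curie was born in Warsaw."
print(noise_entities(src))  # "<mask> was born in <mask>."
```

Training a seq2seq model to map the noised sentence back to the original forces it to recover entity surface forms, which is the capability the paper targets.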
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences [21.11065466376105]
Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI).
Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets.
We introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements.
arXiv Detail & Related papers (2021-06-02T06:31:55Z) - It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
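A minimal numeric sketch of the XMI idea: the metric can be read as the reduction in cross-entropy that a translation model achieves over an unconditional language model on the same target text (positive XMI means the source genuinely helps predict the target). The per-token log-probabilities below are invented numbers, and this reading of the definition is an assumption based on the abstract, not a verbatim reproduction of the paper's formula:

```python
# Cross-mutual information sketch: XMI(X -> Y) is approximated as the
# cross-entropy of the target under a language model minus its
# cross-entropy under the translation model conditioned on the source.

def cross_entropy(log_probs):
    """Average negative log-probability (nats per token)."""
    return -sum(log_probs) / len(log_probs)

def xmi(lm_log_probs, mt_log_probs):
    """Estimated XMI from per-token log-probs of the same target text."""
    return cross_entropy(lm_log_probs) - cross_entropy(mt_log_probs)

# Invented per-token log-probabilities for one target sentence.
lm_lp = [-3.0, -2.5, -3.5]   # language model, no source available
mt_lp = [-1.0, -1.5, -2.0]   # translation model, conditioned on the source
print(round(xmi(lm_lp, mt_lp), 2))  # 1.5: conditioning on the source helps
```

The asymmetry the abstract mentions falls out naturally: XMI(X -> Y) and XMI(Y -> X) use different models and different targets, so they need not agree.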
arXiv Detail & Related papers (2020-05-05T17:38:48Z) - Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
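One common way to share a single encoder and decoder across many language pairs is to prepend a target-language token to the source, so the model learns to route its output language from that token. Whether this exact scheme is what the paper uses is an assumption here; the token format below is illustrative:

```python
# Sketch: route a shared encoder-decoder by prepending a target-language
# token to the source tokens, so one model can serve many language pairs.

def add_lang_token(src_tokens, tgt_lang):
    """Prepend a target-language control token (hypothetical format)."""
    return [f"<2{tgt_lang}>"] + src_tokens

print(add_lang_token(["das", "Haus"], "en"))  # ['<2en>', 'das', 'Haus']
```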
arXiv Detail & Related papers (2020-04-21T17:26:16Z) - Personality Assessment from Text for Machine Commonsense Reasoning [15.348792748868643]
PerSense is a framework to estimate human personality traits based on expressed texts.
Our goal is to demonstrate the feasibility of using machine learning algorithms on personality trait data.
arXiv Detail & Related papers (2020-04-15T07:30:47Z) - Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
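The probing-as-mutual-information view estimates I(T; R) = H(T) - H(T | R) for a linguistic property T and representations R; since the true entropies are unknown, each term is approximated by a trained probe's cross-entropy. A toy numeric sketch with made-up entropy estimates in bits:

```python
# Probing as mutual-information estimation: the information that a
# representation R carries about a property T is estimated as
# H(T) - H(T | R), each term approximated by a probe's cross-entropy.

def mi_estimate(h_t, h_t_given_r):
    """Estimated mutual information in bits, clamped to be nonnegative
    (finite-sample cross-entropy estimates can cross over)."""
    return max(0.0, h_t - h_t_given_r)

# Made-up estimates: entropy of a POS tag distribution vs. a probe's
# cross-entropy when the probe also sees the representations.
print(mi_estimate(4.0, 0.5))  # 3.5 bits: R is highly informative about T
```

Under this operationalization, a "better" probe is simply a tighter cross-entropy estimate, which is the framing the paper uses to argue against penalizing probe capacity.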
arXiv Detail & Related papers (2020-04-07T01:06:36Z) - On the Integration of Linguistic Features into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate.
We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations.
We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.