Detecting and Understanding Generalization Barriers for Neural Machine
Translation
- URL: http://arxiv.org/abs/2004.02181v1
- Date: Sun, 5 Apr 2020 12:33:51 GMT
- Title: Detecting and Understanding Generalization Barriers for Neural Machine
Translation
- Authors: Guanlin Li, Lemao Liu, Conghui Zhu, Tiejun Zhao, Shuming Shi
- Abstract summary: This paper attempts to identify and understand generalization barrier words within an unseen input sentence.
We propose a principled definition of generalization barrier words and a modified version which is tractable in computation.
We then conduct extensive analyses of the detected generalization barrier words on both directions of the Zh⇔En NIST benchmarks.
- Score: 53.23463279153577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization to unseen instances is the eternal pursuit of all
data-driven models. However, for realistic tasks like machine translation, the
traditional approach of measuring generalization in an average sense provides a
poor understanding of fine-grained generalization ability. As a remedy, this
paper attempts to identify and understand the generalization barrier words
within an unseen input sentence that cause the degradation of fine-grained
generalization. We propose a principled definition of generalization barrier
words and a modified version that is tractable to compute. Based on the
modified definition, we propose three simple methods for barrier detection via
search-aware risk estimation through counterfactual generation. We then conduct
extensive analyses of the detected generalization barrier words on both
directions of the Zh⇔En NIST benchmarks from various perspectives. Potential
uses of the detected barrier words are also discussed.
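As a rough illustration of the counterfactual idea described above, the toy sketch below scores each source word by how much deleting it reduces an estimated translation risk; words whose removal most improves the estimate are barrier candidates. The `translate` and `risk` functions here are hypothetical stand-ins, not the paper's NMT model or its search-aware risk estimator.

```python
# Hypothetical sketch of counterfactual barrier-word detection.
# Assumption: a higher `risk` value means a worse expected translation.

def translate(words):
    # Toy "model": translates word-by-word via a tiny dictionary;
    # unknown words are copied through unchanged.
    lexicon = {"the": "le", "cat": "chat", "sleeps": "dort"}
    return [lexicon.get(w, w) for w in words]

def risk(source, output):
    # Toy risk estimate: fraction of output tokens the model failed
    # to translate (i.e., copied through from the source).
    return sum(o == s for s, o in zip(source, output)) / len(source)

def barrier_scores(sentence):
    # Counterfactual generation by deletion: a word is a barrier
    # candidate if removing it lowers the estimated risk of
    # translating the remaining sentence.
    base = risk(sentence, translate(sentence))
    scores = {}
    for i, w in enumerate(sentence):
        counterfactual = sentence[:i] + sentence[i + 1:]
        scores[w] = base - risk(counterfactual, translate(counterfactual))
    return scores

sent = ["the", "cat", "zzyx", "sleeps"]
print(barrier_scores(sent))  # "zzyx" gets the highest (positive) score
```

In this toy setting, the out-of-vocabulary token "zzyx" is the only word whose deletion lowers the risk, so it surfaces as the barrier word; the paper's actual methods replace these stand-ins with search-aware risk estimates over a real NMT model.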
Related papers
- Evaluating Structural Generalization in Neural Machine Translation [13.880151307013318]
We construct SGET, a dataset covering various types of compositional generalization with control of words and sentence structures.
We show that neural machine translation models struggle more in structural generalization than in lexical generalization.
We also find different performance trends in semantic parsing and machine translation, which indicates the importance of evaluations across various tasks.
arXiv Detail & Related papers (2024-06-19T09:09:11Z)
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples
using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines the usage of gradient information and invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z)
- Out-of-domain Generalization from a Single Source: An Uncertainty
Quantification Approach [17.334457450818473]
We study a worst-case scenario in generalization: Out-of-domain generalization from a single source.
The goal is to learn a robust model from a single source and expect it to generalize over many unknown distributions.
We propose uncertainty-guided domain generalization to tackle the limitations.
arXiv Detail & Related papers (2021-08-05T23:53:55Z)
- Information-Theoretic Bounds on the Moments of the Generalization Error
of Learning Algorithms [19.186110989897738]
Generalization error bounds are critical to understanding the performance of machine learning models.
We offer a more refined analysis of the generalization behaviour of machine learning models, based on a characterization of (bounds on) their generalization error moments.
arXiv Detail & Related papers (2021-02-03T11:38:00Z)
- Representation Based Complexity Measures for Predicting Generalization
in Deep Learning [0.0]
Deep Neural Networks can generalize despite being significantly overparametrized.
Recent research has tried to examine this phenomenon from various viewpoints.
We provide an interpretation of generalization from the perspective of quality of internal representations.
arXiv Detail & Related papers (2020-12-04T18:53:44Z)
- In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z)
- Stereotypical Bias Removal for Hate Speech Detection Task using
Knowledge-based Generalizations [16.304516254043865]
We study bias mitigation from unstructured text data for hate speech detection.
We propose novel methods leveraging knowledge-based generalizations for bias-free learning.
Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset and a Twitter dataset, show that the use of knowledge-based generalizations results in better performance.
arXiv Detail & Related papers (2020-01-15T18:17:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.