Related papers: Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models

URL: http://arxiv.org/abs/2104.06483v1
Date: Tue, 13 Apr 2021 19:51:21 GMT
Title: Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models
Authors: Ling Liu and Mans Hulden
Abstract summary: We show that, to be more effective, the hallucination process needs to pay attention to substrings of syllable-like length rather than individual characters or stems. We report a significant performance improvement with our substring-based hallucination model over previous data hallucination methods when training and test data do not overlap in their lemmata.
Score: 9.95909045828344
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning sequence models have been successfully applied to the task of morphological inflection. The results of the SIGMORPHON shared tasks in the past several years indicate that such models can perform well, but only if the training data cover a good amount of different lemmata, or if the lemmata that are inflected at test time have also been seen in training, as has indeed been largely the case in these tasks. Surprisingly, standard models such as the Transformer almost completely fail at generalizing inflection patterns when asked to inflect previously unseen lemmata -- i.e. under "wug test"-like circumstances. While established data augmentation techniques can be employed to alleviate this shortcoming by introducing a copying bias through hallucinating synthetic new word forms using the alphabet in the language at hand, we show that, to be more effective, the hallucination process needs to pay attention to substrings of syllable-like length rather than individual characters or stems. We report a significant performance improvement with our substring-based hallucination model over previous data hallucination methods when training and test data do not overlap in their lemmata.
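
To make the contrast in the abstract concrete, here is a minimal Python sketch of data hallucination for a (lemma, inflected form) pair. It is not the authors' implementation. It assumes that the longest substring shared by lemma and form stands in for the stem, that "syllable-like" chunks are character 2-grams and 3-grams harvested from a small word list, and that a shared span shorter than three characters is left alone. The function names (shared_span, char_filler, chunk_filler, harvest_chunks, hallucinate) and the sample words are hypothetical and chosen only for illustration.

```python
import random
from difflib import SequenceMatcher


def shared_span(lemma, form):
    """Locate the longest substring shared by lemma and form (assumed stand-in for the stem)."""
    matcher = SequenceMatcher(None, lemma, form, autojunk=False)
    return matcher.find_longest_match(0, len(lemma), 0, len(form))


def char_filler(length, alphabet, rng):
    """Character-level filler: draw each character independently from the alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def chunk_filler(length, chunks, rng):
    """Chunk-level filler: concatenate observed substrings, then trim to the target length."""
    out = ""
    while len(out) < length:
        out += rng.choice(chunks)
    return out[:length]


def harvest_chunks(words, sizes=(2, 3)):
    """Collect all substrings of the given sizes (assumed here to be 'syllable-like')."""
    chunks = set()
    for word in words:
        for n in sizes:
            for i in range(len(word) - n + 1):
                chunks.add(word[i:i + n])
    return sorted(chunks)  # sorted so that seeded sampling is reproducible


def hallucinate(lemma, form, filler, min_len=3):
    """Swap the shared span in both strings for the same synthetic filler.

    Material outside the shared span (for example an affix) is kept unchanged.
    """
    m = shared_span(lemma, form)
    if m.size < min_len:
        return lemma, form  # too little shared material to treat as a stem
    fake = filler(m.size)
    new_lemma = lemma[:m.a] + fake + lemma[m.a + m.size:]
    new_form = form[:m.b] + fake + form[m.b + m.size:]
    return new_lemma, new_form


rng = random.Random(0)
words = ["walk", "talked", "jumping"]  # toy data, not from the paper
alphabet = sorted(set("".join(words)))
chunks = harvest_chunks(words)

# Character-level hallucination versus chunk-level hallucination of the same pair.
print(hallucinate("walk", "walked", lambda n: char_filler(n, alphabet, rng)))
print(hallucinate("walk", "walked", lambda n: chunk_filler(n, chunks, rng)))
```

In both modes the synthetic pair keeps the "-ed" suffix, so a model trained on it is still pushed to copy the stem and attach the affix; the two modes differ only in whether the fake stem is built from single characters or from longer observed chunks.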

Related papers

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing [71.29488677105127]
Existing scene text recognition (STR) methods struggle to recognize challenging texts, especially artistic and severely distorted characters. We propose a contrastive learning-based STR framework that leverages synthetic and real unlabeled data without any human cost. Our method achieves SOTA performance (94.7% and 70.9% average accuracy on common benchmarks and Union14M-Benchmark, respectively).
arXiv Detail & Related papers (2024-11-23T15:24:47Z)
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data [4.636499986218049]
Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability. We propose an approach to improve the sample efficiency of these models by creating corrupted grounding data.
arXiv Detail & Related papers (2024-08-30T20:11:00Z)
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty [67.81977289444677]
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them. Our experiments reveal a clear and consistent ordering of fallback behaviors across all the axes analyzed.
arXiv Detail & Related papers (2024-07-08T16:13:42Z)
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training [57.771940716189114]
We show that large language models (LLMs) suffer from the "reversal curse". The root cause of the reversal curse lies in the different word order between the training and inference stages. We propose Semantic-aware Permutation Training (SPT) to address this issue.
arXiv Detail & Related papers (2024-03-01T18:55:20Z)
Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation [5.304395026626743]
Hallucination of text ungrounded in the input is a well-known problem in neural data-to-text generation. We propose a new way to mitigate hallucinations by combining the probabilistic output of a generator language model with the output of a special "text critic". Our method does not need any changes to the underlying LM's architecture or training procedure.
arXiv Detail & Related papers (2023-10-25T20:05:07Z)
DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance. To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered in this work are self-normalized and there is no need to further conduct a correction step. We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias. We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality. Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs [0.0]
In RNNs, encoding information in a suboptimal way can impact the quality of representations based on later elements in the sequence. I propose an augmentation to standard RNNs in the form of a gradient-based correction mechanism. I conduct different experiments in the context of language modeling, where the impact of using such a mechanism is examined in detail.
arXiv Detail & Related papers (2021-01-03T17:54:17Z)
Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input). We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
arXiv Detail & Related papers (2020-11-05T00:18:53Z)
Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity. We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective. Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.