Embedding Hallucination for Few-Shot Language Fine-tuning
- URL: http://arxiv.org/abs/2205.01307v1
- Date: Tue, 3 May 2022 04:55:50 GMT
- Title: Embedding Hallucination for Few-Shot Language Fine-tuning
- Authors: Yiren Jian and Chongyang Gao and Soroush Vosoughi
- Abstract summary: We propose an Embedding Hallucination (EmbedHalluc) method, which generates auxiliary embedding-label pairs to expand the fine-tuning dataset.
Experiments demonstrate that our proposed method is effective in a wide range of language tasks, outperforming current fine-tuning methods.
- Score: 14.244787327283335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot language learners adapt knowledge from a pre-trained model to
recognize novel classes from a few labeled sentences. In such settings,
fine-tuning a pre-trained language model can cause severe over-fitting. In this
paper, we propose an Embedding Hallucination (EmbedHalluc) method, which
generates auxiliary embedding-label pairs to expand the fine-tuning dataset.
The hallucinator is trained by playing an adversarial game with a
discriminator, such that hallucinated embeddings are indistinguishable from the
real ones in the fine-tuning dataset. By training on the extended dataset,
the language learner effectively learns from the diverse hallucinated
embeddings to overcome the over-fitting issue. Experiments demonstrate that our
proposed method is effective in a wide range of language tasks, outperforming
current fine-tuning methods. Further, we show that EmbedHalluc outperforms
other methods that address this over-fitting problem, such as common data
augmentation, semi-supervised pseudo-labeling, and regularization. The code
will be made available at: https://github.com/yiren-jian/EmbedHalluc.
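The abstract describes an adversarial game between a hallucinator, which generates embedding-label pairs, and a discriminator, which tries to separate them from the real few-shot embeddings; the language learner is then fine-tuned on the union of real and hallucinated pairs. The PyTorch snippet below is a minimal sketch of that setup under assumed module sizes and a standard GAN objective; it is illustrative only and not the authors' released implementation (see the linked repository for that).
```python
# Minimal sketch of an EmbedHalluc-style setup (assumed sizes and losses, not
# the authors' released code): a conditional hallucinator generates
# embedding-label pairs, a discriminator tries to tell them from real
# few-shot embeddings, and the learner is later trained on both.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, NUM_CLASSES, NOISE_DIM = 768, 2, 128  # illustrative sizes


class Hallucinator(nn.Module):
    """Maps (noise, label) to a hallucinated sentence embedding."""

    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(NUM_CLASSES, NOISE_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, EMB_DIM),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.label_emb(y)], dim=-1))


class Discriminator(nn.Module):
    """Scores whether an embedding comes from the real fine-tuning set."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, e):
        return self.net(e)


def adversarial_step(halluc, disc, real_emb, real_y, opt_h, opt_d):
    """One GAN-style update: discriminator first, then hallucinator."""
    batch = real_emb.size(0)
    z = torch.randn(batch, NOISE_DIM)
    fake_emb = halluc(z, real_y)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: push real embeddings toward 1, hallucinated toward 0.
    d_loss = (F.binary_cross_entropy_with_logits(disc(real_emb), ones)
              + F.binary_cross_entropy_with_logits(disc(fake_emb.detach()), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Hallucinator: fool the discriminator so fake embeddings look real.
    g_loss = F.binary_cross_entropy_with_logits(disc(fake_emb), ones)
    opt_h.zero_grad(); g_loss.backward(); opt_h.step()
    return d_loss.item(), g_loss.item()


# Fine-tuning then mixes real and hallucinated pairs, e.g.:
#   fake_emb = halluc(torch.randn(B, NOISE_DIM), fake_y).detach()
#   loss = F.cross_entropy(classifier(torch.cat([real_emb, fake_emb])),
#                          torch.cat([real_y, fake_y]))
```
A plain binary cross-entropy GAN objective is used here for brevity; other adversarial losses would fit the same two-step loop.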
Related papers
- Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data [4.636499986218049]
Multimodal language models can exhibit hallucinations in their outputs, which limits their reliability.
We propose an approach to improve the sample efficiency of these models by creating corrupted grounding data.
arXiv Detail & Related papers (2024-08-30T20:11:00Z)
- Multilingual Fine-Grained News Headline Hallucination Detection [40.62136051552646]
We introduce the first multilingual, fine-grained news headline hallucination detection dataset.
This dataset contains over 11 thousand pairs in 5 languages, each annotated with detailed hallucination types by experts.
We propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to boost the few-shot hallucination detection performance.
arXiv Detail & Related papers (2024-07-22T18:37:53Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbations, such as typos and word-order shuffling, that resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation [124.9008419182485]
We present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy.
Our approach begins with the observation that most hallucinations are closely tied to the knowledge aggregation patterns in the self-attention matrix.
Based on the observation, OPERA introduces a penalty term on the model logits during the beam-search decoding to mitigate the over-trust issue.
arXiv Detail & Related papers (2023-11-29T18:57:07Z)
- Zero-Resource Hallucination Prevention for Large Language Models [45.4155729393135]
"Hallucination" refers to instances where large language models (LLMs) generate factually inaccurate or ungrounded information.
We introduce a novel self-evaluation technique, applied before generation and referred to as SELF-FAMILIARITY, which evaluates the model's familiarity with the concepts present in the input instruction.
We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-09-06T01:57:36Z)
- Reducing Hallucinations in Neural Machine Translation with Feature Attribution [54.46113444757899]
We present a case study focusing on model understanding and regularisation to reduce hallucinations in NMT.
We first use feature attribution methods to study the behaviour of an NMT model that produces hallucinations.
We then leverage these methods to propose a novel loss function that substantially helps reduce hallucinations and does not require retraining the model from scratch.
arXiv Detail & Related papers (2022-11-17T20:33:56Z)
- Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
arXiv Detail & Related papers (2020-11-05T00:18:53Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding [55.16953347580948]
Gradient-based adversarial training is widely used in improving the robustness of neural networks.
It cannot be easily adapted to natural language processing tasks, since the text input is discrete.
We propose a Token-Aware Virtual Adversarial Training method to craft fine-grained perturbations in the continuous embedding space (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-04-30T02:03:24Z)
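Two of the entries above (the cross-lingual adversarial training work and TAVAT, referenced from its summary) revolve around gradient-based perturbations applied in the continuous embedding space rather than to discrete tokens. The sketch below illustrates that general idea with a simple FGM-style perturbation of token embeddings; the function names, shapes, and epsilon value are assumptions for illustration, not the procedure from either paper.
```python
# Generic embedding-space adversarial perturbation (an FGM-style sketch under
# assumed shapes; not the exact TAVAT or cross-lingual training procedure).
# Idea: perturb continuous token embeddings along the loss gradient, bounded
# by epsilon, and train on the clean plus the perturbed inputs.
import torch
import torch.nn.functional as F


def adversarial_loss(model_head, embeddings, labels, epsilon=1e-2):
    """embeddings: (batch, seq_len, dim) token embeddings tracked by autograd;
    model_head: any callable mapping embeddings to class logits (assumed)."""
    clean_loss = F.cross_entropy(model_head(embeddings), labels)

    # Gradient of the loss w.r.t. the continuous embeddings (the tokens
    # themselves are discrete, so the perturbation lives in embedding space).
    grad, = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)

    # Token-wise normalization: every token gets its own bounded perturbation.
    norm = grad.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    delta = epsilon * grad / norm

    adv_loss = F.cross_entropy(model_head(embeddings + delta), labels)
    return clean_loss + adv_loss
```
The token-wise normalization gives each token its own bounded perturbation direction rather than a single sentence-level one, which is the spirit of the token-aware variant summarized above.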
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.