Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models
- URL: http://arxiv.org/abs/2411.00686v1
- Date: Fri, 01 Nov 2024 15:47:05 GMT
- Title: Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models
- Authors: Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho
- Abstract summary: LaPael is a latent-level paraphrasing method that applies input-dependent noise to the early layers of Large Language Models.
Our experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches.
- Score: 54.385486006684495
- License:
- Abstract: As Large Language Models (LLMs) are increasingly deployed in specialized domains with continuously evolving knowledge, the need for timely and precise knowledge injection has become essential. Fine-tuning with paraphrased data is a common approach to enhance knowledge injection, yet it faces two significant challenges: high computational costs due to repetitive external model usage and limited sample diversity. To this end, we introduce LaPael, a latent-level paraphrasing method that applies input-dependent noise to early LLM layers. This approach enables diverse and semantically consistent augmentations directly within the model. Furthermore, it eliminates the recurring costs of paraphrase generation for each knowledge update. Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. Additionally, combining LaPael with data-level paraphrasing further enhances performance.
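The idea of perturbing only the early layers with input-dependent noise can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the noise is parameterized, so the sigmoid scale function, the `w_scale` weights, the element-wise `tanh` layers, and every function name below are hypothetical stand-ins chosen for illustration.

```python
import math
import random


def input_dependent_noise(hidden, w_scale, rng):
    """Add Gaussian noise whose scale is computed from the hidden vector itself."""
    # Hypothetical scale function: sigmoid of a dot product, so sigma is in (0, 1).
    logit = sum(h * w for h, w in zip(hidden, w_scale))
    sigma = 1.0 / (1.0 + math.exp(-logit))
    noisy = [h + sigma * rng.gauss(0.0, 1.0) for h in hidden]
    return noisy, sigma


def toy_layer(x):
    # Stand-in for a transformer block (assumption): element-wise tanh.
    return [math.tanh(v) for v in x]


def perturb_early_layers(layers, x, w_scale, num_noisy_layers, rng):
    """Run x through the layers, perturbing the output of the first few only."""
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < num_noisy_layers:
            x, _ = input_dependent_noise(x, w_scale, rng)
    return x


rng = random.Random(0)
layers = [toy_layer] * 4
x = [0.5, -1.0, 2.0]
w_scale = [0.1, 0.2, -0.3]  # hypothetical weights; a real method would learn these

clean = perturb_early_layers(layers, x, w_scale, 0, rng)    # no perturbation
noisy_a = perturb_early_layers(layers, x, w_scale, 2, rng)  # first 2 layers noisy
noisy_b = perturb_early_layers(layers, x, w_scale, 2, rng)  # a second augmentation
```

Each call with `num_noisy_layers > 0` yields a different latent "paraphrase" of the same input without calling an external paraphrasing model, which is the cost saving the abstract describes.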
Related papers
- Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models [60.00178316095646]
Sentence embedding is essential for many NLP tasks, with contrastive learning methods achieving strong performance using datasets like NLI.
Recent studies leverage large language models (LLMs) to generate sentence pairs, reducing annotation dependency.
We propose a method for controlling the generation direction of LLMs in the latent space. Unlike unconstrained generation, the controlled approach ensures meaningful semantic divergence.
Experiments on multiple benchmarks demonstrate that our method achieves new SOTA performance with a modest cost in ranking sentence synthesis.
arXiv Detail & Related papers (2025-02-19T12:07:53Z)
- Systematic Knowledge Injection into Large Language Models via Diverse Augmentation for Domain-Specific RAG [24.660769275714685]
Retrieval-Augmented Generation (RAG) has emerged as a prominent method for incorporating domain knowledge into Large Language Models (LLMs).
We present a novel framework that significantly enhances the fine-tuning process by augmenting the training data in two ways -- context augmentation and knowledge paraphrasing.
arXiv Detail & Related papers (2025-02-12T12:39:51Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Infusing Knowledge into Large Language Models with Contextual Prompts [5.865016596356753]
We propose a simple yet generalisable approach for knowledge infusion by generating prompts from the context in the input text.
Our experiments show the effectiveness of our approach, which we evaluate by probing the fine-tuned LLMs.
arXiv Detail & Related papers (2024-03-03T11:19:26Z)
- Augmenting LLMs with Knowledge: A survey on hallucination prevention [0.0]
This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources.
While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules.
arXiv Detail & Related papers (2023-09-28T14:09:58Z)
- Advanced Conditional Variational Autoencoders (A-CVAE): Towards interpreting open-domain conversation generation via disentangling latent feature representation [15.742077523458995]
This paper proposes to harness the generative model with a priori knowledge through a cognitive approach involving mesoscopic scale feature disentanglement.
We propose a new metric for open-domain dialogues, which can objectively evaluate the interpretability of the latent space distribution.
arXiv Detail & Related papers (2022-07-26T07:39:36Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.