An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2406.01549v2
- Date: Thu, 4 Jul 2024 14:21:39 GMT
- Title: An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation
- Authors: Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin
- Abstract summary: We introduce the information bottleneck theory into retrieval-augmented generation.
Our approach filters noise by maximizing the mutual information between compression and ground output while minimizing the mutual information between compression and retrieved passage.
We derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations.
- Score: 35.76451156732993
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content, but this achieves only suboptimal noise compression. In this paper, we propose to introduce the information bottleneck theory into retrieval-augmented generation. Our approach filters noise by simultaneously maximizing the mutual information between compression and ground output, while minimizing the mutual information between compression and retrieved passage. In addition, we derive the formula of information bottleneck to facilitate its application in novel comprehensive evaluations, the selection of supervised fine-tuning data, and the construction of reinforcement learning rewards. Experimental results demonstrate that our approach achieves significant improvements across various question answering datasets, not only in the correctness of answer generation but also in conciseness, with a $2.5\%$ compression rate.
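The trade-off described in the abstract has the general shape of the classic information bottleneck Lagrangian. As a sketch only: $X$ denotes the retrieved passage, $Y$ the ground output, $\tilde{X}$ the compression, and $\beta > 0$ a trade-off weight. These symbols are assumed here for illustration and may not match the paper's own notation or its derived formula.

```latex
% Generic information bottleneck objective (illustrative notation,
% not necessarily the exact formula derived in the paper).
\min_{p(\tilde{x} \mid x)} \; \mathcal{L}_{\mathrm{IB}}
  = I(\tilde{X}; X) \;-\; \beta \, I(\tilde{X}; Y)
```

Minimizing $I(\tilde{X}; X)$ pushes the compression to discard passage content (conciseness), while the $-\beta\, I(\tilde{X}; Y)$ term rewards keeping whatever helps predict the output (correctness). A larger $\beta$ favors retention over compression.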
Related papers
- Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise [9.492965765929963]
Noise in a dataset can significantly degrade the performance of Kolmogorov-Arnold networks.
We propose an oversampling technique combined with denoising to alleviate the impact of noise.
We conclude that applying both oversampling and filtering strategies can reduce the detrimental effects of noise.
arXiv Detail & Related papers (2024-07-20T14:17:10Z)
- Entropy-Based Decoding for Retrieval-Augmented Large Language Models [43.93281157539377]
Augmenting Large Language Models with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses.
We introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue.
arXiv Detail & Related papers (2024-06-25T12:59:38Z)
- BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering [58.403898834018285]
BlendFilter is a novel approach that elevates retrieval-augmented Large Language Models by integrating query generation blending with knowledge filtering.
We conduct extensive experiments on three open-domain question answering benchmarks, and the findings clearly indicate that our innovative BlendFilter surpasses state-of-the-art baselines significantly.
arXiv Detail & Related papers (2024-02-16T23:28:02Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
- Learning to Abstain From Uninformative Data [20.132146513548843]
We study the problem of learning and acting under a general noisy generative process.
In this problem, the data distribution has a significant proportion of uninformative samples with high noise in the label.
We propose a novel approach to learning under these conditions via a loss inspired by the selective learning theory.
arXiv Detail & Related papers (2023-09-25T15:55:55Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion [17.274784447811665]
We adopt the end-to-end framework of VITS for high-quality waveform reconstruction.
We disentangle content information by imposing an information bottleneck on WavLM features.
We propose the spectrogram-resize based data augmentation to improve the purity of extracted content information.
arXiv Detail & Related papers (2022-10-27T13:32:38Z)
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning [28.75574762244266]
In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations.
For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty.
arXiv Detail & Related papers (2022-09-12T15:41:10Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.