Familiarity-aware Evidence Compression for Retrieval Augmented Generation
- URL: http://arxiv.org/abs/2409.12468v1
- Date: Thu, 19 Sep 2024 05:14:55 GMT
- Title: Familiarity-aware Evidence Compression for Retrieval Augmented Generation
- Authors: Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen
- Abstract summary: We propose FaviComp, a training-free evidence compression technique that makes retrieved evidence more familiar to the target model.
FaviComp proactively lowers the perplexity of the compressed evidence with regard to the target model.
Experimental results demonstrate that FaviComp consistently outperforms existing baselines on multiple open-domain QA datasets.
- Score: 33.13513003367646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval Augmented Generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieval from external sources. However, it often struggles to filter out inconsistent and irrelevant information that can distract the LM from its tasks. While compressing the retrieved evidence with a compression model aims to address this issue, the compressed evidence may still be unfamiliar to the target model used for the downstream task, potentially failing to utilize the evidence effectively. We propose FaviComp (Familiarity-aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model, while seamlessly integrating parametric knowledge from the model. Specifically, FaviComp proactively lowers the perplexity of the compressed evidence with regard to the target model by combining token probabilities from both the compression model and the target model to generate context that is more familiar to the target model. This approach balances the integration of parametric and non-parametric knowledge, which is especially helpful in complex tasks where the retrieved evidence set may not contain all the necessary information. Experimental results demonstrate that FaviComp consistently outperforms existing baselines in multiple open-domain QA datasets, achieving high compression rates and showcasing the effective integration of both parametric and non-parametric knowledge.
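The combination of token probabilities described in the abstract can be illustrated with a minimal decoding sketch. Everything below is an assumption for illustration, not the authors' implementation: the placeholder gpt2 checkpoints, the prompts, the greedy decoding loop, and the mixing weight `alpha`; token-level ensembling also presupposes that the compression and target models share a tokenizer.
```python
# Minimal sketch of familiarity-aware ensemble decoding (not the authors' code).
# Assumptions: the compression and target models share a tokenizer/vocabulary
# (gpt2 checkpoints are placeholders), the prompts are illustrative, and a
# weighted sum of log-probabilities with weight `alpha` stands in for the
# paper's combination of token probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                         # placeholder tokenizer
comp_model = AutoModelForCausalLM.from_pretrained("gpt2")           # placeholder compression model
target_model = AutoModelForCausalLM.from_pretrained("gpt2-large")   # placeholder target model

@torch.no_grad()
def familiarity_aware_compress(evidence, question, alpha=0.5, max_new_tokens=128):
    # The compression model sees the retrieved evidence plus the question;
    # the target model sees only the question, so its next-token scores
    # reflect how "familiar" a candidate continuation is to it.
    comp_ids = tok(f"Summarize the evidence for the question.\n{evidence}\n"
                   f"Question: {question}\nSummary:", return_tensors="pt").input_ids
    tgt_ids = tok(f"Question: {question}\nContext:", return_tensors="pt").input_ids
    generated = []
    for _ in range(max_new_tokens):
        comp_logp = torch.log_softmax(comp_model(comp_ids).logits[:, -1, :], dim=-1)
        tgt_logp = torch.log_softmax(target_model(tgt_ids).logits[:, -1, :], dim=-1)
        next_id = torch.argmax(alpha * comp_logp + (1 - alpha) * tgt_logp,
                               dim=-1, keepdim=True)
        if next_id.item() == tok.eos_token_id:
            break
        generated.append(next_id.item())
        comp_ids = torch.cat([comp_ids, next_id], dim=-1)   # feed the chosen token to both models
        tgt_ids = torch.cat([tgt_ids, next_id], dim=-1)
    return tok.decode(generated)
```
Greedy decoding is used only to keep the sketch short; the weight `alpha` controls how strongly the target model's familiarity shapes the compressed evidence.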
Related papers
- ACoRN: Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models [26.585985828583304]
Abstractive compression utilizes smaller language models to condense query-relevant context.
Retrieved documents often include information that is either irrelevant to answering the query or misleading due to factually incorrect content.
Such noise makes abstractive compressors more likely to omit important information essential for the correct answer.
arXiv Detail & Related papers (2025-04-17T06:05:35Z) - Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning [23.376181947937788]
We propose task-aware key-value (KV) cache compression, which compresses external knowledge in a zero- or few-shot setup.
Experiments show our approach outperforms both RAG and task-agnostic compression methods.
A synthetic dataset highlights that RAG performs well when sparse evidence suffices, whereas task-aware compression is superior for broad knowledge tasks.
arXiv Detail & Related papers (2025-03-06T21:07:41Z) - Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal [56.307484956135355]
CODiff is a compression-aware one-step diffusion model for JPEG artifact removal.
We propose a dual learning strategy that combines explicit and implicit learning.
Results demonstrate that CODiff surpasses recent leading methods in both quantitative and visual quality metrics.
arXiv Detail & Related papers (2025-02-14T02:46:27Z) - BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression [91.23933111083389]
Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge.
This paper presents BRIEF, a lightweight approach that performs query-aware multi-hop reasoning.
Based on our synthetic data built entirely by open-source models, BRIEF generates more concise summaries.
arXiv Detail & Related papers (2024-10-20T04:24:16Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Accuracy is Not All You Need [9.371810162601623]
We conduct a detailed study of metrics across multiple compression techniques, models and datasets.
We show that the behavior of compressed models as visible to end-users is significantly different from the baseline model, even when accuracy is similar.
We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.
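A hedged sketch of how two such metrics might be computed: per-position KL divergence between the baseline and compressed models' token distributions, and an answer "flip" rate. The paper's exact definitions may differ, and the correctness lists below are hypothetical inputs produced by some external evaluation.
```python
# Hedged sketch of two distribution-level metrics: KL divergence between the
# baseline and compressed models' token distributions, and a "flip" rate over
# per-example correctness. Not the paper's reference implementation.
import torch.nn.functional as F

def token_kl(baseline_logits, compressed_logits):
    """Mean KL(baseline || compressed) over token positions."""
    p_log = F.log_softmax(baseline_logits, dim=-1)
    q_log = F.log_softmax(compressed_logits, dim=-1)
    return F.kl_div(q_log, p_log, log_target=True, reduction="batchmean")

def flip_rate(baseline_correct, compressed_correct):
    """Fraction of examples whose correctness changes after compression."""
    flips = [b != c for b, c in zip(baseline_correct, compressed_correct)]
    return sum(flips) / max(len(flips), 1)
```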
arXiv Detail & Related papers (2024-07-12T10:19:02Z) - Compression Represents Intelligence Linearly [14.651664954289354]
Language modeling with large language models (LLMs) has been shown to be equivalent to compression.
Despite such appealing discussions, little empirical evidence is present for the interplay between compression and intelligence.
Across 12 benchmarks, our study brings together 31 public LLMs that originate from diverse organizations.
Remarkably, we find that LLMs' intelligence almost linearly correlates with their ability to compress external text corpora.
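One common way to quantify a model's ability to compress text is bits per character (BPC) derived from its token log-likelihoods. The sketch below assumes this proxy, a placeholder gpt2 checkpoint, and a single-window evaluation; it does not reproduce the paper's protocol.
```python
# Minimal sketch of measuring "compression ability" as bits per character (BPC)
# from a model's token log-likelihoods. The gpt2 checkpoint and single-window
# evaluation are placeholder assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def bits_per_character(model, tok, text):
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)                       # mean next-token cross-entropy (nats)
    total_nats = out.loss * (ids.shape[1] - 1)         # undo the mean over positions
    return ((total_nats / math.log(2)) / len(text)).item()  # bits, normalized by characters

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(bits_per_character(model, tok, "Compression ability can serve as a proxy for capability."))
```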
arXiv Detail & Related papers (2024-04-15T17:03:41Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in a model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
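A minimal sketch of the kind of TopK operator referred to above, assuming it simply keeps the k largest-magnitude entries of a tensor and zeroes the rest; the paper's model-parallel communication details are not reproduced.
```python
# Illustrative TopK sparsifier: keep the k largest-magnitude entries of a tensor
# and zero the rest. It could be applied to activations on the forward pass or
# to gradients before communication.
import torch

def topk_compress(x: torch.Tensor, ratio: float = 0.1) -> torch.Tensor:
    k = max(1, int(ratio * x.numel()))
    flat = x.flatten()
    idx = flat.abs().topk(k).indices       # positions of the k largest magnitudes
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)
```
Consistent with the finding above, a model trained with such an operator would also apply it at inference time.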
arXiv Detail & Related papers (2024-01-15T15:54:54Z) - The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models [11.156816338995503]
Compressing large language models (LLMs) provides faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
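The two standard techniques named in this entry, pruning and quantization, can be made concrete with a toy sketch; these are generic textbook versions (unstructured magnitude pruning and uniform 8-bit quantization), not the specific methods studied in the paper.
```python
# Toy versions of the two techniques described above: unstructured magnitude
# pruning (zero out the smallest-magnitude weights) and uniform 8-bit
# quantization (represent weights with fewer bits). Real pipelines are more involved.
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    k = max(1, int(sparsity * w.numel()))
    threshold = w.abs().flatten().kthvalue(k).values    # k-th smallest magnitude
    return torch.where(w.abs() > threshold, w, torch.zeros_like(w))

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale                                     # dequantize as q.float() * scale
```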
arXiv Detail & Related papers (2023-12-01T22:27:12Z) - Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications [63.29358103217275]
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks.
We propose two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after compression; the other is that knowledge is displaced and can be recovered through input-side prompting.
We introduce a variant called Inference-time Dynamic Prompting (IDP) that can effectively increase prompt diversity without incurring any inference overhead.
arXiv Detail & Related papers (2023-10-02T03:12:06Z) - Benchmarking Adversarial Robustness of Compressed Deep Learning Models [15.737988622271219]
This study seeks to understand the effect of adversarial inputs crafted for base models on their pruned versions.
Our findings reveal that while the benefits of pruning (enhanced generalizability, compression, and faster inference times) are preserved, adversarial robustness remains comparable to that of the base model.
arXiv Detail & Related papers (2023-08-16T06:06:56Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation-based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression [68.82486784654817]
We study two popular model compression techniques including knowledge distillation and pruning.
We show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets.
We develop a regularization strategy for model compression based on sample uncertainty.
arXiv Detail & Related papers (2021-10-16T00:20:04Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - Self-Supervised GAN Compression [32.21713098893454]
We show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods.
We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator.
We show that this framework delivers compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
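A hedged sketch of the idea of a trained discriminator supervising a compressed generator; the generator/discriminator modules, optimizer, and loss below are placeholders, and the paper's exact objective and pruning schedule may differ.
```python
# Hedged sketch: a pretrained, frozen discriminator supervises a compressed
# (e.g. pruned) generator. Modules, optimizer, and loss are placeholders.
import torch
import torch.nn.functional as F

def train_step(compressed_gen, frozen_disc, optimizer, z):
    frozen_disc.eval()                                  # discriminator weights stay fixed
    fake = compressed_gen(z)                            # samples from the compressed generator
    scores = frozen_disc(fake)                          # frozen discriminator's judgement
    loss = F.binary_cross_entropy_with_logits(scores, torch.ones_like(scores))
    optimizer.zero_grad()
    loss.backward()                                     # gradients reach the generator through the frozen discriminator
    optimizer.step()                                    # only generator parameters are in the optimizer
    return loss.item()
```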
arXiv Detail & Related papers (2020-07-03T04:18:54Z)