Concept than Document: Context Compression via AMR-based Conceptual Entropy
- URL: http://arxiv.org/abs/2511.18832v1
- Date: Mon, 24 Nov 2025 07:08:02 GMT
- Title: Concept than Document: Context Compression via AMR-based Conceptual Entropy
- Authors: Kaize Shi, Xueyao Sun, Xiaohui Tao, Lin Li, Qika Lin, Guandong Xu
- Abstract summary: Large Language Models (LLMs) face information overload when handling long contexts, particularly in Retrieval-Augmented Generation (RAG). We propose an unsupervised context compression framework that exploits Abstract Meaning Representation (AMR) graphs to preserve semantically essential information while filtering out irrelevant text.
- Score: 21.954536296551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) face information overload when handling long contexts, particularly in Retrieval-Augmented Generation (RAG), where extensive supporting documents often introduce redundant content. This issue not only weakens reasoning accuracy but also increases computational overhead. We propose an unsupervised context compression framework that exploits Abstract Meaning Representation (AMR) graphs to preserve semantically essential information while filtering out irrelevant text. By quantifying node-level entropy within AMR graphs, our method estimates the conceptual importance of each node, enabling the retention of core semantics. Specifically, we construct AMR graphs from raw contexts, compute the conceptual entropy of each node, and retain the most informative nodes to form a condensed, semantically focused context that is far shorter than the raw documents. Experiments on the PopQA and EntityQuestions datasets show that our method outperforms vanilla and other baselines, achieving higher accuracy while substantially reducing context length. To the best of our knowledge, this is the first work introducing AMR-based conceptual entropy for context compression, demonstrating the potential of stable linguistic features in context engineering.
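As a rough illustration (not the authors' implementation), the pipeline described in the abstract — parse contexts into AMR graphs, score each concept node by entropy, and keep the highest-scoring nodes — can be sketched with a frequency-based self-information estimate. The AMR parsing step is elided here, and the paper's exact entropy formula is not given in the abstract, so the scoring below is an assumption:

```python
import math
from collections import Counter

def compress_context(amr_nodes, corpus_concepts, keep_ratio=0.5):
    """Rank AMR concept nodes by self-information -log2 p(c) and keep
    the top fraction. This is a stand-in for the paper's node-level
    conceptual entropy, whose exact definition the abstract omits."""
    counts = Counter(corpus_concepts)
    total = sum(counts.values()) + 1  # +1: add-one smoothing for unseen concepts
    info = {c: -math.log2(counts.get(c, 1) / total) for c in amr_nodes}
    k = max(1, int(len(amr_nodes) * keep_ratio))
    # Rarer concepts carry more information and are retained first.
    return sorted(amr_nodes, key=lambda c: info[c], reverse=True)[:k]

# Toy example: concept nodes extracted from an AMR parse of a retrieved passage.
corpus = ["be", "person", "city", "capital", "be", "person", "france", "paris"]
nodes = ["capital", "france", "paris", "be", "person"]
kept = compress_context(nodes, corpus, keep_ratio=0.6)
print(kept)  # the three rarest (highest-information) concepts survive
```

The retained concepts would then be mapped back to their source spans to form the condensed context handed to the LLM.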
Related papers
- Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression [55.51959317490934]
Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. We argue that graphs inherently contain rich structural and semantic information, and that their effective exploitation can unlock potential gains in LLM reasoning performance. We propose Homophily-aware Structural and Semantic Compression for LLMs (HS2C), a framework centered on exploiting graph homophily.
arXiv Detail & Related papers (2026-01-13T03:35:18Z) - Enhancing Retrieval-Augmented Generation with Topic-Enriched Embeddings: A Hybrid Approach Integrating Traditional NLP Techniques [0.0]
This work proposes topic-enriched embeddings that integrate term-based signals and topic structure with contextual sentence embeddings. By jointly capturing term-level and topic-level semantics, topic-enriched embeddings improve semantic clustering, increase retrieval precision, and reduce computational burden.
arXiv Detail & Related papers (2025-12-31T13:43:57Z) - RePo: Language Models with Context Re-Positioning [10.269249887819988]
In-context learning is fundamental to modern Large Language Models (LLMs), yet prevailing architectures impose a rigid, fixed contextual structure by assigning linear or constant positional indices. We propose RePo, a novel mechanism that reduces extraneous load via context re-positioning.
arXiv Detail & Related papers (2025-12-16T13:30:30Z) - ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering [54.72902502486611]
ReAG is a Reasoning-Augmented Multimodal RAG approach that combines coarse- and fine-grained retrieval with a critic model that filters irrelevant passages. ReAG significantly outperforms prior methods, improving answer accuracy and providing interpretable reasoning grounded in retrieved evidence.
arXiv Detail & Related papers (2025-11-27T19:01:02Z) - Cross-Granularity Hypergraph Retrieval-Augmented Generation for Multi-hop Question Answering [49.43814054718318]
Multi-hop question answering (MHQA) requires integrating knowledge scattered across multiple passages to derive the correct answer. Traditional retrieval-augmented generation (RAG) methods primarily focus on coarse-grained textual semantic similarity. We propose a novel RAG approach called HGRAG for MHQA that achieves cross-granularity integration of structural and semantic information via hypergraphs.
arXiv Detail & Related papers (2025-08-15T06:36:13Z) - Context-Driven Knowledge Graph Completion with Semantic-Aware Relational Message Passing [7.335262932492395]
Semantic context surrounding a triplet $(h, r, t)$ is crucial for Knowledge Graph Completion (KGC). Traditional node-based message passing mechanisms often introduce noise and suffer from information dilution or over-smoothing. We propose a semantic-aware relational message passing framework.
arXiv Detail & Related papers (2025-06-29T08:37:48Z) - PICASO: Permutation-Invariant Context Composition with State Space Models [98.91198288025117]
State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states. We propose a simple mathematical relation derived from SSM dynamics to compose multiple states into one that efficiently approximates the effect of concatenating raw context tokens. We evaluate our resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest-performing baseline while enjoying an average 5.4x speedup.
arXiv Detail & Related papers (2025-02-24T19:48:00Z) - QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [75.81394991657545]
We introduce information bottleneck theory (IB) to model the problem. We propose a cross-attention-based approach to approximate mutual information in IB. Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
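For context, the standard information-bottleneck objective that such compression methods build on (a textbook formulation, not quoted from the paper) seeks an encoding $Z$ of the input $X$ that discards as much of $X$ as possible while staying predictive of the target $Y$:

```latex
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```

Here $X$ is the raw context, $Z$ the compressed representation, $Y$ the downstream answer, and $\beta$ trades compression against relevance; QUITO-X's contribution is approximating the mutual-information terms via cross-attention.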
arXiv Detail & Related papers (2024-08-20T02:44:45Z) - Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation [17.156915103545728]
Large Language Models (LLMs) have made significant strides in information acquisition.
Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge.
We propose a novel concept-based RAG framework with an Abstract Meaning Representation (AMR)-based concept distillation algorithm.
arXiv Detail & Related papers (2024-05-06T00:18:43Z) - UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - Semantic Text Compression for Classification [17.259824817932294]
We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification.
We propose semantic quantization and compression approaches for text where we utilize sentence embeddings and the semantic distortion metric to preserve the meaning.
arXiv Detail & Related papers (2023-09-19T17:50:57Z) - KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis.
We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics.
A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
arXiv Detail & Related papers (2023-08-15T02:38:08Z) - Simple and Effective Relation-based Embedding Propagation for Knowledge
Representation Learning [15.881121633396832]
We propose the Relation-based Embedding Propagation (REP) method to adapt pretrained graph embeddings with context.
We show that REP brings about 10% relative improvement to triplet-based embedding methods on OGBL-WikiKG2.
It takes only 5%-83% of the time to achieve results comparable to the state-of-the-art GC-OTE.
arXiv Detail & Related papers (2022-05-13T06:02:13Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Networks (SGCN) that explores the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.