Can Vision-Language Models Handle Long-Context Code? An Empirical Study on Visual Compression
- URL: http://arxiv.org/abs/2602.00746v1
- Date: Sat, 31 Jan 2026 14:23:51 GMT
- Title: Can Vision-Language Models Handle Long-Context Code? An Empirical Study on Visual Compression
- Authors: Jianping Zhong, Guochang Li, Chen Zhi, Junxiao Han, Zhen Qin, Xinkui Zhao, Nan Wang, Shuiguang Deng, Jianwei Yin
- Abstract summary: LongCodeOCR is a visual compression framework for Vision-Language Models (VLMs). By preserving a global view, this approach avoids the dependency breakage inherent in filtering. Our results demonstrate that visual code compression serves as a viable alternative for tasks requiring global understanding.
- Score: 36.83667074155589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) struggle with long-context code due to window limitations. Existing textual code compression methods mitigate this via selective filtering but often disrupt dependency closure, causing semantic fragmentation. To address this, we introduce LongCodeOCR, a visual compression framework that renders code into compressed two-dimensional image sequences for Vision-Language Models (VLMs). By preserving a global view, this approach avoids the dependency breakage inherent in filtering. We systematically evaluate LongCodeOCR against the state-of-the-art LongCodeZip across four benchmarks spanning code summarization, code question answering, and code completion. Our results demonstrate that visual code compression serves as a viable alternative for tasks requiring global understanding. At comparable compression ratios ($\sim$1.7$\times$), LongCodeOCR improves CompScore on Long Module Summarization by 36.85 points over LongCodeZip. At a 1M-token context length with Glyph (a specialized 9B VLM), LongCodeOCR maintains higher accuracy than LongCodeZip while operating at about 4$\times$ higher compression. Moreover, compared with LongCodeZip, LongCodeOCR drastically reduces compression-stage overhead (reducing latency from $\sim$4.3 hours to $\sim$1 minute at 1M tokens). Finally, our results characterize a fundamental coverage--fidelity trade-off: visual code compression retains broader context coverage to support global dependencies, yet faces fidelity bottlenecks on exactness-critical tasks; by contrast, textual code compression preserves symbol-level precision while sacrificing structural coverage.
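The abstract describes rendering long code into compressed two-dimensional image sequences that a VLM can read. The following is a minimal sketch of that idea: hard-wrap the code, paginate it, rasterize each page, and estimate the resulting compression ratio against a rough 4-characters-per-token text baseline. It assumes Pillow is available; the function name `render_code_pages`, the page geometry, and the 256-visual-tokens-per-page budget are illustrative assumptions, not LongCodeOCR's actual pipeline or settings.

```python
# Sketch only: paginate and rasterize code for a VLM, then estimate
# the compression ratio. Geometry and token budgets are assumptions.
from PIL import Image, ImageDraw

def render_code_pages(code: str, width=1024, height=1024,
                      font_height=12, chars_per_line=128,
                      tokens_per_page=256):
    """Render code text into fixed-size page images and return
    (pages, estimated_compression_ratio)."""
    lines_per_page = height // font_height
    wrapped = []
    for line in code.splitlines():
        if not line:
            wrapped.append("")
            continue
        while line:  # hard-wrap lines that exceed the page width
            wrapped.append(line[:chars_per_line])
            line = line[chars_per_line:]
    pages = []
    for start in range(0, len(wrapped), lines_per_page):
        img = Image.new("RGB", (width, height), "white")
        draw = ImageDraw.Draw(img)
        for i, text in enumerate(wrapped[start:start + lines_per_page]):
            draw.text((0, i * font_height), text, fill="black")
        pages.append(img)
    text_tokens = max(1, len(code) // 4)      # crude text-token estimate
    visual_tokens = max(1, len(pages) * tokens_per_page)
    return pages, text_tokens / visual_tokens  # >1 means net compression
```

Because every line of code lands on some page, this kind of rendering keeps global coverage (the coverage side of the paper's coverage--fidelity trade-off); the fidelity cost is that exact symbols must be recovered by the VLM from pixels rather than read as tokens.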
Related papers
- Context Cascade Compression: Exploring the Upper Limits of Text Compression [3.013064618174921]
We introduce Context Cascade Compression (C3) to explore the upper limits of text compression. At a 20x compression ratio, our model achieves 98% decoding accuracy, compared to approximately 60% for DeepSeek-OCR. This indicates that in the domain of context compression, C3 demonstrates superior performance and feasibility over optical character compression.
arXiv Detail & Related papers (2025-11-19T09:02:56Z)
- LLavaCode: Compressed Code Representations for Retrieval-Augmented Code Generation [8.868449925993994]
We introduce LlavaCode, a framework that compresses code into compact, semantically rich representations interpretable by code LLMs. Our experiments demonstrate that compressed context enables a 20-38% reduction in Time-to-First-Token (TTFT) on line completion tasks.
arXiv Detail & Related papers (2025-10-22T14:49:21Z)
- Glyph: Scaling Context Windows via Visual-Text Compression [91.20717058018745]
Glyph is a framework that renders long texts into images and processes them with vision-language models. Our method achieves 3-4x token compression while maintaining accuracy comparable to leading long-context models. Under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks.
arXiv Detail & Related papers (2025-10-20T17:58:56Z)
- LongCodeZip: Compress Long Context for Code Language Models [16.940525379087326]
LongCodeZip is a novel plug-and-play code compression framework designed specifically for Large Language Models (LLMs). By effectively reducing context size while preserving essential information, LongCodeZip enables LLMs to better scale to real-world, large-scale code scenarios.
arXiv Detail & Related papers (2025-10-01T02:54:57Z)
- UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression [86.33995240043936]
UniGist is a sequence-level long-context compression framework for large language models. It efficiently preserves context information by replacing raw tokens with special compression tokens (gists) in a fine-grained manner. Our scheme also supports flexible inference by allowing the actual removal of compressed tokens, resulting in real-time memory savings.
arXiv Detail & Related papers (2025-09-19T08:47:37Z)
- R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search [61.4807238517108]
Chain-of-Thought (CoT) reasoning enhances large language models (LLMs) by enabling step-by-step problem-solving. CoT's extension to Long-CoT introduces substantial computational overhead due to increased token length. We propose R1-Compress, a two-stage chunk-level compression framework that preserves both local information and coherence.
arXiv Detail & Related papers (2025-05-22T16:06:59Z)
- Vision-centric Token Compression in Large Language Model [51.92055188780033]
Vision Centric Token Compression (Vist) is a slow-fast compression framework that mirrors human reading. On eleven in-context learning benchmarks, Vist achieves the same accuracy with 2.3 times fewer tokens, cutting FLOPs by 16% and memory by 50%.
arXiv Detail & Related papers (2025-02-02T13:10:06Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.