CSRv2: Unlocking Ultra-Sparse Embeddings
- URL: http://arxiv.org/abs/2602.05735v3
- Date: Tue, 10 Feb 2026 01:24:48 GMT
- Title: CSRv2: Unlocking Ultra-Sparse Embeddings
- Authors: Lixuan Guo, Yifei Wang, Tiansheng Wen, Yifan Wang, Aosong Feng, Bo Chen, Stefanie Jegelka, Chenyu You,
- Abstract summary: Contrastive Sparse Representation (CSR) is proposed as a promising direction to map dense embeddings into high-dimensional but k-sparse vectors. CSR suffers severe degradation in the ultra-sparse regime, where over 80% of neurons remain inactive. We introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable.
- Score: 52.553928856110296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the era of large foundation models, the quality of embeddings has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction, mapping dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact dense embeddings such as Matryoshka Representation Learning (MRL). Despite its promise, CSR suffers severe degradation in the ultra-sparse regime, where over 80% of neurons remain inactive, leaving much of its efficiency potential unrealized. In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive k-annealing, enhances representational quality via supervised contrastive objectives, and ensures end-to-end adaptability with full backbone finetuning. CSRv2 reduces dead neurons from 80% to 20% and delivers a 14% accuracy gain at k=2, bringing ultra-sparse embeddings on par with CSR at k=8 and MRL at 32 dimensions, all with only two active features. While maintaining comparable performance, CSRv2 delivers a 7x speedup over MRL and yields up to 300x improvements in compute and memory efficiency relative to dense embeddings in text representation. Extensive experiments across text and vision demonstrate that CSRv2 makes ultra-sparse embeddings practical without compromising performance: CSRv2 improves over CSR by 7%/4% (text/vision) at k=4, and the gap widens to 14%/6% at k=2. By making extreme sparsity viable, CSRv2 broadens the design space for real-time and edge-deployable AI systems where both embedding quality and efficiency are critical.
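The core mechanics named in the abstract (mapping a dense vector to a k-sparse code, counting dead neurons, annealing k during training, and cheap sparse inner products) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The ReLU-then-TopK activation, the linear annealing schedule, and every function name here (`topk_sparsify`, `annealed_k`, `dead_neuron_fraction`, `to_sparse`, `sparse_dot`) are illustrative assumptions; the abstract says k is annealed progressively but does not give the schedule's shape, and the supervised contrastive objective and backbone finetuning are not modeled.

```python
from typing import Dict, List, Sequence

# Illustrative sketch only; not the CSR/CSRv2 reference code.


def topk_sparsify(z: Sequence[float], k: int) -> List[float]:
    """Map a dense pre-activation vector z to a k-sparse code.

    Assumption: ReLU followed by TopK, as in TopK sparse autoencoders.
    At most k coordinates stay nonzero; ties go to the lower index.
    """
    activated = [max(0.0, v) for v in z]
    if k <= 0:
        return [0.0] * len(activated)
    # sorted() is stable, so equal values keep index order even with reverse=True.
    order = sorted(range(len(activated)), key=lambda i: activated[i], reverse=True)
    keep = set(order[:k])
    return [activated[i] if i in keep else 0.0 for i in range(len(activated))]


def annealed_k(step: int, total_steps: int, k_start: int, k_end: int) -> int:
    """Hypothetical linear k-annealing: k shrinks from k_start to k_end.

    The schedule's shape is an assumption (linear decay). Expects k_start >= k_end.
    """
    if total_steps <= 0:
        return k_end
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(k_end, round(k_start + frac * (k_end - k_start)))


def dead_neuron_fraction(codes: Sequence[Sequence[float]]) -> float:
    """Fraction of latent dimensions that are zero for every code in a batch."""
    dim = len(codes[0])
    alive = [False] * dim
    for code in codes:
        for i, v in enumerate(code):
            if v != 0.0:
                alive[i] = True
    return 1.0 - sum(alive) / dim


def to_sparse(vec: Sequence[float]) -> Dict[int, float]:
    """Store only the nonzero coordinates as {index: value}."""
    return {i: v for i, v in enumerate(vec) if v != 0.0}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Inner product of two sparse codes, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


if __name__ == "__main__":
    z = [0.9, -0.3, 0.5, 0.1, 0.7, 0.0]
    code = topk_sparsify(z, k=2)
    print(code)  # [0.9, 0.0, 0.0, 0.0, 0.7, 0.0]
    print([annealed_k(s, 100, k_start=64, k_end=2) for s in (0, 50, 100)])  # [64, 33, 2]
    print(dead_neuron_fraction([code, topk_sparsify([0.0, 0.2, 0.0, 0.0, 0.0, 0.4], 2)]))
    print(sparse_dot(to_sparse(code), {4: 1.0, 5: 2.0}))  # 0.7
```

With k-sparse codes, `sparse_dot` touches at most k stored coordinates per vector regardless of the latent width, which is where the compute and memory savings described in the abstract come from. The specific 7x and 300x figures come from the authors' experiments, not from this sketch.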
Related papers
- Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention [105.11288339285154]
Efficient-LVSM is a dual-stream architecture that applies intra-view self-attention for input views and self-then-cross attention for target views. It achieves 29.86 dB PSNR on RealEstate10K with 2 input views, surpassing LVSM by 0.2 dB, with 2x faster training convergence and 4.4x faster inference speed.
Date: 2026-02-06T08:11:58Z
- SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning [59.013863248600046]
We propose a spatially re-focused super-resolution framework that refines text conditioning at inference time. We also introduce a Spatially Targeted-Free Guidance mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations.
Date: 2025-10-26T05:03:55Z
- KeyKnowledgeRAG (K^2RAG): An Enhanced RAG method for improved LLM question-answering capabilities [2.4874078867686085]
KeyKnowledgeRAG (K2RAG) is a novel framework designed to overcome limitations in RAG implementations. It integrates dense and sparse vector search, knowledge graphs, and text summarization to improve retrieval quality and system efficiency. K2RAG achieved the highest mean answer similarity score of 0.57 and reached the highest third quartile (Q3) similarity of 0.82, indicating better alignment with ground-truth answers.
Date: 2025-07-10T12:19:03Z
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation [42.590255022001145]
Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths. We show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity.
Date: 2025-03-03T17:59:48Z
- NVS-SQA: Exploring Self-Supervised Quality Representation Learning for Neurally Synthesized Scenes without References [57.0432939964225]
We propose NVS-SQA, a quality assessment method to learn no-reference quality representations through self-supervision. Traditional self-supervised learning predominantly relies on the "same instance, similar representation" assumption and extensive datasets. We employ photorealistic cues and quality scores as learning objectives, along with a specialized contrastive pair preparation process to improve the effectiveness and efficiency of learning.
Date: 2025-01-11T09:12:43Z
- VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction [30.201826592090885]
In-Context Operator Networks (ICONs) learn operators across diverse partial differential equations using few-shot, in-context learning. Existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose Vision In-Context Operator Networks (VICON), which integrates vision transformer architectures to efficiently process 2D data through patch-wise operations.
Date: 2024-11-25T03:25:17Z
- Sebica: Lightweight Spatial and Efficient Bidirectional Channel Attention Super Resolution Network [0.0]
Single Image Super-Resolution (SISR) is a vital technique for improving the visual quality of low-resolution images.
We present Sebica, a lightweight network that incorporates spatial and efficient bidirectional channel attention mechanisms.
Sebica significantly reduces computational costs while maintaining high reconstruction quality.
Date: 2024-10-27T18:27:07Z
- Structured Pruning for Efficient Visual Place Recognition [24.433604332415204]
Visual Place Recognition (VPR) is fundamental for the global re-localization of robots and devices.
Our work introduces a novel structured pruning method to streamline common VPR architectures.
The method significantly improves system efficiency, reducing both map and model memory requirements and lowering feature-extraction and retrieval latency.
Date: 2024-09-12T08:32:25Z
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
Date: 2024-01-11T03:08:00Z
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present BiFSMNv2, a strong yet efficient binary neural network for KWS that pushes its accuracy to the level of real-valued (full-precision) networks.
Thanks to its compact architecture and optimized hardware kernel, BiFSMNv2 achieves a 25.1x speedup and 20.2x storage savings on edge hardware.
Date: 2022-11-13T18:31:45Z