FuguReport

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Authors Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You
Affiliations Yale University / Stony Brook University / Xidian University / Massachusetts Institute of Technology / Technical University of Munich / Georgia Institute of Technology / Amazon AGI SF Lab
Categories Method / Sparse Coding / Replacing clustering with sparse coding, Application / Multi-vector Retrieval / Efficient multi-vector search, Evaluation / Efficiency Evaluation / Indexing speed and search latency improvements
License CC BY 4.0

Abstract Overview

This paper proposes Single-stage Sparse Retrieval (SSR), a multi-vector retrieval framework that replaces clustering-based dense indexing with sparse coding via sparse autoencoders. Instead of compressing token embeddings into low-dimensional dense vectors, SSR maps them into high-dimensional but highly sparse representations, allowing retrieval through neuron-level inverted indexes and sparse late interaction scoring. The method includes token-only and token-plus-[CLS] variants, as well as an accelerated SSR++ pipeline that uses coarse-to-fine pruning to reduce latency. Experiments on MS MARCO, BEIR, LoTTE, long-document ranking, and LLM-based backbones evaluate both retrieval effectiveness and system efficiency.
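
To make the pipeline described above concrete, here is a minimal Python sketch of its three ingredients: a sparse code for each token, a neuron-level inverted index, and sparse late-interaction scoring. It is an illustration under stated assumptions, not the paper's implementation. The function names, the ReLU-plus-top-k encoder, the toy weights, and the MaxSim-style score (for each query token, take the best-matching document token, then sum over query tokens) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def topk_sparse_code(dense, weights, k):
    """Project a dense vector with a (hypothetical) encoder matrix,
    apply ReLU, and keep only the k largest positive activations."""
    acts = [max(0.0, sum(w * x for w, x in zip(row, dense))) for row in weights]
    top = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    return {i: acts[i] for i in top if acts[i] > 0.0}


def build_inverted_index(docs):
    """docs maps doc_id -> list of sparse token codes ({neuron: value}).
    Returns neuron -> list of (doc_id, token_position, value) postings."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, code in enumerate(tokens):
            for neuron, value in code.items():
                index[neuron].append((doc_id, pos, value))
    return index


def sparse_late_interaction(query_tokens, index):
    """Score documents by visiting only postings of neurons active in the query."""
    scores = defaultdict(float)
    for q in query_tokens:
        dots = defaultdict(float)  # (doc_id, token_position) -> sparse dot product
        for neuron, q_val in q.items():
            for doc_id, pos, d_val in index.get(neuron, ()):
                dots[(doc_id, pos)] += q_val * d_val
        best = {}  # best-matching document token for this query token
        for (doc_id, _), s in dots.items():
            best[doc_id] = max(best.get(doc_id, 0.0), s)
        for doc_id, s in best.items():
            scores[doc_id] += s
    return dict(scores)


# Toy data: sparse codes written by hand instead of coming from a trained model.
docs = {"d1": [{0: 1.0, 3: 2.0}, {1: 1.0}], "d2": [{3: 1.0}]}
query = [{3: 1.0}, {1: 2.0}]
index = build_inverted_index(docs)
scores = sparse_late_interaction(query, index)
print(scores)  # {'d1': 4.0, 'd2': 1.0}
```

Because each code has only a few active neurons, a query touches just the posting lists of those neurons, which is the property that lets an inverted index stand in for a clustering-based dense index in this sketch.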

Novelty

The main novelty is a shift in multi-vector retrieval from density-based approximation with K-means clustering to single-stage sparse coding with inverted indexing. The paper also combines sparse autoencoding with retrieval-oriented contrastive objectives so that the sparse representations remain both reconstructive and discriminative for ranking.
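
One generic way to read "reconstructive and discriminative" is as a weighted sum of a reconstruction term and a contrastive ranking term. The sketch below assumes a mean-squared-error reconstruction loss and an InfoNCE-style contrastive loss over relevance scores. The exact objectives, the weighting, and the temperature are not given on this page, so every name and default here is hypothetical.

```python
import math


def reconstruction_loss(x, x_hat):
    """Mean squared error between an embedding and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def info_nce(pos_score, neg_scores, temperature=1.0):
    """Negative log-probability of the positive document under a softmax
    over the positive and negative scores (log-sum-exp done stably)."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


def combined_loss(x, x_hat, pos_score, neg_scores, weight=1.0):
    """Assumed form: reconstruction term plus a weighted contrastive term."""
    return reconstruction_loss(x, x_hat) + weight * info_nce(pos_score, neg_scores)
```

In this reading, the reconstruction term keeps the sparse code faithful to the original token embedding, while the contrastive term pushes the score of a relevant document above the scores of irrelevant ones.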

Results

On the controlled BEIR evaluation, SSR-CLS achieves the best average nDCG@10 (53.4), ahead of Splade-v3 (51.2) and PLAID (49.3). SSR-tok reaches a retrieval latency of 17.5 ms while still outperforming the compared baselines in average effectiveness. The indexing pipeline is reported to be more than 15x faster than that of ColBERTv2. SSR is also reported to be robust across settings, including 9 of 13 BEIR datasets, LoTTE long-tail retrieval, long-document ranking, and Llama-embed-8B backbones.

Key Points

  1. SSR replaces K-means-based clustering in multi-vector retrieval with sparse autoencoder projections and neuron-level inverted indexing.
  2. The method improves the effectiveness-efficiency trade-off, reporting sub-20 ms retrieval latency and more than 15x faster indexing than clustering-based dense multi-vector retrieval (MVR) systems.
  3. The empirical study covers standard benchmarks, long-tail and long-document settings, and frozen LLM backbones, suggesting the approach generalizes beyond a narrow controlled setup.
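
The overview attributes SSR++'s lower latency to coarse-to-fine pruning, but this page does not describe the mechanism. The Python sketch below shows one generic two-stage scheme under stated assumptions: a coarse pass over a document-level inverted index selects a few candidates, and an exact sparse late-interaction score is computed only for those. The function names, the per-neuron peak index, and the candidate budget are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_doc_level_index(docs):
    """neuron -> list of (doc_id, peak activation of that neuron in the doc)."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        peak = {}
        for code in tokens:
            for neuron, value in code.items():
                peak[neuron] = max(peak.get(neuron, 0.0), value)
        for neuron, value in peak.items():
            index[neuron].append((doc_id, value))
    return index


def coarse_candidates(query_tokens, index, n):
    """Cheap pass that ignores token positions. With non-negative codes,
    this sum is an upper bound on the exact score computed below."""
    approx = defaultdict(float)
    for q in query_tokens:
        for neuron, q_val in q.items():
            for doc_id, d_val in index.get(neuron, ()):
                approx[doc_id] += q_val * d_val
    return sorted(approx, key=approx.get, reverse=True)[:n]


def fine_score(query_tokens, doc_tokens):
    """Exact score: best-matching document token per query token, summed."""
    total = 0.0
    for q in query_tokens:
        total += max(
            (sum(v * d.get(i, 0.0) for i, v in q.items()) for d in doc_tokens),
            default=0.0,
        )
    return total


def search(query_tokens, docs, index, n_candidates, top_k):
    """Coarse-to-fine: prune to n_candidates, then rescore exactly."""
    candidates = coarse_candidates(query_tokens, index, n_candidates)
    scored = [(fine_score(query_tokens, docs[d]), d) for d in candidates]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(d, s) for s, d in scored[:top_k]]
```

The candidate budget controls the trade-off in this sketch: a smaller `n_candidates` means fewer exact scores to compute but a higher chance of pruning a document that would have ranked well.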

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.