Related papers: Individualized non-uniform quantization for vector search

URL: http://arxiv.org/abs/2509.18471v1
Date: Mon, 22 Sep 2025 23:20:07 GMT
Title: Individualized non-uniform quantization for vector search
Authors: Mariano Tepper, Ted Willke
Abstract summary: NVQ (non-uniform vector quantization) is a new vector compression technique that is computationally and spatially efficient. NVQ exhibits improved accuracy compared to the state of the art with a minimal computational cost.
Score: 1.4896509623302838
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Embedding vectors are widely used for representing unstructured data and searching through it for semantically similar items. However, the large size of these vectors, due to their high dimensionality, creates problems for modern vector search techniques: retrieving large vectors from memory/storage is expensive and their footprint is costly. In this work, we present NVQ (non-uniform vector quantization), a new vector compression technique that is computationally and spatially efficient in the high-fidelity regime. The core idea of NVQ is to use novel parsimonious and computationally efficient nonlinearities for building non-uniform vector quantizers. Critically, these quantizers are individually learned for each indexed vector. Our experimental results show that NVQ exhibits improved accuracy compared to the state of the art with a minimal computational cost.
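
Illustrative sketch (not from the paper): the abstract does not specify NVQ's nonlinearities, so the example below substitutes a classic mu-law compander as a stand-in to show the general idea of a non-uniform quantizer fitted separately to each vector. All function names, the parameter grid, and the bit widths are assumptions made for illustration only.

```python
import math
import random

def compand(x, mu):
    """Mu-law style nonlinearity on [-1, 1]; mu == 0 means identity (uniform)."""
    if mu == 0:
        return x
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu):
    """Inverse of compand."""
    if mu == 0:
        return y
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize(vec, mu, bits=8):
    """Quantize one vector using its own scale and nonlinearity parameter."""
    scale = max(abs(v) for v in vec) or 1.0  # avoid dividing by zero
    levels = (1 << bits) - 1
    codes = [round((compand(v / scale, mu) + 1.0) / 2.0 * levels) for v in vec]
    return codes, scale

def dequantize(codes, scale, mu, bits=8):
    """Map integer codes back to approximate float values."""
    levels = (1 << bits) - 1
    return [expand(c / levels * 2.0 - 1.0, mu) * scale for c in codes]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fit_vector(vec, bits=8, grid=(0, 1, 4, 16, 64, 255)):
    """Pick, for this vector alone, the mu with the lowest reconstruction error.

    The grid is hypothetical; it includes mu == 0 so the uniform quantizer
    is always one of the candidates.
    """
    best = None
    for mu in grid:
        codes, scale = quantize(vec, mu, bits)
        err = mse(vec, dequantize(codes, scale, mu, bits))
        if best is None or err < best[0]:
            best = (err, mu, codes, scale)
    return best

if __name__ == "__main__":
    random.seed(0)
    vec = [random.gauss(0.0, 1.0) for _ in range(128)]
    err, mu, codes, scale = fit_vector(vec, bits=4)
    uniform_codes, uniform_scale = quantize(vec, 0, bits=4)
    uniform_err = mse(vec, dequantize(uniform_codes, uniform_scale, 0, bits=4))
    print("chosen mu:", mu)
    print("individualized error <= uniform error:", err <= uniform_err)
```

Because the uniform quantizer is one of the candidates, the per-vector choice can never reconstruct worse than uniform quantization in this sketch; each stored vector would carry only its codes plus two small parameters (scale and mu).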

Related papers

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression [57.54335545892155]
We introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook. Our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines.
arXiv Detail & Related papers (2025-10-23T20:19:48Z)
Dimension reduction with structure-aware quantum circuits for hybrid machine learning [0.0]
Schmidt decomposition of a vector can be understood as writing the singular value decomposition (SVD) in vector form. We show that quantum circuits designed on a value $k$ can approximate the reduced-form representations of entire datasets.
arXiv Detail & Related papers (2025-07-31T17:18:43Z)
Improving the Generation of VAEs with High Dimensional Latent Spaces by the use of Hyperspherical Coordinates [59.4526726541389]
Variational autoencoders (VAE) encode data into lower-dimensional latent vectors before decoding those vectors back to data. We propose a new parameterization of the latent space with limited computational overhead.
arXiv Detail & Related papers (2025-07-21T05:10:43Z)
Linearithmic Clean-up for Vector-Symbolic Key-Value Memory with Kroneker Rotation Products [4.502446902578007]
A computational bottleneck in current Vector-Symbolic Architectures is the "clean-up" step. We present a new codebook representation that supports efficient clean-up. The resulting clean-up time complexity is linearithmic, i.e. $\mathcal{O}(N\,\text{log}\,N)$.
arXiv Detail & Related papers (2025-06-18T18:23:28Z)
GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction [1.1599570446840546]
Cross-modal retrieval (e.g., where a text query is used to find images) is gaining momentum rapidly. It is challenging to achieve high accuracy as the queries often have different statistical distributions than the database vectors. We present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search.
arXiv Detail & Related papers (2024-10-14T21:14:27Z)
On dimensionality of feature vectors in MPNNs [49.32130498861987]
We revisit the classical result of Morris et al. (AAAI'19) that message-passing graph neural networks (MPNNs) are equal in their distinguishing power to the Weisfeiler--Leman (WL) isomorphism test.
arXiv Detail & Related papers (2024-02-06T12:56:55Z)
LeanVec: Searching vectors faster by making them fit [1.0863382547662974]
We present LeanVec, a framework that combines linear dimensionality reduction with vector quantization to accelerate similarity search on high-dimensional vectors. We show that LeanVec produces state-of-the-art results, with up to 3.7x improvement in search throughput and up to 4.9x faster index build time.
arXiv Detail & Related papers (2023-12-26T21:14:59Z)
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers. These methods are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification. The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x. We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
SOLAR: Sparse Orthogonal Learned and Random Embeddings [45.920844071257754]
We argue that high-dimensional and ultra-sparse embedding is a significantly superior alternative to dense low-dimensional embedding for both query efficiency and accuracy. We train 500K-dimensional SOLAR embeddings for the tasks of searching through 1.6M books and multi-label classification on the three largest public datasets. We achieve superior precision and recall compared to the respective state-of-the-art baselines for each of the tasks with up to 10 times faster speed.
arXiv Detail & Related papers (2020-08-30T17:35:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.