Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
- URL: http://arxiv.org/abs/2508.16263v2
- Date: Sat, 20 Sep 2025 08:22:23 GMT
- Title: Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
- Authors: Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, Chenhao Ma
- Abstract summary: We present a unified Filtering ANN search interface that encompasses the latest algorithms and evaluate them extensively from multiple perspectives. First, we propose a comprehensive taxonomy of existing Filtering ANN algorithms based on attribute types and filtering strategies. We then conduct a broad experimental evaluation on 10 algorithms and 12 methods across 4 datasets.
- Score: 18.5007917065799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the growing integration of structured and unstructured data, new methods have emerged for performing similarity searches on vectors while honoring structured attribute constraints, i.e., a process known as Filtering Approximate Nearest Neighbor (Filtering ANN) search. Since many of these algorithms have only appeared in recent years and are designed to work with a variety of base indexing methods and filtering strategies, there is a pressing need for a unified analysis that identifies their core techniques and enables meaningful comparisons. In this work, we present a unified Filtering ANN search interface that encompasses the latest algorithms and evaluate them extensively from multiple perspectives. First, we propose a comprehensive taxonomy of existing Filtering ANN algorithms based on attribute types and filtering strategies. Next, we analyze their key components, i.e., index structures, pruning strategies, and entry point selection, to elucidate design differences and tradeoffs. We then conduct a broad experimental evaluation on 10 algorithms and 12 methods across 4 datasets (each with up to 10 million items), incorporating both synthetic and real attributes and covering selectivity levels from 0.1% to 100%. Finally, an in-depth component analysis reveals the influence of pruning, entry point selection, and edge filtering costs on overall performance. Based on our findings, we summarize the strengths and limitations of each approach, provide practical guidelines for selecting appropriate methods, and suggest promising directions for future research. Our code is available at: https://github.com/lmccccc/FANNBench.
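To make the filtering strategies the abstract's taxonomy distinguishes concrete, here is a minimal brute-force sketch of the two simplest baselines, pre-filtering (restrict to attribute-matching items, then rank by distance) and post-filtering (rank all items, then discard non-matching candidates). This is an illustrative sketch only, not the paper's implementation; the function names and the `overfetch` parameter are our own.

```python
import numpy as np

def pre_filter_search(vectors, attrs, query, predicate, k):
    """Pre-filtering: keep only items whose attribute satisfies the
    predicate, then rank the survivors by exact Euclidean distance."""
    idx = np.array([i for i, a in enumerate(attrs) if predicate(a)])
    if idx.size == 0:
        return []
    dists = np.linalg.norm(vectors[idx] - query, axis=1)
    order = np.argsort(dists)[:k]
    return [int(idx[i]) for i in order]

def post_filter_search(vectors, attrs, query, predicate, k, overfetch=4):
    """Post-filtering: fetch k * overfetch nearest candidates first,
    then drop those failing the predicate (may return fewer than k
    results when selectivity is low)."""
    dists = np.linalg.norm(vectors - query, axis=1)
    candidates = np.argsort(dists)[: k * overfetch]
    return [int(i) for i in candidates if predicate(attrs[i])][:k]
```

The trade-off the sketch exposes matches the selectivity axis of the evaluation: pre-filtering degrades to an exhaustive scan but is exact at low selectivity, while post-filtering reuses an unfiltered ranking but can miss results unless the candidate pool is enlarged.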
Related papers
- Filtered Approximate Nearest Neighbor Search in Vector Databases: System Design and Performance Analysis [0.5249805590164902]
Filtered Approximate Nearest Neighbor Search (FANNS) combines semantic retrieval with metadata constraints. In this work, we systematize the taxonomy of filtering strategies and evaluate their integration into vector database systems. Our experiments reveal that algorithmic adaptations within the engine often override raw index performance.
arXiv Detail & Related papers (2026-02-11T23:40:26Z) - JAG: Joint Attribute Graphs for Filtered Nearest Neighbor Search [8.398505388767248]
JAG (Joint Attribute Graphs) is a graph-based algorithm designed to deliver robust performance across the entire selectivity spectrum. Our experimental results demonstrate that JAG significantly outperforms existing state-of-the-art baselines in both throughput and recall robustness.
arXiv Detail & Related papers (2026-02-10T20:00:51Z) - Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters [7.493563348124023]
Filtered Approximate Nearest Neighbor (ANN) search retrieves the closest vectors to a query vector from a dataset, subject to filter constraints. Existing graph-based methods typically incorporate filter awareness by assigning fixed penalties. We propose a principled alternative that learns the optimal trade-off between vector distance and filter match directly from the data.
arXiv Detail & Related papers (2025-11-06T05:24:41Z) - Benchmarking Filtered Approximate Nearest Neighbor Search Algorithms on Transformer-based Embedding Vectors [18.796661826646616]
Approximate nearest neighbor search under attribute constraints is known as Filtered Approximate Nearest Neighbor Search (FANNS). We present a comprehensive survey and taxonomy of FANNS methods and analyze how they are benchmarked in the literature. We introduce a novel dataset consisting of embedding vectors for the abstracts of over 2.7 million research articles from the arXiv repository, accompanied by 11 real-world attributes such as authors and categories.
arXiv Detail & Related papers (2025-07-29T16:39:54Z) - Billion-scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering [49.1574468325115]
This paper presents a novel approach for similarity search with complex filtering capabilities on billion-scale datasets, optimized for CPU inference. Our method extends the classical IVF-Flat index structure to integrate multi-dimensional filters. The proposed algorithm combines dense embeddings with discrete filtering attributes, enabling fast retrieval in high-dimensional spaces.
arXiv Detail & Related papers (2025-01-23T07:47:00Z) - A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs [2.188091591747149]
Whole-network filter pruning algorithms prune varying fractions of filters from each layer, hence providing greater flexibility.
This paper proposes a two-level hierarchical approach for whole-network filter pruning which is efficient and uses the classification loss as the final criterion.
Our method reduces the RAM requirement for ResNext101 from 7.6 GB to 1.5 GB and achieves a 94% reduction in FLOPS without losing accuracy on CIFAR-10.
arXiv Detail & Related papers (2024-08-22T03:59:57Z) - Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection [34.32619834917906]
We reformulate the feature selection problem as a deep differentiable optimization task.
We propose a new principled research perspective: conceptualizing discrete feature subsetting as optimization in a continuous embedding space.
Specifically, we utilize reinforcement feature selection learning to generate diverse and high-quality training data.
arXiv Detail & Related papers (2023-02-26T03:18:45Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. Our proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Infinite Feature Selection: A Graph-based Feature Filtering Approach [78.63188057505012]
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Extending to paths of infinite length allows us to constrain the computational complexity of the selection process. We show that Inf-FS performs better in almost every setting, that is, when the number of features to keep is fixed a priori.
arXiv Detail & Related papers (2020-06-15T07:20:40Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew [58.21885402826496]
All-pairs set similarity is a widely used data mining task, even for large and high-dimensional datasets.
We present a new distributed algorithm, LSF-Join, for approximate all-pairs set similarity.
We show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.
arXiv Detail & Related papers (2020-03-06T00:06:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.