Related papers: Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search

Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search

URL: http://arxiv.org/abs/2502.20695v1
Date: Fri, 28 Feb 2025 04:03:23 GMT
Title: Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search
Authors: Yang Shi, Yiping Sun, Jiaolong Du, Xiaocheng Zhong, Zhiyong Wang, Yao Hu,
Abstract summary: SOGAIC is a graph-based ANNS index construction system tailored for ultra-large-scale vector databases.<n>The proposed method has been successfully deployed in a real-world industrial search engine, managing over 10 billion daily updated vectors.
Score: 18.419278931226756
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Approximate Nearest Neighbor Search (ANNS) is essential for modern data-driven applications that require efficient retrieval of top-k results from massive vector databases. Although existing graph-based ANNS algorithms achieve a high recall rate on billion-scale datasets, their slow construction speed and limited scalability hinder their applicability to large-scale industrial scenarios. In this paper, we introduce SOGAIC, the first Scalable Overload-Aware Graph-Based ANNS Index Construction system tailored for ultra-large-scale vector databases: 1) We propose a dynamic data partitioning algorithm with overload constraints that adaptively introduces overlaps among subsets; 2) To enable efficient distributed subgraph construction, we employ a load-balancing task scheduling framework combined with an agglomerative merging strategy; 3) Extensive experiments on various datasets demonstrate a reduction of 47.3% in average construction time compared to existing methods. The proposed method has also been successfully deployed in a real-world industrial search engine, managing over 10 billion daily updated vectors and serving hundreds of millions of users.

Related papers

Efficient Knowledge Graph Construction and Retrieval from Unstructured Text for Large-Scale RAG Systems [0.06597195879147556]
We propose a scalable and cost-efficient framework for deploying Graph-based Retrieval Augmented Generation (GraphRAG) in enterprise environments.<n>These results validate the feasibility of deploying GraphRAG systems in real-world, large-scale enterprise applications.
arXiv Detail & Related papers (2025-07-04T00:05:55Z)
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search [58.98450205734779]
Large language model (LLM) agents have demonstrated strong capabilities across diverse domains.<n>Existing agent search methods suffer from three major limitations.<n>We introduce a comprehensive framework to address these challenges.
arXiv Detail & Related papers (2025-06-06T12:07:23Z)
HAKES: Scalable Vector Database for Embedding Search Service [16.034584281180006]
We build a vector database that achieves high throughput and high recall under concurrent read-write workloads.<n>Our index outperforms index baselines in the high recall region and under concurrent read-write workloads.<n>namesys is scalable and achieves up to $16times$ higher throughputs than the baselines.
arXiv Detail & Related papers (2025-05-18T19:26:29Z)
RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline. RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components. Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z)
LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search [4.194768796374315]
We propose a new supervised score computation method based on the observation that inner product approximation is a multi-output regression problem. Our experiments show that the proposed reduced-rank regression (RRR) method is superior to PQ in both query latency and memory usage. We also introduce LoRANN, a clustering-based ANN library that leverages the proposed score computation method.
arXiv Detail & Related papers (2024-10-24T17:13:39Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search [11.069814476661827]
Cross-modal ANNS aims to use the data vector from one modality to retrieve the most similar items from another. State-of-the-art ANNS approaches suffer poor performance for OOD workloads. We propose pRojected bipartite Graph (RoarGraph), an efficient ANNS graph index built under the guidance of query distribution.
arXiv Detail & Related papers (2024-08-16T06:48:16Z)
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation [20.409659920455955]
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large scale high-dimensional datasets.<n>For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics.<n>Traditional evaluation approaches do not consider the computational costs of updating the index structure, as well as the rate and size of index updates.
arXiv Detail & Related papers (2024-04-30T06:21:44Z)
Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization. We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric. Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow [0.0]
We build a generic, optimized, and fast dataflow solver, KAPLA, to explore the design space with effective validity check and efficiency estimation. KAPLA achieves within only 2.2% and 7.7% energy overheads on the result dataflow for training and inference. It also outperforms random and machine-learning-based approaches, with more optimized results and orders of magnitude faster search speedup.
arXiv Detail & Related papers (2023-06-09T03:12:42Z)
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking [124.21408098724551]
Large-scale graph training is a notoriously challenging problem for graph neural networks (GNNs) We present a new ensembling training manner, named EnGCN, to address the existing issues. Our proposed method has achieved new state-of-the-art (SOTA) performance on large-scale datasets.
arXiv Detail & Related papers (2022-10-14T03:43:05Z)
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures. We show Arch-Graph's transferability and high sample efficiency across numerous tasks. It is able to find top 0.16% and 0.29% architectures on average on two search spaces under the budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z)
Augmenting Novelty Search with a Surrogate Model to Engineer Meta-Diversity in Ensembles of Classifiers [5.8497361730688695]
Using Neuroevolution combined with Novelty Search to promote behavioural diversity is capable of constructing high-performing ensembles for classification. We propose a method to overcome this limitation by using a surrogate model which estimates the behavioural distance between two neural network architectures.
arXiv Detail & Related papers (2022-01-30T19:13:32Z)
ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS) We employ a one-shot architecture search approach in order to obtain a reduced search cost. We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.