Related papers: Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

URL: http://arxiv.org/abs/2407.07871v2
Date: Mon, 15 Jul 2024 14:23:35 GMT
Title: Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation
Authors: Wentao Xiao, Yueyang Zhan, Rui Xi, Mengshu Hou, Jianming Liao,
Abstract summary: graph-based indices become unacceptable when faced with a large number of real-time deletions, insertions, and updates. We present efficient measures to overcome the shortcomings of HNSW, specifically addressing poor performance over long periods of delete and update operations. Our proposed MN-RU algorithm effectively improves update efficiency and suppresses the growth rate of unreachable points, ensuring better overall performance and maintaining the integrity of the graph.
Score: 0.9592510017131104
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The approximate nearest neighbor search (ANNS) is a fundamental and essential component in data mining and information retrieval, with graph-based methodologies demonstrating superior performance compared to alternative approaches. Extensive research efforts have been dedicated to improving search efficiency by developing various graph-based indices, such as HNSW (Hierarchical Navigable Small World). However, the performance of HNSW and most graph-based indices become unacceptable when faced with a large number of real-time deletions, insertions, and updates. Furthermore, during update operations, HNSW can result in some data points becoming unreachable, a situation we refer to as the `unreachable points phenomenon'. This phenomenon could significantly affect the search accuracy of the graph in certain situations. To address these issues, we present efficient measures to overcome the shortcomings of HNSW, specifically addressing poor performance over long periods of delete and update operations and resolving the issues caused by the unreachable points phenomenon. Our proposed MN-RU algorithm effectively improves update efficiency and suppresses the growth rate of unreachable points, ensuring better overall performance and maintaining the integrity of the graph. Our results demonstrate that our methods outperform existing approaches. Furthermore, since our methods are based on HNSW, they can be easily integrated with existing indices widely used in the industrial field, making them practical for future real-world applications. Code is available at \url{https://github.com/xwt1/MN-RU.git}

Related papers

In-Place Updates of a Graph Index for Streaming Approximate Nearest Neighbor Search [12.092920351505036]
IP-DiskANN is first algorithm to avoid batch consolidation by efficiently processing each insertion and deletion in-place. It has stable recall over various lengthy update patterns in both high-recall and low-recall regimes.
arXiv Detail & Related papers (2025-02-19T15:41:08Z)
Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works.
arXiv Detail & Related papers (2024-12-19T07:48:14Z)
FIT-GNN: Faster Inference Time for GNNs Using Coarsening [1.323700980948722]
coarsening-based methods are used to reduce the graph into a smaller graph, resulting in faster computation. Prior research has not adequately addressed the computational costs during the inference phase. This paper presents a novel approach to improve the scalability of GNNs by reducing computational burden during both training and inference phases.
arXiv Detail & Related papers (2024-10-19T06:27:24Z)
Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks [37.41604955004456]
Graph neural networks (GNNs) have demonstrated remarkable success in graph representation learning. Various sampling approaches have been proposed to scale GNNs to applications with large-scale graphs.
arXiv Detail & Related papers (2024-10-07T18:29:02Z)
CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion [8.036012885171166]
We introduce CANDY, a benchmark tailored for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion. CANDY comprehensively assesses a wide range of AKNN algorithms, integrating advanced optimizations such as machine learning-driven inference. Our evaluations across diverse datasets demonstrate that simpler AKNN baselines often surpass more complex alternatives in terms of recall and latency.
arXiv Detail & Related papers (2024-06-28T04:46:11Z)
OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework [21.87740178652843]
Causal discovery offers a promising approach to improve transparency and reliability. We propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects. We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive optimization of algorithms.
arXiv Detail & Related papers (2024-06-07T03:09:22Z)
Gradient Transformation: Towards Efficient and Model-Agnostic Unlearning for Dynamic Graph Neural Networks [66.70786325911124]
Graph unlearning has emerged as an essential tool for safeguarding user privacy and mitigating the negative impacts of undesirable data. With the increasing prevalence of DGNNs, it becomes imperative to investigate the implementation of dynamic graph unlearning. We propose an effective, efficient, model-agnostic, and post-processing method to implement DGNN unlearning.
arXiv Detail & Related papers (2024-05-23T10:26:18Z)
Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce. We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD. Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
Less Can Be More: Unsupervised Graph Pruning for Large-scale Dynamic Graphs [42.864057751606616]
We propose and study the problem of unsupervised graph pruning on dynamic graphs. Our proposed STEP framework learns to remove potentially redundant edges from input dynamic graphs. Our results on three real-world datasets demonstrate the advantages on improving the efficacy, robustness, and efficiency of GNNs.
arXiv Detail & Related papers (2023-05-18T03:23:53Z)
Fast and Attributed Change Detection on Dynamic Graphs with Density of States [9.409281517596396]
Current solutions do not scale well to large real-world graphs, lack robustness to large amounts of node additions/deletions, and overlook changes in node attributes. We propose a novel spectral method: Scalable Change Point Detection (SCPD). SCPD generates an embedding for each graph snapshot by efficiently approximating the distribution of the Laplacian spectrum at each step. We show that SCPD achieves state-of-the art performance and can easily process millions of edges in a few CPU minutes.
arXiv Detail & Related papers (2023-05-15T16:02:36Z)
Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications. We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
Towards an Efficient and General Framework of Robust Training for Graph Neural Networks [96.93500886136532]
Graph Neural Networks (GNNs) have made significant advances on several fundamental inference tasks. Despite GNNs' impressive performance, it has been observed that carefully crafted perturbations on graph structures lead them to make wrong predictions. We propose a general framework which leverages the greedy search algorithms and zeroth-order methods to obtain robust GNNs.
arXiv Detail & Related papers (2020-02-25T15:17:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.