Accelerating Large-Scale Graph-based Nearest Neighbor Search on a
Computational Storage Platform
- URL: http://arxiv.org/abs/2207.05241v1
- Date: Tue, 12 Jul 2022 00:42:18 GMT
- Title: Accelerating Large-Scale Graph-based Nearest Neighbor Search on a
Computational Storage Platform
- Authors: Ji-Hoon Kim, Yeo-Reum Park, Jaeyoung Do, Soo-Young Ji, and Joo-Young
Kim
- Abstract summary: We propose a computational storage platform, built on the SmartSSD CSD, that accelerates a large-scale graph-based nearest neighbor search algorithm.
The proposed platform achieves a throughput of 75.59 queries per second on the SIFT1B dataset at 258.66 W of power dissipation.
- Score: 9.867170674550922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: K-nearest neighbor search is a fundamental task in many
applications, and the hierarchical navigable small world (HNSW) graph has
recently drawn attention in large-scale cloud services because it scales the
database easily while offering fast search. Meanwhile, computational storage
devices (CSDs), which combine programmable logic and storage modules on a
single board, are becoming popular as a way to address the data bandwidth
bottleneck of modern computing systems. In this paper, we propose a
computational storage platform that accelerates a large-scale graph-based
nearest neighbor search algorithm on the SmartSSD CSD. To this end, we modify
the algorithm to make it more amenable to hardware and implement two types of
accelerators using HLS- and RTL-based methodologies with various optimization
methods. In addition, we scale the proposed platform up to 4 SmartSSDs and
apply graph parallelism to further boost system performance. As a result, the
proposed computational storage platform achieves a throughput of 75.59 queries
per second for the SIFT1B dataset at 258.66 W of power dissipation, which is
12.83x and 17.91x faster, and 10.43x and 24.33x more energy efficient, than
conventional CPU-based and GPU-based server platforms, respectively. With
multi-terabyte storage and custom acceleration capability, we believe the
proposed computational storage platform is a promising solution for
cost-sensitive cloud datacenters.
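The core operation that HNSW (and graph-based nearest neighbor search in general) repeats at every layer is a greedy best-first traversal of a proximity graph. The following is a minimal sketch of that traversal only, not the paper's SmartSSD implementation; the graph layout, squared-Euclidean distance, and the `ef` parameter name are illustrative assumptions.

```python
# Hypothetical sketch of the greedy best-first search performed at one
# layer of an HNSW-style proximity graph. All names and the data layout
# are illustrative, not the paper's actual hardware design.
import heapq

def greedy_search(graph, vectors, query, entry, ef=4):
    """Return up to `ef` (distance, node) pairs nearest to `query`.

    graph:   dict node -> list of neighbor nodes
    vectors: dict node -> coordinate tuple
    entry:   node where the traversal starts
    """
    def dist(a, b):
        # Squared Euclidean distance (monotone in the true distance).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    d0 = dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: frontier, closest first
    best = [(-d0, entry)]        # max-heap (negated): current results

    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the closest unexplored node is already worse than
        # the worst node kept in the result set.
        if len(best) >= ef and d > -best[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(vectors[nb], query)
            if len(best) < ef or d_nb < -best[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(best, (-d_nb, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict current worst
    return sorted((-d, n) for d, n in best)
```

In full HNSW this routine is run top-down through the layer hierarchy, with the result of each layer seeding the entry point of the next; the graph-parallel scaling described in the abstract can then be understood as running such traversals over graph partitions on multiple SmartSSDs and merging the per-device top-k results.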
Related papers
- Integrated Hardware Architecture and Device Placement Search [7.620610652090732]
Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy.
This is the first work to explore co-optimizing the accelerator architecture and the device placement strategy.
Our approach achieves higher throughput on large language models compared to the state-of-the-art TPUv4 and the Spotlight accelerator search framework.
arXiv Detail & Related papers (2024-07-18T04:02:35Z)
- Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching [59.8522166385372]
Training and inference with graph neural networks (GNNs) on massive graphs has been actively studied since the inception of GNNs.
This paper is concerned with minibatch training and inference with GNNs that employ node-wise sampling in distributed settings.
We present SALIENT++, which extends the prior state-of-the-art SALIENT system to work with partitioned feature data.
arXiv Detail & Related papers (2023-05-04T21:04:01Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization in target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only task accuracy and inference latency but also resource utilization.
We achieve 2.8 - 4x speedup for DNN inference compared to prior hardware-aware NAS methods.
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- High Performance Hyperspectral Image Classification using Graphics Processing Units [0.0]
Real-time remote sensing applications require onboard real time processing capabilities.
Lightweight, small-size, and low-power hardware is essential for onboard real-time processing systems.
arXiv Detail & Related papers (2021-05-30T09:26:03Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Multi-scale Interaction for Real-time LiDAR Data Segmentation on an Embedded Platform [62.91011959772665]
Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles.
Current approaches that operate directly on the point cloud use complex spatial aggregation operations.
We propose a projection-based method, called Multi-scale Interaction Network (MINet), which is very efficient and accurate.
arXiv Detail & Related papers (2020-08-20T19:06:11Z)
- HeAT -- a Distributed and GPU-accelerated Tensor Framework for Data Analytics [0.0]
HeAT is an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API.
HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI.
When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.
arXiv Detail & Related papers (2020-07-27T13:33:17Z)
- Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect [0.0764671395172401]
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years.
Many-core platforms consisting of several homogeneous cores can relax physical implementation constraints, at the expense of an increased dataflow mapping effort.
This work presents an automated mapping strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses.
The strategy is then extended towards a suitable many-core mapping scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect.
arXiv Detail & Related papers (2020-06-18T17:13:18Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.