Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML
Prefetcher for Accelerating Graph Analytics
- URL: http://arxiv.org/abs/2212.05250v2
- Date: Mon, 25 Sep 2023 00:30:09 GMT
- Title: Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML
Prefetcher for Accelerating Graph Analytics
- Authors: Pengmiao Zhang, Rajgopal Kannan, Viktor K. Prasanna
- Abstract summary: We propose MPGraph, an ML-based Prefetcher for Graph analytics using domain specific models.
MPGraph introduces three novel optimizations: soft detection for phase transitions, phase-specific multi-modality models for access delta and page predictions, and chain spatio-temporal prefetching (CSTP) for prefetch control.
Using CSTP, MPGraph achieves 12.52-21.23% IPC improvement, outperforming state-of-the-art non-ML prefetcher BO by 7.58-12.03% and ML-based prefetchers Voyager and TransFetch by 3.27-4.58%.
- Score: 7.52191887022819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Memory performance is a bottleneck in graph analytics acceleration. Existing
Machine Learning (ML) prefetchers struggle with phase transitions and irregular
memory accesses in graph processing. We propose MPGraph, an ML-based Prefetcher
for Graph analytics using domain specific models. MPGraph introduces three
novel optimizations: soft detection for phase transitions, phase-specific
multi-modality models for access delta and page predictions, and chain
spatio-temporal prefetching (CSTP) for prefetch control. Our transition
detector achieves 34.17-82.15% higher precision compared with
Kolmogorov-Smirnov Windowing and decision tree. Our predictors achieve
6.80-16.02% higher F1-score for delta and 11.68-15.41% higher accuracy-at-10
for page prediction compared with LSTM and vanilla attention models. Using
CSTP, MPGraph achieves 12.52-21.23% IPC improvement, outperforming
state-of-the-art non-ML prefetcher BO by 7.58-12.03% and ML-based prefetchers
Voyager and TransFetch by 3.27-4.58%. For practical implementation, we
demonstrate that MPGraph with compressed, reduced-latency models still shows
significantly superior accuracy and coverage compared with BO, leading to 3.58%
higher IPC improvement.
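The delta prediction idea at the heart of the abstract can be illustrated with a minimal sketch. The order-1 Markov table below is a hypothetical stand-in for MPGraph's phase-specific multi-modality models, not the paper's actual design; it only shows how an address trace becomes deltas and how a predicted delta becomes a prefetch candidate.

```python
# Minimal sketch of delta-based prefetching (assumption: a toy
# table-based predictor in place of MPGraph's learned models).
from collections import defaultdict, Counter

def to_deltas(addresses):
    """Convert an address trace into successive deltas."""
    return [b - a for a, b in zip(addresses, addresses[1:])]

class DeltaPredictor:
    """Predict the next delta from the last one (order-1 Markov table)."""
    def __init__(self):
        self.table = defaultdict(Counter)

    def train(self, deltas):
        for prev, nxt in zip(deltas, deltas[1:]):
            self.table[prev][nxt] += 1

    def predict(self, last_delta):
        counts = self.table.get(last_delta)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

trace = [0x100, 0x140, 0x180, 0x1C0, 0x200]   # stride-0x40 accesses
deltas = to_deltas(trace)
p = DeltaPredictor()
p.train(deltas)
next_addr = trace[-1] + p.predict(deltas[-1])  # prefetch candidate
```

A real prefetcher would additionally gate these predictions by phase, which is exactly the gap the paper's soft phase-transition detection targets.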
Related papers
- Hardware-Accelerated Event-Graph Neural Networks for Low-Latency Time-Series Classification on SoC FPGA [0.043533652831655174]
We present a hardware implementation of an event-graph neural network for time-series classification.
We leverage an artificial cochlea model to convert the input time-series signals into a sparse event-data format.
Our method achieves a floating-point accuracy of 92.7% on the SHD dataset for the base model, which is only 2.4% and 2% less than the state-of-the-art models.
arXiv Detail & Related papers (2025-03-09T14:08:46Z)
- S*: Test Time Scaling for Code Generation [55.11863577956177]
We propose S*, the first hybrid test-time scaling framework for code generation.
S* substantially improves the coverage and selection accuracy of generated code.
arXiv Detail & Related papers (2025-02-20T09:18:53Z)
- PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners [65.93130697098658]
This paper proposes PredFormer, a pure transformer-based framework for predictive learning.
With its recurrent-free, transformer-based design, PredFormer is both simple and efficient.
Experiments on synthetic and real-world datasets demonstrate that PredFormer achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-07T03:52:06Z)
- ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding [9.144813021145039]
This paper introduces ParFormer, a vision transformer that incorporates a Parallel Mixer and a Sparse Channel Attention Patch Embedding (SCAPE).
ParFormer improves feature extraction by combining convolutional and attention mechanisms.
For edge device deployment, ParFormer-T excels with a throughput of 278.1 images/sec, which is 1.38 times higher than EdgeNeXt-S.
The larger variant, ParFormer-L, reaches 83.5% Top-1 accuracy, offering a balanced trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-22T07:32:21Z)
- Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching [6.692695353937492]
We propose a new approach that significantly reduces model complexity and inference latency without sacrificing prediction accuracy.
We develop DART, a prefetcher comprised of a simple hierarchy of tables.
DART outperforms state-of-the-art NN-based prefetchers TransFetch by 33.1% and Voyager by 37.2% in terms of IPC improvement.
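The "tabularization" idea behind DART can be sketched as precomputing a learned function into lookup tables, so inference becomes table reads instead of matrix math. The quantized domain and single-table layout below are illustrative assumptions, not DART's actual hierarchy.

```python
# Illustrative sketch of tabularization (assumption: a toy one-table
# version of the idea; DART uses a hierarchy of tables).
def tabularize(f, domain):
    """Precompute f over a finite input domain into a lookup table."""
    return {x: f(x) for x in domain}

# A toy "model": map the last observed delta to a predicted delta.
model = lambda d: 2 * d if d < 8 else d  # stand-in for a trained predictor
table = tabularize(model, range(16))     # quantized input domain (assumption)

def predict(table, delta, default=0):
    """Inference is now a single dictionary lookup."""
    return table.get(delta, default)
```

The trade-off is memory for latency: the table grows with the input domain, which is why quantizing inputs to a small range matters.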
arXiv Detail & Related papers (2023-12-23T05:46:05Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
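The column-row sampling (CRS) estimator family the paper builds on can be sketched as follows. This is the standard uniform-sampling CRS baseline, not the winner-take-all variant the paper proposes; matrix shapes and probabilities are illustrative.

```python
# Sketch of a column-row sampling (CRS) estimator of A @ B.
# Uniform sampling keeps it simple; WTA-CRS refines which index
# pairs are kept to reduce variance further.
import random

def matmul(A, B):
    """Exact product of A (m x n) and B (n x p), as lists of lists."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

def crs_estimate(A, B, num_samples, rng=random):
    """Unbiased estimate of A @ B from sampled column/row outer products.

    Each sample picks index k uniformly and rescales by 1/(s * p_k),
    so the expectation of the estimate equals A @ B.
    """
    m, n, p = len(A), len(B), len(B[0])
    prob = 1.0 / n                      # uniform sampling probability
    est = [[0.0] * p for _ in range(m)]
    for _ in range(num_samples):
        k = rng.randrange(n)
        scale = 1.0 / (num_samples * prob)
        for i in range(m):
            for j in range(p):
                est[i][j] += scale * A[i][k] * B[k][j]
    return est
```

With enough samples the estimate concentrates around the exact product, which is the unbiasedness-plus-variance trade-off the paper analyzes.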
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z)
- Fine-Grained Address Segmentation for Attention-Based Variable-Degree Prefetching [10.128730975303407]
We propose TransFetch, a novel way to model prefetching.
To reduce vocabulary size, we use fine-grained address segmentation as input.
To predict unordered sets of future addresses, we use delta bitmaps for multiple outputs.
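The two input/output ideas named above can be sketched concretely: fine-grained address segmentation splits an address into fixed-width tokens to shrink the vocabulary, and a delta bitmap encodes an unordered set of future deltas as one bit each. The segment width and bitmap range below are illustrative assumptions, not TransFetch's actual parameters.

```python
# Hypothetical sketch of address segmentation and delta bitmaps
# (assumption: 4-bit segments and a 64-delta bitmap, chosen for
# illustration only).
SEG_BITS = 4  # width of each address segment (assumption)

def segment_address(addr, num_segments=8, seg_bits=SEG_BITS):
    """Split an address into num_segments tokens of seg_bits each, low to high."""
    mask = (1 << seg_bits) - 1
    return [(addr >> (i * seg_bits)) & mask for i in range(num_segments)]

def deltas_to_bitmap(deltas, max_delta=64):
    """Encode a set of positive deltas as a bitmap of max_delta bits."""
    bitmap = 0
    for d in deltas:
        if 1 <= d <= max_delta:
            bitmap |= 1 << (d - 1)
    return bitmap

def bitmap_to_deltas(bitmap, max_delta=64):
    """Decode the bitmap back into the unordered set of deltas."""
    return {d for d in range(1, max_delta + 1) if bitmap & (1 << (d - 1))}
```

The bitmap output is what lets a single prediction cover multiple future addresses at once, i.e. variable-degree prefetching.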
arXiv Detail & Related papers (2022-05-01T05:30:37Z)
- PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices [13.62426382827205]
PP-PicoDet family of real-time object detectors achieves superior performance on object detection for mobile devices.
Models achieve better trade-offs between accuracy and latency compared to other popular models.
arXiv Detail & Related papers (2021-11-01T12:53:17Z)
- Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction [92.16318571149553]
We propose a multiscale spatio-temporal graph neural network (MST-GNN) to predict future 3D skeleton-based human poses.
The MST-GNN outperforms state-of-the-art methods in both short-term and long-term motion prediction.
arXiv Detail & Related papers (2021-08-25T14:05:37Z)
- A contextual analysis of multi-layer perceptron models in classifying hand-written digits and letters: limited resources [0.0]
We extensively test an end-to-end vanilla neural network (MLP) approach in pure NumPy, without any pre-processing or feature extraction done beforehand.
We show that basic data mining operations can significantly improve the performance of the models in terms of computational time.
arXiv Detail & Related papers (2021-07-05T04:30:37Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.