GPU-Accelerated Loopy Belief Propagation for Program Analysis
- URL: http://arxiv.org/abs/2509.22337v1
- Date: Fri, 26 Sep 2025 13:30:30 GMT
- Title: GPU-Accelerated Loopy Belief Propagation for Program Analysis
- Authors: Haoyu Feng, Xin Zhang,
- Abstract summary: This paper presents a GPU-accelerated LBP algorithm for program analysis.<n>We propose a unified representation for specifying arbitrary user-defined update strategies, along with a dependency analysis algorithm.<n>Our approach achieves an average speedup of $2.14times$ over the state-of-the-art sequential approach and $5.56times$ over the state-of-the-art GPU-based approach.
- Score: 3.516434517865342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Loopy Belief Propagation (LBP) is a widely used approximate inference algorithm in probabilistic graphical models, with applications in computer vision, error correction codes, protein folding, program analysis, etc. However, LBP faces significant computational challenges when applied to large-scale program analysis. While GPU (Graphics Processing Unit) parallel computing provides a promising solution, existing approaches lack support for flexible update strategies and have yet to integrate logical constraints with GPU acceleration, leading to suboptimal practical performance. This paper presents a GPU-accelerated LBP algorithm for program analysis. To support the diverse update strategies required by users, we propose a unified representation for specifying arbitrary user-defined update strategies, along with a dependency analysis algorithm. Furthermore, building on previous work that leverages the local structure of Horn clauses to simplify message passing, we group messages to minimize warp divergence and better utilize GPU resources. Experimental results on datarace analysis over eight real-world Java programs show that our approach achieves an average speedup of $2.14\times$ over the state-of-the-art sequential approach and $5.56\times$ over the state-of-the-art GPU-based approach, while maintaining high accuracy.
Related papers
- GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms.<n>We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units.<n>Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z) - Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver [0.688204255655161]
Existing approaches either fully or use plugin-based GPU solvers, each facing trade-offs between performance and development effort.<n>We propose a repartitioning strategy that better balances CPU matrix assembly and GPU-based linear solves.<n>Our results show that the proposed method significantly mitigates oversubscription issues, improving solver performance and resource utilization.
arXiv Detail & Related papers (2025-10-09T17:53:12Z) - NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding [54.88765757043535]
This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference.<n>Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types with less than 7% computational overhead.<n>The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search.
arXiv Detail & Related papers (2025-05-28T20:43:10Z) - FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness [0.0]
Methods like FlashAttention have achieved a x6 performance improvement over native PyTorch by avoiding unnecessary data transfers.<n>This paper extends Neural Circuit Diagrams for deep learning models to consider resource usage and the distribution of tasks across a GPU hierarchy.<n>We develop a methodology for representing intermediate-level pseudocode with diagrams, allowing hardware-aware algorithms to be derived step-by-step.
arXiv Detail & Related papers (2024-12-04T13:52:04Z) - Implementation and Analysis of GPU Algorithms for Vecchia Approximation [0.8057006406834466]
Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms.
While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units ( GPU) is lacking.
We show that our new method outperforms the other two and then present it in the GpGpU R package.
arXiv Detail & Related papers (2024-07-03T01:24:44Z) - GPU Based Differential Evolution: New Insights and Comparative Study [7.5961910202572644]
This work reviews the main architectural choices made in the literature for GPU based Differential Evolution algorithms.
It introduces a new GPU based numerical optimisation benchmark to evaluate and compare GPU based DE algorithms.
arXiv Detail & Related papers (2024-05-26T12:40:39Z) - High Performance Computing Applied to Logistic Regression: A CPU and GPU
Implementation Comparison [0.0]
We present a versatile GPU-based parallel version of Logistic Regression (LR)
Our implementation is a direct translation of the parallel Gradient Descent Logistic Regression algorithm proposed by X. Zou et al.
Our method is particularly advantageous for real-time prediction applications like image recognition, spam detection, and fraud detection.
arXiv Detail & Related papers (2023-08-19T14:49:37Z) - ParaGraph: Weighted Graph Representation for Performance Optimization of
HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is indeed effective and has normalized RMSE as low as 0.004 to at most 0.01 in its runtime predictions.
arXiv Detail & Related papers (2023-04-07T05:52:59Z) - A modular software framework for the design and implementation of
ptychography algorithms [55.41644538483948]
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms.
Despite its simplicity, the software leverages accelerated processing through the PyTorch interface.
Results are shown on both synthetic and real datasets.
arXiv Detail & Related papers (2022-05-06T16:32:37Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical
Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.