Related papers: Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver

Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver

URL: http://arxiv.org/abs/2510.08536v1
Date: Thu, 09 Oct 2025 17:53:12 GMT
Title: Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver
Authors: Gregor Olenik, Marcel Koch, Hartwig Anzt,
Abstract summary: Existing approaches either fully or use plugin-based GPU solvers, each facing trade-offs between performance and development effort.<n>We propose a repartitioning strategy that better balances CPU matrix assembly and GPU-based linear solves.<n>Our results show that the proposed method significantly mitigates oversubscription issues, improving solver performance and resource utilization.
Score: 0.688204255655161
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern high-performance computing (HPC) increasingly relies on GPUs, but integrating GPU acceleration into complex scientific frameworks like OpenFOAM remains a challenge. Existing approaches either fully refactor the codebase or use plugin-based GPU solvers, each facing trade-offs between performance and development effort. In this work, we address the limitations of plugin-based GPU acceleration in OpenFOAM by proposing a repartitioning strategy that better balances CPU matrix assembly and GPU-based linear solves. We present a detailed computational model, describe a novel matrix repartitioning and update procedure, and evaluate its performance on large-scale CFD simulations. Our results show that the proposed method significantly mitigates oversubscription issues, improving solver performance and resource utilization in heterogeneous CPU-GPU environments.

Related papers

GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms.<n>We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units.<n>Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z)
Scaling Behaviors of Evolutionary Algorithms on GPUs: When Does Parallelism Pay Off? [43.96509049196842]
Evolutionary algorithms (EAs) are increasingly implemented on graphics processing units (GPUs) to leverage parallel processing capabilities for enhanced efficiency.<n>We investigate how GPU parallelism alters the behavior of EAs beyond simple acceleration metrics.<n>Our results reveal that the impact of GPU acceleration is highly heterogeneous and depends strongly on algorithmic structure.
arXiv Detail & Related papers (2026-01-26T12:55:21Z)
GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems [0.0]
GPU-Virt-Bench is a comprehensive benchmarking framework that evaluates GPU virtualization systems across 56 performance metrics.<n>We demonstrate the framework's utility through evaluation of HAMi-core, BUD-FCSP, and simulated MIG baselines.
arXiv Detail & Related papers (2025-11-26T09:42:05Z)
GPU-Accelerated Loopy Belief Propagation for Program Analysis [3.516434517865342]
This paper presents a GPU-accelerated LBP algorithm for program analysis.<n>We propose a unified representation for specifying arbitrary user-defined update strategies, along with a dependency analysis algorithm.<n>Our approach achieves an average speedup of $2.14times$ over the state-of-the-art sequential approach and $5.56times$ over the state-of-the-art GPU-based approach.
arXiv Detail & Related papers (2025-09-26T13:30:30Z)
PICT -- A Differentiable, GPU-Accelerated Multi-Block PISO Solver for Simulation-Coupled Learning Tasks in Fluid Dynamics [59.38498811984876]
We present our fluid simulator PICT, a differentiable pressure-implicit solver coded in PyTorch with Graphics-processing-unit (GPU) support.<n>We first verify the accuracy of both the forward simulation and our derived gradients in various established benchmarks.<n>We show that the gradients provided by our solver can be used to learn complicated turbulence models in 2D and 3D.
arXiv Detail & Related papers (2025-05-22T17:55:10Z)
A GPU Implementation of Multi-Guiding Spark Fireworks Algorithm for Efficient Black-Box Neural Network Optimization [2.9608128305931825]
This paper presents a GPU-accelerated version of the Multi-Guiding Spark Fireworks Algorithm (MGFWA)<n>We demonstrate its superior performance in terms of both speed and solution quality.<n>The proposed implementation offers a promising approach to accelerate swarm intelligence algorithms.
arXiv Detail & Related papers (2025-01-07T17:09:07Z)
Multi-GPU RI-HF Energies and Analytic Gradients $-$ Towards High Throughput Ab Initio Molecular Dynamics [0.0]
This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock energies and analytic gradients using multiple Graphics Processing Units (GPUs) The algorithm is especially designed for high throughput emphab initio molecular dynamics simulations of small and medium size molecules (10-100 atoms)
arXiv Detail & Related papers (2024-07-29T00:14:10Z)
Optimized thread-block arrangement in a GPU implementation of a linear solver for atmospheric chemistry mechanisms [0.0]
Earth system models (ESM) demand significant hardware resources and energy consumption to solve atmospheric chemistry processes. Recent studies have shown improved performance from running these models on GPU accelerators. This study proposes an optimized distribution of the chemical solver's computational load on the GPU, named Block-cells.
arXiv Detail & Related papers (2024-05-27T17:12:59Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show, that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms. We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.