High-Performance Parallel Optimization of the Fish School Behaviour on the Setonix Platform Using OpenMP
- URL: http://arxiv.org/abs/2507.20173v1
- Date: Sun, 27 Jul 2025 08:25:08 GMT
- Title: High-Performance Parallel Optimization of the Fish School Behaviour on the Setonix Platform Using OpenMP
- Authors: Haitian Wang, Long Qin
- Abstract summary: This paper presents an in-depth investigation into the high-performance parallel optimization of the Fish School Behaviour (FSB) algorithm on the Setonix supercomputing platform. The FSB algorithm, inspired by nature's social behavior patterns, is an ideal candidate for parallelization due to its iterative and computationally intensive nature.
- Score: 1.1533029170925908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an in-depth investigation into the high-performance parallel optimization of the Fish School Behaviour (FSB) algorithm on the Setonix supercomputing platform using the OpenMP framework. Given the increasing demand for enhanced computational capabilities for complex, large-scale calculations across diverse domains, there is a pressing need for optimized parallel algorithms and computing structures. The FSB algorithm, inspired by nature's social behavior patterns, is an ideal candidate for parallelization due to its iterative and computationally intensive nature. This study leverages the capabilities of the Setonix platform and the OpenMP framework to analyze various aspects of multi-threading, such as thread counts, scheduling strategies, and OpenMP constructs, aiming to discern patterns and strategies that can elevate program performance. Experiments were designed to rigorously test different configurations, and our results not only offer insights for the parallel optimization of FSB on Setonix but also provide valuable references for other parallel computational research using OpenMP. Looking forward, other factors, such as cache behavior and thread scheduling strategies at micro and macro levels, hold potential for further exploration and optimization.
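To make the parallelization pattern concrete, here is a minimal, self-contained sketch of one FSB-style iteration under OpenMP. The simplified update rule and every identifier in it (fish_t, fsb_iteration, N_FISH, STEP) are illustrative assumptions rather than code from the paper; the schedule clause and the thread count are exactly the kinds of knobs the paper's experiments vary.

```c
/* Minimal sketch of one OpenMP-parallelized Fish School Behaviour
 * iteration. The update rule is a toy stand-in for the real FSB
 * operators. Build with: cc -O2 -fopenmp fsb.c -lm */
#include <math.h>
#include <stdio.h>
#include <omp.h>

#define N_FISH 100000
#define STEP   0.01

typedef struct { double x, df; } fish_t;   /* position, fitness gain */

static fish_t school[N_FISH];

static void fsb_iteration(void) {
    double sum_xdf = 0.0, sum_df = 0.0;

    /* Individual movement: iterations are independent, so a plain
     * parallel for applies; the schedule is an experimental knob. */
    #pragma omp parallel for schedule(static) reduction(+:sum_xdf,sum_df)
    for (int i = 0; i < N_FISH; i++) {
        double trial = school[i].x + STEP * sin((double)i);
        double df = fabs(school[i].x) - fabs(trial); /* toy fitness: -|x| */
        school[i].df = (df > 0.0) ? df : 0.0;        /* move only if better */
        if (df > 0.0) school[i].x = trial;
        sum_xdf += school[i].x * school[i].df;
        sum_df  += school[i].df;
    }

    /* Collective drift toward the fitness-weighted barycentre needs the
     * reduced sums, so it runs between the two parallel regions. */
    double bary = (sum_df > 0.0) ? sum_xdf / sum_df : 0.0;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N_FISH; i++)
        school[i].x += STEP * (bary - school[i].x);
}

int main(void) {
    for (int i = 0; i < N_FISH; i++) school[i].x = (i % 200) - 100.0;
    for (int it = 0; it < 100; it++) fsb_iteration();
    printf("threads=%d  x[0]=%f\n", omp_get_max_threads(), school[0].x);
    return 0;
}
```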
Related papers
- A Comparative Study of OpenMP Scheduling Algorithm Selection Strategies [4.068270792140994]
We propose and evaluate learning-based approaches for selecting scheduling algorithms in OpenMP. Our results show that RL methods are capable of learning high-performing scheduling decisions. The approach can also be extended to MPI-based programs, enabling optimization of scheduling decisions across multiple levels of parallelism.
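As a pointer to the mechanism such a selector can drive, the sketch below uses two pieces of standard OpenMP machinery: the schedule(runtime) clause defers a loop's scheduling decision to run time, and omp_set_schedule() installs the choice, which a learned policy could make per region. The choose_schedule() heuristic is a stub assumption standing in for the paper's RL-based selector.

```c
/* Sketch: deferring the OpenMP scheduling decision to run time so an
 * external (e.g. learned) policy can pick it per loop. Build with:
 * cc -O2 -fopenmp select.c */
#include <stdio.h>
#include <omp.h>

/* Stub policy: a hypothetical stand-in for an RL-based selector. */
static omp_sched_t choose_schedule(long n_iters) {
    return (n_iters < 4096) ? omp_sched_dynamic : omp_sched_static;
}

int main(void) {
    long n = 100000;
    double sum = 0.0;

    /* Install the selected scheduling kind; chunk 0 means the default. */
    omp_set_schedule(choose_schedule(n), 0);

    /* schedule(runtime) picks up whatever omp_set_schedule installed. */
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += 1.0 / (double)(i + 1);

    printf("harmonic(%ld) = %f\n", n, sum);
    return 0;
}
```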
arXiv Detail & Related papers (2025-07-27T15:10:30Z)
- APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs [81.5049387116454]
We introduce APB, an efficient long-context inference framework. APB uses multi-host approximate attention to enhance prefill speed. APB achieves speedups of up to 9.2x, 4.2x, and 1.6x compared with FlashAttn, RingAttn, and StarAttn, respectively.
arXiv Detail & Related papers (2025-02-17T17:59:56Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes. Our results demonstrate up to 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Hybrid programming-model strategies for GPU offloading of electronic structure calculation kernels [2.4898174182192974]
PROGRESS is a library for electronic structure solvers.
It implements linear algebra operations for electronic structure kernels.
We describe the general strategies used for these implementations on various computer architectures.
arXiv Detail & Related papers (2024-01-24T19:38:01Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and AML.
This paper proposes algorithms for conditional stochastic optimization in the distributed, federated learning setting.
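For orientation, conditional stochastic optimization is usually stated as the nested objective below; this formulation is taken from the general literature, not from this paper's abstract.

```latex
\min_{x \in \mathbb{R}^d} \; F(x)
  = \mathbb{E}_{\xi}\Big[\, f_{\xi}\big( \mathbb{E}_{\eta \mid \xi}\,[\, g_{\xi}(x, \eta) \,] \big) \,\Big]
```

Because the inner expectation is conditioned on the outer sample ξ, naive minibatch gradients are biased, which is the core difficulty federated variants must handle across clients.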
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
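The sketch below illustrates the shape of that two-step decomposition under stated assumptions: a small, architecture-tunable microkernel stands in for a processing primitive, and the outer loop nest is kept separate so blocking and loop order can be retuned without touching the core. All names are hypothetical; this is not the framework's actual API.

```c
/* Illustrative two-step decomposition: primitive core + logical loops.
 * Names are hypothetical, not a real TPP interface. */
#include <stdio.h>

#define N     512
#define BLOCK 64

/* Step 1: the computational core as a reusable primitive
 * (a toy BLOCK x BLOCK matrix-multiply microkernel). */
static void matmul_primitive(const double *a, const double *b, double *c,
                             int lda, int ldb, int ldc) {
    for (int i = 0; i < BLOCK; i++)
        for (int k = 0; k < BLOCK; k++)
            for (int j = 0; j < BLOCK; j++)
                c[i * ldc + j] += a[i * lda + k] * b[k * ldb + j];
}

/* Step 2: the logical loops around the primitive; blocking and loop
 * order can change here without modifying the core above. */
static void matmul(const double *a, const double *b, double *c) {
    for (int ib = 0; ib < N; ib += BLOCK)
        for (int kb = 0; kb < N; kb += BLOCK)
            for (int jb = 0; jb < N; jb += BLOCK)
                matmul_primitive(&a[ib * N + kb], &b[kb * N + jb],
                                 &c[ib * N + jb], N, N, N);
}

static double A[N * N], B[N * N], C[N * N];

int main(void) {
    for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }
    matmul(A, B, C);
    printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0 * N);
    return 0;
}
```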
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- Performance Optimization using Multimodal Modeling and Heterogeneous GNN [1.304892050913381]
We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks.
In this paper, we analyze IR-based programming models to make task-specific performance optimizations.
Our experiments show that this multimodal learning-based approach outperforms the state of the art in all experiments.
arXiv Detail & Related papers (2023-04-25T04:27:43Z)
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is effective, with normalized RMSE between 0.004 and 0.01 in its runtime predictions.
arXiv Detail & Related papers (2023-04-07T05:52:59Z)
- Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems [1.885335997132172]
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters.
This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation.
Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++, which outperforms all existing solvers by a large margin. The pairwise MAP energy these solvers minimize is recalled below.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
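For orientation, the MAP problem such dual block-coordinate ascent solvers target is energy minimization over a pairwise graphical model on a graph G = (V, E); this standard formulation is supplied from the general literature, not quoted from the paper.

```latex
\min_{x} \; E(x) = \sum_{u \in V} \theta_u(x_u) \;+\; \sum_{(u,v) \in E} \theta_{uv}(x_u, x_v)
```

Solvers in the MPLP family ascend the dual of this problem's linear-programming relaxation, updating one block of dual variables at a time.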
This list is automatically generated from the titles and abstracts of the papers on this site.