Related papers: High-Performance ARM-on-ARM Virtualization for Multicore SystemC-TLM-Based Virtual Platforms

URL: http://arxiv.org/abs/2505.12987v2
Date: Tue, 24 Jun 2025 08:13:50 GMT
Title: High-Performance ARM-on-ARM Virtualization for Multicore SystemC-TLM-Based Virtual Platforms
Authors: Nils Bosbach, Rebecca Pelke, Niko Zurstraßen, Jan Henrik Weinstock, Lukas Jünger, Rainer Leupers
Abstract summary: ARM-on-ARM virtual platform achieves up to 10x speedup over traditional instruction-set-simulator-based models on compute-intensive workloads. We present a multicore SystemC-TLM-based CPU model that can be used as a drop-in replacement for an instruction-set simulator.
Score: 0.16492989697868893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing complexity of hardware and software requires advanced development and test methodologies for modern systems on chips. This paper presents a novel approach to ARM-on-ARM virtualization within SystemC-based simulators using Linux's KVM to achieve high-performance simulation. By running target software natively on ARM-based hosts with hardware-based virtualization extensions, our method eliminates the need for instruction-set simulators, which significantly improves performance. We present a multicore SystemC-TLM-based CPU model that can be used as a drop-in replacement for an instruction-set simulator. It places no special requirements on the host system, making it compatible with various environments. Benchmark results show that our ARM-on-ARM-based virtual platform achieves up to 10x speedup over traditional instruction-set-simulator-based models on compute-intensive workloads. Depending on the benchmark, speedups increase to more than 100x.

Related papers

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations [0.6240840318920522]
We present chemtrain-deploy, a framework that enables model-agnostic deployment of LAMMPS in MD simulations. Chemtrain-deploy supports any JAX-defined semi-local potential, allowing users to exploit the functionality of LAMMPS. It achieves state-of-the-art efficiency and scales to systems containing millions of atoms.
arXiv Detail & Related papers (2025-06-04T15:19:26Z)
Bridging the Gap: Physical PCI Device Integration Into SystemC-TLM Virtual Platforms [0.16492989697868893]
Virtual Platforms (VPs) serve as a platform to execute and debug the unmodified target software at an early design stage. VPs need to provide high simulation speed to ensure the target software executes within a reasonable time. This paper introduces a novel approach for integrating real Peripheral Component Interconnect (PCI) devices into SystemC-TLM-2.0-based VPs.
arXiv Detail & Related papers (2025-05-21T14:46:41Z)
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving [12.068287973463786]
Serving Large Language Models (LLMs) is critical for AI-powered applications but demands substantial computational resources. Low-precision computation has emerged as a key technique to improve efficiency while reducing resource consumption. Existing approaches for generating low-precision kernels are limited to weight bit widths that are powers of two.
arXiv Detail & Related papers (2025-04-17T14:45:03Z)
oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science [1.5672115019395867]
UXL's oneAPI Data Analytics Library (oneDAL) is widely adopted for accelerating ML and data analytics, but its reliance on Intel's Math Kernel Library (MKL) has traditionally limited its compatibility to x86 platforms. This paper details the porting of oneDAL to ARM architectures with SVE support, using OpenBLAS as an alternative backend to overcome architectural and performance challenges.
arXiv Detail & Related papers (2025-04-05T17:53:36Z)
KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Tensor Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations. In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
arXiv Detail & Related papers (2023-02-17T17:13:35Z)
PLSSVM: A (multi-)GPGPU-accelerated Least Squares Support Vector Machine [68.8204255655161]
Support Vector Machines (SVMs) are widely used in machine learning. However, even modern and optimized implementations do not scale well for large non-trivial dense data sets on cutting-edge hardware. PLSSVM can be used as a drop-in replacement for LIBSVM.
arXiv Detail & Related papers (2022-02-25T13:24:23Z)
Compiler-Driven Simulation of Reconfigurable Hardware Accelerators [0.8807375890824978]
Existing simulators tend toward two extremes: low-level and general approaches, such as RTL simulation, that can model any hardware but require substantial effort and long execution times. This work proposes a compiler-driven simulation workflow that can model hardware accelerators.
arXiv Detail & Related papers (2022-02-01T20:31:04Z)
Using Machine Learning at Scale in HPC Simulations with SmartSim: An Application to Ocean Climate Modeling [52.77024349608834]
We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim. SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for traditional HPC simulations.
arXiv Detail & Related papers (2021-04-13T19:27:28Z)
Achieving 100X faster simulations of complex biological phenomena by coupling ML to HPC ensembles [47.44377051031385]
We present DeepDriveMD, a tool for a range of prototypical ML-driven HPC simulation scenarios. We use it to quantify improvements in the scientific performance of ML-driven ensemble-based applications.
arXiv Detail & Related papers (2021-04-10T15:52:39Z)
Comparing Popular Simulation Environments in the Scope of Robotics and Reinforcement Learning [0.0]
We show that the chosen simulation environments benefit the most from single core performance. Using a multi core system, multiple simulations could be run in parallel to increase the performance.
arXiv Detail & Related papers (2021-03-08T09:08:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.