Related papers: Bridging the Gap: Physical PCI Device Integration Into SystemC-TLM Virtual Platforms

URL: http://arxiv.org/abs/2505.15590v1
Date: Wed, 21 May 2025 14:46:41 GMT
Title: Bridging the Gap: Physical PCI Device Integration Into SystemC-TLM Virtual Platforms
Authors: Nils Bosbach, Rebecca Pelke, Niko Zurstraßen, Jan Henrik Weinstock, Lukas Jünger, Rainer Leupers
Abstract summary: Virtual Platforms (VPs) serve as a platform to execute and debug the unmodified target software at an early design stage. VPs need to provide high simulation speed to ensure the target software executes within a reasonable time. This paper introduces a novel approach for integrating real Peripheral Component Interconnect (PCI) devices into SystemC-TLM-2.0-based VPs.
Score: 0.16492989697868893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In today's technology-driven world, early-stage software development and testing are crucial. Virtual Platforms (VPs) have become indispensable tools for this purpose as they serve as a platform to execute and debug the unmodified target software at an early design stage. With the increasing complexity of software, especially in areas like Artificial Intelligence (AI) applications, VPs need to provide high simulation speed to ensure the target software executes within a reasonable time. Hybrid simulation, which combines virtual models with real hardware, can improve the performance of VPs. This paper introduces a novel approach for integrating real Peripheral Component Interconnect (PCI) devices into SystemC-TLM-2.0-based VPs. The embedded PCI devices enable high performance, easy integration, and allow introspection for analysis and optimization. To illustrate the practical application of our approach, we present a case study where we integrate Google Coral's Edge Tensor Processing Unit (TPU) into an ARM-based VP. The integration allows efficient execution of AI workloads, accelerating simulation speeds by up to 480x while eliminating the need for complex virtual device models. Beyond accelerating AI-workload execution, our framework enables driver development, regression testing across architectures, and device communication analysis. Our findings demonstrate that embedding PCI devices into SystemC simulations significantly enhances

Related papers

Scalable Software Testing in Fast Virtual Platforms: Leveraging SystemC, QEMU and Containerization [0.0]
The ever-increasing complexity of HW/SW systems presents a persistent challenge, particularly in safety-critical domains like automotive. To address this, Virtual Platforms (VPs) based on the SystemC TLM-2.0 standard have emerged as a pivotal solution. We propose an approach leveraging containerization to encapsulate VPs in order to reduce environment dependencies and enable cloud deployment.
arXiv Detail & Related papers (2025-06-12T12:08:53Z)
High-Performance ARM-on-ARM Virtualization for Multicore SystemC-TLM-Based Virtual Platforms [0.16492989697868893]
ARM-on-ARM virtual platform achieves up to 10x speedup over traditional instruction-set-simulator-based models on compute-intensive workloads. We present a multicore SystemC-TLM-based CPU model that can be used as a drop-in replacement for an instruction-set simulator.
arXiv Detail & Related papers (2025-05-19T11:21:45Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
A VM-HDL Co-Simulation Framework for Systems with PCIe-Connected FPGAs [7.519011820592022]
It is challenging to jointly develop and debug host software and FPGA hardware. Changes to the hardware design require a time-consuming FPGA synthesis process. A VM-HDL co-simulation framework is designed to run the same software, operating system, and hardware designs as the target physical system.
arXiv Detail & Related papers (2025-01-19T22:06:36Z)
Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC [3.829322478948514]
This paper develops a hybrid AI/CV system on Intel's Movidius Myriad X for initializing and tracking the satellite's pose in space missions. The proposed single-chip, robust-estimation, and real-time solution delivers a throughput of up to 5 FPS for 1-MegaPixel RGB images within a limited power envelope of 2W.
arXiv Detail & Related papers (2024-09-19T17:50:50Z)
Automatic Platform Configuration and Software Integration for Software-Defined Vehicles [4.522485108591059]
This paper introduces a novel approach to automate platform configuration and software integration for software-defined vehicles (SDVs). By leveraging model-based systems engineering (MBSE), our method automatically generates platform configuration and software integration plans. The proposed system enables dynamic and flexible resource allocation while ensuring compliance with safety requirements.
arXiv Detail & Related papers (2024-08-04T19:54:03Z)
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices [48.47320494918925]
This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low latency applications. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources.
arXiv Detail & Related papers (2023-09-04T13:15:01Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations. In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
arXiv Detail & Related papers (2023-02-17T17:13:35Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.