Parallelized Code Generation from Simulink Models for Event-driven and Timer-driven ROS 2 Nodes
- URL: http://arxiv.org/abs/2512.23605v1
- Date: Mon, 29 Dec 2025 16:59:59 GMT
- Title: Parallelized Code Generation from Simulink Models for Event-driven and Timer-driven ROS 2 Nodes
- Authors: Kenshin Obi, Ryo Yoshinaka, Hiroshi Fujimoto, Takuya Azumi
- Abstract summary: Traditional manual program parallelization faces challenges, including maintaining data integrity and avoiding issues such as deadlocks. This paper proposes an MBD framework to overcome these issues, categorizing ROS 2-compatible Simulink models into event-driven and timer-driven types for targeted parallelization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the complexity and scale of embedded systems, especially in the rapidly developing field of autonomous driving systems, have increased significantly. This has led to the adoption of software and hardware approaches such as Robot Operating System (ROS) 2 and multi-core processors. Traditional manual program parallelization faces challenges, including maintaining data integrity and avoiding concurrency issues such as deadlocks. While model-based development (MBD) automates this process, it encounters difficulties with the integration of modern frameworks such as ROS 2 in multi-input scenarios. This paper proposes an MBD framework to overcome these issues, categorizing ROS 2-compatible Simulink models into event-driven and timer-driven types for targeted parallelization. As a result, it extends the conventional parallelization by MBD and supports parallelized code generation for ROS 2-based models with multiple inputs. The evaluation results show that after applying parallelization with the proposed framework, all patterns show a reduction in execution time, confirming the effectiveness of parallelization.
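To illustrate the idea behind the abstract's categorization, the sketch below is a minimal, conceptual Python analogue (not the paper's generated code, and deliberately free of any ROS 2 dependency): an event-driven step runs when a message arrives, a timer-driven step runs periodically on held state, and within one step, independent blocks with no data dependency are dispatched in parallel. The block names `stage_a`, `stage_b`, and the merge by addition are hypothetical; real generated nodes would use ROS 2 subscription and timer callbacks.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_a(x: int) -> int:
    # One independent block of the model (e.g., a filter on one input).
    return x * 2


def stage_b(x: int) -> int:
    # Another block with no data dependency on stage_a.
    return x + 10


def event_driven_step(msg: int) -> int:
    """Event-driven pattern: executed on message arrival; the two
    dependency-free blocks run in parallel, then results are merged."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(stage_a, msg)
        fb = pool.submit(stage_b, msg)
        return fa.result() + fb.result()


def timer_driven_step(state: int) -> int:
    """Timer-driven pattern: the same parallelized body, but invoked
    periodically on held state rather than per incoming message."""
    return event_driven_step(state)


if __name__ == "__main__":
    print(event_driven_step(3))   # (3*2) + (3+10) = 19
    print(timer_driven_step(5))   # (5*2) + (5+10) = 25
```

Because the two stages share no data, no locks are needed and deadlock cannot arise — which is the kind of safe, targeted parallelization the framework automates at code-generation time.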
Related papers
- MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation [86.82285754460491]
We propose a new benchmark designed to evaluate both text and image output modalities. This performance degradation is strongly correlated with poor alignment between the generated reasoning and the final image. We propose a parallel multimodal diffusion framework, MMaDA-Parallel, that enables continuous, bidirectional interaction between text and images.
arXiv Detail & Related papers (2025-11-12T18:58:21Z) - Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs [61.953548065938385]
We introduce the "Three Taxes" (Bulk Synchronous, Inter-Kernel Data Locality, and Kernel Launch Overhead) as an analytical framework. We propose moving beyond the rigid BSP model to address key inefficiencies in distributed GPU execution. We observe a 10-20% speedup in end-to-end latency over BSP-based approaches.
arXiv Detail & Related papers (2025-11-04T01:15:44Z) - A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
A2R is an Asymmetric Two-Stage Reasoning framework designed to bridge the gap between a model's potential and its actual performance. A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer. Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z) - ASPD: Unlocking Adaptive Serial-Parallel Decoding by Exploring Intrinsic Parallelism in LLMs [34.477777651648914]
Large language models (LLMs) pose significant inference latency challenges due to their autoregressive decoding paradigm. We propose Adaptive Serial-Parallel Decoding (ASPD), which addresses two core challenges: automated construction of parallelizable data and an efficient parallel decoding mechanism. Our framework sets a groundbreaking benchmark for efficient LLM parallel inference, paving the way for its deployment in latency-sensitive applications such as AI-powered customer service bots and answer retrieval engines.
arXiv Detail & Related papers (2025-08-12T12:35:55Z) - Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation [20.117825519637357]
We introduce Multiverse, a new generative model that enables natively parallel generation. Next, we build a real-world Multiverse reasoning model with co-design curation of data, algorithm, and system. For data creation, we develop Multiverse Curator, an automated LLM-assisted pipeline. We also implement Multiverse Engine to support parallel inference.
arXiv Detail & Related papers (2025-06-11T17:59:23Z) - RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks [22.2075114201037]
We propose a novel large language model (LLM)-driven framework for dual-arm task parallelism planning. RoboPARA employs a two-stage process: Dependency Graph-based Planning Candidates Generation and Graph Re-Traversal-based Dual-Arm Parallel Planning. The X-DAPT dataset is the first dataset specifically designed to evaluate dual-arm task parallelism across diverse scenarios and difficulty levels.
arXiv Detail & Related papers (2025-06-07T06:46:24Z) - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks [9.88545357507935]
In existing methods such as Data Parallelism or ZeRO-DP, micro-batches of data are processed in parallel.
We propose Cyclic Data Parallelism, a novel paradigm shifting the execution of the micro-batches from simultaneous to sequential.
arXiv Detail & Related papers (2024-03-13T08:39:21Z) - Parallelized Spatiotemporal Binding [47.67393266882402]
We introduce the Parallelizable Spatiotemporal Binder (PSB), the first temporally-parallelizable slot learning architecture for sequential inputs.
Unlike conventional RNN-based approaches, PSB produces object-centric representations, known as slots, for all time-steps in parallel.
Compared to the state-of-the-art, our architecture demonstrates stable training on longer sequences, achieves parallelization that results in a 60% increase in training speed, and yields performance that is on par with or better on unsupervised 2D and 3D object-centric scene decomposition and understanding.
arXiv Detail & Related papers (2024-02-26T23:16:34Z) - On Optimizing the Communication of Model Parallelism [74.15423270435949]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL).
In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh.
We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule.
arXiv Detail & Related papers (2022-11-10T03:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.