Streaming Tensor Program: A streaming abstraction for dynamic parallelism
- URL: http://arxiv.org/abs/2511.07776v1
- Date: Wed, 12 Nov 2025 01:17:28 GMT
- Title: Streaming Tensor Program: A streaming abstraction for dynamic parallelism
- Authors: Gina Sohn, Genghan Zhang, Konstantin Hossfeld, Jungwoo Kim, Nathan Sobotka, Nathan Zhang, Olivia Hsu, Kunle Olukotun
- Abstract summary: Streaming Tensor Program (STeP) is a new streaming abstraction that enables dynamic tensor workloads to run efficiently on spatial dataflow accelerators. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations (dynamic tiling, dynamic parallelization, and configuration time-multiplexing) that adapt to dynamic behaviors while preserving dataflow efficiency.
- Score: 3.2194902146668127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic behaviors are becoming prevalent in many tensor applications. In machine learning, for example, the input tensors are dynamically shaped or ragged, and data-dependent control flow is widely used in many models. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators either forces dynamic behaviors to be implemented statically or hides the information needed for performance-critical decisions. To address these challenges, we present the Streaming Tensor Program (STeP), a new streaming abstraction that enables dynamic tensor workloads to run efficiently on spatial dataflow accelerators. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations (dynamic tiling, dynamic parallelization, and configuration time-multiplexing) that adapt to dynamic behaviors while preserving dataflow efficiency. Using a cycle-approximate simulator on representative LLM layers with real-world traces, dynamic tiling reduces the on-chip memory requirement by 2.18x, dynamic parallelization improves latency by 1.5x, and configuration time-multiplexing improves compute utilization by 2.57x over implementations available in prior abstractions.
Related papers
- DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation [52.83157499300261]
We present DynamicVLA, a framework for dynamic object manipulation that integrates temporal reasoning and closed-loop adaptation. We introduce the Dynamic Object Manipulation benchmark, built from scratch with an auto data collection pipeline. Extensive evaluations demonstrate remarkable improvements in response speed, perception, and generalization.
arXiv Detail & Related papers (2026-01-29T18:59:51Z) - Bidirectional Feature-aligned Motion Transformation for Efficient Dynamic Point Cloud Compression [97.66080040613726]
We propose a Bidirectional Feature-aligned Motion Transformation (Bi-FMT) framework that implicitly models motion in the feature space. Bi-FMT aligns features across both past and future frames to produce temporally consistent latent representations. We show Bi-FMT surpasses D-DPCC and AdaDPCC in both compression efficiency and runtime.
arXiv Detail & Related papers (2025-09-18T03:51:06Z) - CCLSTM: Coupled Convolutional Long-Short Term Memory Network for Occupancy Flow Forecasting [0.0]
We propose Coupled Convolutional LSTM (CCLSTM), a lightweight, end-to-end trainable architecture based solely on convolutional operations. CCLSTM achieves state-of-the-art performance on occupancy flow metrics and, as of this submission, ranks 1st in all metrics on the 2024 Occupancy and Flow Prediction Challenge leaderboard.
arXiv Detail & Related papers (2025-06-06T14:38:55Z) - SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments [10.303368447554591]
This paper proposes a multi-task framework to simultaneously predict scene flow and instance segmentation of full-temporal point clouds. The novelty of this work is threefold: 1) developing a coarse-to-fine prediction based multitask scheme, where an initial coarse segmentation of static backgrounds and dynamic objects is used to provide contextual information for refining motion and semantic information through a shared feature processing module; 2) developing a set of loss functions to enhance the performance of scene flow estimation and instance segmentation, while helping ensure spatial and temporal consistency of both static and dynamic objects within traffic scenes; 3) developing a self-supervised learning scheme, which utilizes coarse
arXiv Detail & Related papers (2025-03-19T02:43:19Z) - Dynamic Trend Fusion Module for Traffic Flow Prediction [9.650380389159459]
Existing methods often model spatial and temporal correlations separately, failing to effectively fuse them. We propose the Dynamic Spatial-Temporal Trend Transformer (DST2) to fuse dynamic correlations for learning multi-view dynamic features of traffic networks. Experiments on four real-world traffic datasets demonstrate that our framework achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-01-18T15:16:47Z) - Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs [0.2578242050187029]
We describe Tempo, a new deep learning system that combines the dynamism of eager execution with the whole-program optimizations of graph-based compilation. We show that Tempo achieves a 7$\times$ speedup over JAX for Llama-3.2-3B decoding. For reinforcement learning algorithms, Tempo achieves a 54$\times$ speedup, with 16$\times$ lower peak memory usage.
arXiv Detail & Related papers (2025-01-09T18:05:33Z) - TimeGraphs: Graph-based Temporal Reasoning [64.18083371645956]
TimeGraphs is a novel approach that characterizes dynamic interactions as a hierarchical temporal graph.
Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales.
We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset.
arXiv Detail & Related papers (2024-01-06T06:26:49Z) - PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for
Traffic Flow Prediction [78.05103666987655]
spatial-temporal Graph Neural Network (GNN) models have emerged as one of the most promising methods to solve this problem.
We propose a novel propagation delay-aware dynamic long-range transFormer, namely PDFormer, for accurate traffic flow prediction.
Our method can not only achieve state-of-the-art performance but also exhibit competitive computational efficiency.
arXiv Detail & Related papers (2023-01-19T08:42:40Z) - PAD-Net: An Efficient Framework for Dynamic Networks [72.85480289152719]
Common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones.
We propose a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures.
arXiv Detail & Related papers (2022-11-10T12:42:43Z) - D$^3$FlowSLAM: Self-Supervised Dynamic SLAM with Flow Motion Decomposition and DINO Guidance [61.14088096348959]
We introduce a self-supervised deep SLAM method that robustly operates in dynamic scenes while accurately identifying dynamic components.
We propose a dynamic update module based on this representation and develop a dense SLAM system that excels in dynamic scenarios.
arXiv Detail & Related papers (2022-07-18T17:47:39Z) - Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
arXiv Detail & Related papers (2021-04-30T11:25:43Z) - Dynamic Graph Convolutional Recurrent Network for Traffic Prediction: Benchmark and Solution [18.309299822858243]
We propose a novel traffic prediction framework, named Dynamic Graph Convolutional Recurrent Network (DGCRN).
In DGCRN, hyper-networks are designed to leverage and extract dynamic characteristics from node attributes.
We are the first to employ a generation method to model fine iteration of dynamic graph at each time step.
arXiv Detail & Related papers (2021-04-30T11:25:43Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.