Related papers: From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures

From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures

URL: http://arxiv.org/abs/2509.24030v1
Date: Sun, 28 Sep 2025 18:54:38 GMT
Title: From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures
Authors: Anjus George, Michael Brim, Christopher Zimmer, David Rogers, Sarp Oral, Zach Mayes,
Abstract summary: We investigate three cross-facility data streaming architectures, Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS)<n>Our study shows that DTS offers a minimal-hop path, resulting in higher throughput and lower latency, whereas MSS provides greater deployment feasibility and scalability across multiple users but incurs significant overhead.<n> PRS lies in between, offering a scalable architecture whose performance matches DTS in most cases.
Score: 0.8116550081617903
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In this paper, we investigate three cross-facility data streaming architectures, Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS). We examine their architectural variations in data flow paths and deployment feasibility, and detail their implementation using the Data Streaming to HPC (DS2HPC) architectural framework and the SciStream memory-to-memory streaming toolkit on the production-grade Advanced Computing Ecosystem (ACE) infrastructure at Oak Ridge Leadership Computing Facility (OLCF). We present a workflow-specific evaluation of these architectures using three synthetic workloads derived from the streaming characteristics of scientific workflows. Through simulated experiments, we measure streaming throughput, round-trip time, and overhead under work sharing, work sharing with feedback, and broadcast and gather messaging patterns commonly found in AI-HPC communication motifs. Our study shows that DTS offers a minimal-hop path, resulting in higher throughput and lower latency, whereas MSS provides greater deployment feasibility and scalability across multiple users but incurs significant overhead. PRS lies in between, offering a scalable architecture whose performance matches DTS in most cases.

Related papers

A Context-Aware Knowledge Graph Platform for Stream Processing in Industrial IoT [0.0]
This work proposes a context-aware semantic platform for data stream management.<n>It unifies heterogeneous IoT/IoE data sources through a Knowledge Graph.<n>It supports flexible data gathering, composable stream processing pipelines, and dynamic role-based data access based on agents' contexts.
arXiv Detail & Related papers (2026-02-23T15:55:32Z)
Watch and Learn: Learning to Use Computers from Online Videos [50.10702690339142]
Watch & Learn (W&L) is a framework that converts human demonstration videos readily available on the Internet into executable UI trajectories at scale.<n>We develop an inverse dynamics labeling pipeline with task-aware video retrieval, generate over 53k high-quality trajectories from raw web videos.<n>These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing CUAs towards real-world deployment.
arXiv Detail & Related papers (2025-10-06T10:29:00Z)
The LCLStream Ecosystem for Multi-Institutional Dataset Exploration [4.640896995209328]
We describe a new end-to-end experimental data streaming framework designed from the ground up to support new types of applications.<n>The LCLStreamer framework has prototyped and implemented several new paradigms critical for future generation experiments.
arXiv Detail & Related papers (2025-10-05T03:13:38Z)
Scalable Engine and the Performance of Different LLM Models in a SLURM based HPC architecture [3.746889836344766]
This work elaborates on a High performance computing architecture based on Simple Linux Utility for Resource Management (SLURM)<n> Dynamic resource scheduling and seamless integration of containerized have been leveraged to manage CPU, GPU, and memory efficiently in multi-node clusters.<n>The obtained results pave ways for significantly more efficient, responsive, and fault-tolerant LLM inference on large-scale HPC infrastructures.
arXiv Detail & Related papers (2025-08-25T09:11:27Z)
(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork [60.889175951038496]
Large-scale neural networks have demonstrated remarkable performance in different domains like vision and language processing. One of the key questions of structural pruning is how to estimate the channel significance. We propose a novel algorithmic framework, namely textttPASS. It is a tailored hyper-network to take both visual prompts and network weight statistics as input, and output layer-wise channel sparsity in a recurrent manner.
arXiv Detail & Related papers (2024-07-24T16:47:45Z)
Streaming Technologies and Serialization Protocols: Empirical Performance Analysis [0.70224924046445]
Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training. Various streaming technologies and serialization protocols have been developed to cater to different streaming requirements. Our study uncovers significant performance differences and trade-offs between these technologies.
arXiv Detail & Related papers (2024-07-18T13:24:08Z)
Cross-domain Learning Framework for Tracking Users in RIS-aided Multi-band ISAC Systems with Sparse Labeled Data [55.70071704247794]
Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs) This paper proposes the X2Track framework, where we model the tracking function by a hierarchical architecture, jointly utilizing multi-modal CSI indicators across multiple bands, and optimize it in a cross-domain manner. Under X2Track, we design an efficient deep learning algorithm to minimize tracking errors, based on transformer neural networks and adversarial learning techniques.
arXiv Detail & Related papers (2024-05-10T08:04:27Z)
TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications [7.850979932441607]
Pathway is a new unified data processing framework that can run workloads on both bounded and unbounded data streams. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts.
arXiv Detail & Related papers (2023-07-12T08:27:37Z)
FENXI: Deep-learning Traffic Analytics at the Edge [69.34903175081284]
We present FENXI, a system to run complex analytics by leveraging TPU. FENXI decouples operations and traffic analytics which operates at different granularities. Our analysis shows that FENXI can sustain forwarding line rate traffic processing requiring only limited resources.
arXiv Detail & Related papers (2021-05-25T08:02:44Z)
Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These networks consist of stages, which are sets of layers that operate on representations in the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network. However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.