Related papers: stratum: A System Infrastructure for Massive Agent-Centric ML Workloads

stratum: A System Infrastructure for Massive Agent-Centric ML Workloads

URL: http://arxiv.org/abs/2603.03589v2
Date: Thu, 05 Mar 2026 07:47:35 GMT
Title: stratum: A System Infrastructure for Massive Agent-Centric ML Workloads
Authors: Arnab Phani, Elias Strauss, Sebastian Schelter,
Abstract summary: Large language models (LLMs) generate, validate, and optimize complete machine learning (ML) pipelines.<n>The existing Python-based ML ecosystem is built around libraries such as Panda scikit-learn.<n>We propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning.
Score: 8.123450153690424
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models (LLMs) transform how machine learning (ML) pipelines are developed and evaluated. LLMs enable a new type of workload, agentic pipeline search, in which autonomous or semi-autonomous agents generate, validate, and optimize complete ML pipelines. These agents predominantly operate over popular Python ML libraries and exhibit highly exploratory behavior. This results in thousands of executions for data profiling, pipeline generation, and iterative refinement of pipeline stages. However, the existing Python-based ML ecosystem is built around libraries such as Pandas and scikit-learn, which are designed for human-centric, interactive, sequential workflows and remain constrained by Python's interpretive execution model, library-level isolation, and limited runtime support for executing large numbers of pipelines. Meanwhile, many high-performance ML systems proposed by the systems community either target narrow workload classes or require specialized programming models, which limits their integration with the Python ML ecosystem and makes them largely ill-suited for LLM-based agents. This growing mismatch exposes a fundamental systems challenge in supporting agentic pipeline search at scale. We therefore propose stratum, a unified system infrastructure that decouples pipeline execution from planning and reasoning during agentic pipeline search. Stratum integrates seamlessly with existing Python libraries, compiles batches of pipelines into optimized execution graphs, and efficiently executes them across heterogeneous backends, including a novel Rust-based runtime. We present stratum's architectural vision along with an early prototype, discuss key design decisions, and outline open challenges and research directions. Finally, preliminary experiments show that stratum can significantly speed up large-scale agentic pipeline search up to 16.6x.

Related papers

SemPipes -- Optimizable Semantic Data Operators for Tabular Machine Learning Pipelines [12.816711873869984]
We introduce SemPipes, a novel declarative programming model that integrates semantic data operators into ML pipelines.<n>SemPipes synthesizes custom operator implementations based on data characteristics, operator instructions, and pipeline context.<n>We show that semantic operators substantially improve end-to-end predictive performance for both expert-designed and agent-generated pipelines.
arXiv Detail & Related papers (2026-02-04T23:36:29Z)
Plug-and-Play Benchmarking of Reinforcement Learning Algorithms for Large-Scale Flow Control [61.155940786140455]
Reinforcement learning (RL) has shown promising results in active flow control (AFC)<n>Current AFC benchmarks rely on external computational fluid dynamics (CFD) solvers, are not fully differentiable, and provide limited 3D and multi-agent support.<n>We introduce FluidGym, the first standalone, fully differentiable benchmark suite for RL in AFC.
arXiv Detail & Related papers (2026-01-21T14:13:44Z)
AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models [59.7059443712562]
AdaPtis is a training system for large language models (LLMs) that supports adaptive pipeline parallelism.<n>Extensive experiments show that AdaPtis achieves an average speedup of 1.42x (up to 2.14x) over Megatron-LM I-1F1B.
arXiv Detail & Related papers (2025-09-28T08:05:13Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.<n>Recent works have started exploiting large language models (LLM) to lessen such burden.<n>This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans [3.2362171533623054]
We envision highly-automated software platforms to assist data scientists with developing, validating, monitoring, and analysing their Machine Learning pipelines. We extract "logical query plans" from ML pipeline code relying on popular libraries. Based on these plans, we automatically infer pipeline semantics and instrument and rewrite the ML pipelines to enable diverse use cases without requiring data scientists to manually annotate or rewrite their code.
arXiv Detail & Related papers (2024-07-10T11:35:02Z)
Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise. Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components. This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
arXiv Detail & Related papers (2023-05-23T12:40:38Z)
NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimize NumPy-like expressions on task-based distributed systems. This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS) We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions [28.718446733713183]
We propose an AutoML SapientML that can learn from a corpus of existing datasets and their human-written pipelines. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.
arXiv Detail & Related papers (2022-02-18T20:45:47Z)
VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition [57.06900573003609]
VolcanoML is a framework that decomposes a large AutoML search space into smaller ones. It supports a Volcano-style execution model, akin to the one supported by modern database systems. Our evaluation demonstrates that, not only does VolcanoML raise the level of expressiveness for search space decomposition in AutoML, it also leads to actual findings of decomposition strategies.
arXiv Detail & Related papers (2021-07-19T13:23:57Z)
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [60.23234205219347]
TeraPipe is a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models. We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster.
arXiv Detail & Related papers (2021-02-16T07:34:32Z)
AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation [13.116806430326513]
We propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR) The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics. Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model constructed by capabilities and effects of the ML pipeline components.
arXiv Detail & Related papers (2020-11-21T14:05:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.