flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R
- URL: http://arxiv.org/abs/2511.00079v1
- Date: Wed, 29 Oct 2025 17:59:19 GMT
- Title: flowengineR: A Modular and Extensible Framework for Fair and Reproducible Workflow Design in R
- Authors: Maximilian Willer, Peter Ruckdeschel,
- Abstract summary: flowengineR is an R package designed to provide a modular and framework for building reproducible algorithmic pipelines.<n> flowengineR addresses this by introducing a unified architecture of standardized engines for data splitting, execution, preprocessing, training, inprocessing, postprocessing, evaluation, and reporting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: flowengineR is an R package designed to provide a modular and extensible framework for building reproducible algorithmic workflows for general-purpose machine learning pipelines. It is motivated by the rapidly evolving field of algorithmic fairness where new metrics, mitigation strategies, and machine learning methods continuously emerge. A central challenge in fairness, but also far beyond, is that existing toolkits either focus narrowly on single interventions or treat reproducibility and extensibility as secondary considerations rather than core design principles. flowengineR addresses this by introducing a unified architecture of standardized engines for data splitting, execution, preprocessing, training, inprocessing, postprocessing, evaluation, and reporting. Each engine encapsulates one methodological task yet communicates via a lightweight interface, ensuring workflows remain transparent, auditable, and easily extensible. Although implemented in R, flowengineR builds on ideas from workflow languages (CWL, YAWL), graph-oriented visual programming languages (KNIME), and R frameworks (BatchJobs, batchtools). Its emphasis, however, is less on orchestrating engines for resilient parallel execution but rather on the straightforward setup and management of distinct engines as data structures. This orthogonalization enables distributed responsibilities, independent development, and streamlined integration. In fairness context, by structuring fairness methods as interchangeable engines, flowengineR lets researchers integrate, compare, and evaluate interventions across the modeling pipeline. At the same time, the architecture generalizes to explainability, robustness, and compliance metrics without core modifications. While motivated by fairness, it ultimately provides a general infrastructure for any workflow context where reproducibility, transparency, and extensibility are essential.
Related papers
- ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z) - Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design.<n>We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and crucially, how Monad Transformers allow for the systematic composition of these capabilities.<n>This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z) - Operon: Incremental Construction of Ragged Data via Named Dimensions [1.6212518002538465]
Existing workflow engines lack native support for tracking the shapes and dependencies inherent to ragged data.<n>We present Operon, a Rust-based workflow engine that addresses these challenges through a novel formalism of named dimensions with explicit dependency relations.
arXiv Detail & Related papers (2025-11-20T06:16:31Z) - Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows [0.0]
Generative Agentic AI systems are emerging as a powerful paradigm for automating complex, multi-step tasks.<n>Existing frameworks for building these systems introduce significant complexity, a steep learning curve, and substantial boilerplate code.<n>This introduces simpliflow, a lightweight, open-source Python framework designed to address these challenges.
arXiv Detail & Related papers (2025-10-12T16:02:50Z) - VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses limitations through systematic design principles.<n>Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms.<n>The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z) - Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs)<n>This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools.<n>We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z) - AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training [24.60677187852425]
Reinforcement learning (RL) has become a pivotal technology in the post-training phase of large language models (LLMs)<n>Traditional task-colocated RL frameworks suffer from significant scalability bottlenecks.<n>Task-separated RL frameworks face challenges in complex dataflows and the corresponding resource idling and workload imbalance.<n>We propose AsyncFlow, an asynchronous streaming RL framework for efficient post-training.
arXiv Detail & Related papers (2025-07-02T12:45:34Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - Automated Evolutionary Approach for the Design of Composite Machine
Learning Pipelines [48.7576911714538]
The proposed approach is aimed to automate the design of composite machine learning pipelines.
It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them.
The software implementation on this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.