Related papers: Opus: A Quantitative Framework for Workflow Evaluation

Opus: A Quantitative Framework for Workflow Evaluation

URL: http://arxiv.org/abs/2511.04220v1
Date: Thu, 06 Nov 2025 09:35:10 GMT
Title: Opus: A Quantitative Framework for Workflow Evaluation
Authors: Alan Seroul, Théo Fagnoni, Inès Adnani, Dana O. Mohamed, Phillip Kingston,
Abstract summary: This paper introduces the Opus Evaluation Framework, a probabilistic-normative formulation for quantifying quality and efficiency.<n>The framework combines the Opus Reward, a function estimating expected performance through success likelihood, resource usage, and output gain, with the Opus Normative Penalties, a set of measurable functions.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper introduces the Opus Workflow Evaluation Framework, a probabilistic-normative formulation for quantifying Workflow quality and efficiency. It integrates notions of correctness, reliability, and cost into a coherent mathematical model that enables direct comparison, scoring, and optimization of Workflows. The framework combines the Opus Workflow Reward, a probabilistic function estimating expected performance through success likelihood, resource usage, and output gain, with the Opus Workflow Normative Penalties, a set of measurable functions capturing structural and informational quality across Cohesion, Coupling, Observability, and Information Hygiene. It supports automated Workflow assessment, ranking, and optimization within modern automation systems such as Opus and can be integrated into Reinforcement Learning loops to guide Workflow discovery and refinement. In this paper, we introduce the Opus Workflow Reward model that formalizes Workflow success as a probabilistic expectation over costs and outcomes. We define measurable Opus Workflow Normative Penalties capturing structural, semantic, and signal-related properties of Workflows. Finally, we propose a unified optimization formulation for identifying and ranking optimal Workflows under joint Reward-Penalty trade-offs.

Related papers

Learning to Compose for Cross-domain Agentic Workflow Generation [56.630382886594184]
We create an open-source LLM for cross-domain workflow generation.<n>We learn a compact set of reusable workflow capabilities across diverse domains.<n>Our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations.
arXiv Detail & Related papers (2026-02-11T18:27:22Z)
Opus: A Prompt Intention Framework for Complex Workflow Generation [0.0]
Opus Prompt Intention Framework is designed to improve complex Generation with instruction-tuned Large Language Models (LLMs)<n>We present a customizable Intention Capture system to extract Signals and Intentions from user queries.<n>We provide empirical evidence that the proposed system significantly improves Generation quality compared to direct generation from user queries.
arXiv Detail & Related papers (2025-07-15T13:13:07Z)
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization [51.280919773837645]
We develop ScoreFlow, a high-performance framework for agent workflow optimization.<n>ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback.<n>It achieves an 8.2% improvement over existing baselines across question answering, coding, and mathematical reasoning.
arXiv Detail & Related papers (2025-02-06T18:47:49Z)
Understanding and Optimizing Agentic Workflows via Shapley value [49.508008396810624]
We introduce ShapleyFlow, the first framework that employs cooperative game theory to analyze and optimize agentic configurations.<n>ShagleyFlow enables fine-grained attribution of each component's contribution and facilitates the identification of task-specific optimal configurations.
arXiv Detail & Related papers (2025-02-01T18:07:34Z)
Flow: Modularized Agentic Workflow Automation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution.<n>However, the effective adjustment of agentic during execution has not been well studied.<n>In this paper, we define an activity-on-vertex (AOV) graph, which allows continuous workflow refinement by agents.<n>Our proposed multi-agent framework achieves efficient concurrent execution of subtasks, effective goal achievement, and enhanced error tolerance.
arXiv Detail & Related papers (2025-01-14T04:35:37Z)
Opus: A Large Work Model for Complex Workflow Generation [0.0]
Opus is a framework for generating and optimizing tasks tailored to complex Business Process Outsourcing (BPO) use cases.<n>Our approach generates executables from Intention, defined as the alignment of Client Input, Client Output and Process Directed Context.
arXiv Detail & Related papers (2024-11-30T20:00:41Z)
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We presentLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration. It first constructs a large-scale fine-tuningBench with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories. LlamaLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z)
AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains.<n>We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search.<n> Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.<n>We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.<n>We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.