Fugu-MT Paper Translation (Abstract): Towards Reliable Generation of Executable Workflows by Foundation Models

Paper overview: Towards Reliable Generation of Executable Workflows by Foundation Models

arxiv url: http://arxiv.org/abs/2509.25117v1
Date: Mon, 29 Sep 2025 17:42:45 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:20.17558
Title: Towards Reliable Generation of Executable Workflows by Foundation Models
Title (reference translation): Towards Improving the Reliability of Executable Workflows Generated by Foundation Models
Authors: Sogol Masoumzadeh, Keheliya Gallaba, Dayi Lin, Ahmed E. Hassan
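Abstract summary: This work introduces a framework that leverages static analysis feedback to enable FMs to detect and repair defects in the DSL-based workflows they generate. Defects are highly prevalent in FM-generated DSL workflows: 87.27% of the studied instances contain at least one defect. The authors develop Timon, the first static analyzer designed specifically for FM-generated DSL workflows, and use its feedback to guide Pumbaa, an FM-based tool, in repairing the detected defects.
Reference score (independently computed attention measure): 6.9197437493221186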
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recent advancements in Foundation Models (FMs) have demonstrated significant progress in comprehending complex natural language to perform intricate tasks. Successfully executing these tasks often requires orchestrating calls to FMs alongside other software components. However, manually decomposing a task into a coherent sequence of smaller, logically aggregated steps, commonly referred to as workflows, demands considerable effort and specialized domain knowledge. While FMs can assist in generating such workflows specified in domain-specific languages (DSLs), achieving accuracy and reliability in this process remains a challenge. This work introduces a framework that leverages static analysis feedback to enable FMs to detect and repair defects in the DSL-based workflows they generate. We begin by presenting the first-ever taxonomy of incidences of defects in FM-generated DSL workflows, categorizing them into 18 distinct types. Furthermore, we observe a high prevalence of defects across FM-generated DSL workflows, with 87.27% of the studied instances containing at least one defect. This, in turn, emphasizes the magnitude of the problem in practice and underscores the necessity for implementing mitigation strategies. Following this, we demonstrate that nine types of these defects can be effectively identified through static analysis of the workflows. For this purpose, we develop Timon, the first-of-its-kind static analyzer specifically designed for FM-generated DSL workflows. Finally, we show that by incorporating feedback from Timon, we can guide Pumbaa, an FM-based tool, to repair the detected defect incidences. By systematically detecting and repairing defects, our work provides a crucial step towards the reliable and automated generation of executable workflows from natural language requirements.
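The detect-and-repair loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not describe the DSL, Timon's checks, or Pumbaa's prompting, so the list-of-steps workflow format, the `analyze` and `repair_loop` functions, and the toy `drop_dangling_inputs` repairer (which stands in for a call to a foundation model) are all hypothetical and are not the authors' implementation.

```python
from typing import Callable

# Hypothetical workflow format (NOT the DSL studied in the paper): an ordered
# list of steps, each with a "name" and the names of earlier steps it reads.
Workflow = list[dict]


def analyze(workflow: Workflow) -> list[str]:
    """Statically check a workflow and return defect messages (no execution)."""
    defects = []
    seen = set()
    for index, step in enumerate(workflow):
        name = step.get("name")
        if not name:
            defects.append(f"step {index}: missing name")
            continue
        if name in seen:
            defects.append(f"step {index}: duplicate name '{name}'")
        for ref in step.get("inputs", []):
            if ref not in seen:
                defects.append(
                    f"step {index} ('{name}'): input '{ref}' "
                    "is not produced by an earlier step"
                )
        seen.add(name)
    return defects


def repair_loop(
    workflow: Workflow,
    repair: Callable[[Workflow, list[str]], Workflow],
    max_rounds: int = 3,
) -> tuple[Workflow, list[str]]:
    """Feed analyzer output to a repairer until clean or out of rounds."""
    for _ in range(max_rounds):
        defects = analyze(workflow)
        if not defects:
            break
        workflow = repair(workflow, defects)
    return workflow, analyze(workflow)


def drop_dangling_inputs(workflow: Workflow, defects: list[str]) -> Workflow:
    """Toy stand-in for an FM call: drop inputs no earlier step produces."""
    produced = set()
    fixed = []
    for step in workflow:
        kept = [ref for ref in step.get("inputs", []) if ref in produced]
        fixed.append({**step, "inputs": kept})
        if step.get("name"):
            produced.add(step["name"])
    return fixed


# A draft workflow with one defect: "translate" is never defined.
draft = [
    {"name": "fetch", "inputs": []},
    {"name": "summarize", "inputs": ["fetch", "translate"]},
]
repaired, remaining = repair_loop(draft, drop_dangling_inputs)
```

In a real setting the repairer would presumably pass the defect messages to a model as feedback rather than apply a fixed rule; the bounded `max_rounds` is one simple way to keep such a loop from running indefinitely when a repair does not converge.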
