Fugu-MT Paper Translation (Summary): FlowPipe: LLM-Enhanced Conditional Generative Flow Networks for Data Preparation Pipeline Construction

Paper Summary: FlowPipe: LLM-Enhanced Conditional Generative Flow Networks for Data Preparation Pipeline Construction

arxiv url: http://arxiv.org/abs/2606.24679v1
Date: Tue, 23 Jun 2026 15:10:00 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:49.023821
Title: FlowPipe: LLM-Enhanced Conditional Generative Flow Networks for Data Preparation Pipeline Construction
Authors: Kunyu Ni, Lei Cao, Jie He, Xiaotong Zhang, Jianfeng Jin, Junyu Dong, Yanwei Yu
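Abstract summary: Data preparation pipelines improve data quality in machine learning by transforming raw tables into learning-ready data. Existing state-of-the-art (SOTA) Multi-DQN methods face three key limitations. We propose FlowPipe, a unified framework that formulates pipeline synthesis as conditional probabilistic flow generation over a directed acyclic graph.
Reference score (attention score computed by the site): 43.791981476558384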
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data preparation pipelines improve data quality in machine learning by transforming raw tables into learning-ready data through sequential cleaning and feature transformation operators. However, automatically constructing such pipelines is computationally difficult because operator sequences are combinatorial and end-to-end evaluation is expensive. Existing state-of-the-art (SOTA) Multi-DQN methods still face three key limitations: decoupled value estimators weaken long-horizon credit assignment, dataset context is only weakly injected into the policy, and exploration is inefficient in a sparse search space with many invalid states. To address these issues, we propose FlowPipe, a unified framework that formulates pipeline synthesis as conditional probabilistic flow generation over a directed acyclic graph. FlowPipe uses Conditional Generative Flow Networks (C-GFlowNets) with a Trajectory Balance objective to connect terminal validation rewards with early pipeline decisions. It further introduces Deep Semantic Modulation through Feature-wise Linear Modulation (FiLM), allowing LLM-derived logical priors to condition the policy's internal activations according to dataset semantics. In addition, FlowPipe incorporates failure awareness into the flow objective to avoid invalid states and concentrate search on high-potential regions. Experiments on two benchmark suites with 74 real-world datasets show that FlowPipe outperforms SOTA baselines, improving accuracy by 11.96% on average and achieving 12.5x faster training convergence. Source code is available at https://github.com/KunyuNi/FlowPipe.
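The abstract names two standard techniques without giving formulas. As a minimal sketch (not code from the FlowPipe repository), the textbook Trajectory Balance objective for a GFlowNet, L_TB = (log Z(c) + Σ_t log P_F(s_{t+1} | s_t, c) - log R(x | c) - Σ_t log P_B(s_t | s_{t+1}))², and a FiLM layer, h' = γ(c)·h + β(c), can be written in plain Python as below. The function names, conditioning log Z on dataset context c, and using validation accuracy as the reward are assumptions for illustration only.

```python
import math
from typing import List, Optional, Sequence


def film(h: Sequence[float], gamma: Sequence[float], beta: Sequence[float]) -> List[float]:
    """Feature-wise Linear Modulation: h' = gamma * h + beta, element by element.

    Assumption: gamma and beta would come from a small network applied to a
    dataset/LLM context embedding; here they are passed in directly.
    """
    if not (len(h) == len(gamma) == len(beta)):
        raise ValueError("h, gamma and beta must have the same length")
    return [g * x + b for x, g, b in zip(h, gamma, beta)]


def trajectory_balance_loss(
    log_z: float,
    log_pf: Sequence[float],
    log_reward: float,
    log_pb: Optional[Sequence[float]] = None,
) -> float:
    """Squared Trajectory Balance residual for one complete trajectory.

    log_z      -- learned log partition estimate; in the conditional setting
                  this is log Z(c) for dataset context c (assumption).
    log_pf     -- log P_F(s_{t+1} | s_t, c) for every step of the trajectory.
    log_reward -- log R(x | c) of the finished pipeline x, e.g. derived from
                  validation accuracy (hypothetical choice).
    log_pb     -- log P_B(s_t | s_{t+1}) per step; defaults to 0.0, which is
                  exact when every state has a single parent (for example an
                  append-only sequence of operators).
    """
    if log_pb is None:
        log_pb = [0.0] * len(log_pf)
    if len(log_pb) != len(log_pf):
        raise ValueError("log_pf and log_pb must have one entry per step")
    residual = log_z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2


if __name__ == "__main__":
    # Two operator choices, each taken with probability 0.5, ending in a
    # pipeline with reward 0.5: log Z = log 2 balances the trajectory.
    print(trajectory_balance_loss(math.log(2), [math.log(0.5)] * 2, math.log(0.5)))
    print(film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0]))  # [2.0, 2.0]
```

Because every step's log P_F appears in the same squared residual as the terminal log R, minimizing this loss over sampled trajectories pushes the sampler toward P(x | c) ∝ R(x | c) and propagates a terminal validation reward back to early operator choices.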
