Fugu-MT Paper Translation (Abstract): Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

Paper overview: Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.10395v1
Date: Wed, 11 Mar 2026 04:20:45 GMT
Status: Translation complete
System update date: 2026-03-12 16:22:32.777109
Title: Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
Title (reference translation): Graph-GRPO: Training Graph Flow Models with Reinforcement Learning
Authors: Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang
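Abstract summary: We propose Graph-GRPO, an online reinforcement learning framework for training graph flow models (GFMs). With only 50 denoising steps, it achieves Valid-Unique-Novelty scores of 95.0% and 97.5% on the planar and tree datasets, respectively.
Reference score (attention score computed independently by this site): 14.937302684130257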
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing the Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph, and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on the molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
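Abstract (reference translation): Graph generation is a fundamental task with broad applications such as drug discovery. Recently, graph flow models (GFMs), graph generation methods based on discrete flow matching, have emerged owing to their strong performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. This paper proposes Graph-GRPO, an online reinforcement learning (RL) framework. We (1) derive an analytical expression that replaces Monte Carlo sampling and enables fully differentiable rollouts for RL training, and (2) propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and regenerates them, enabling localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. It achieves Valid-Unique-Novelty scores of 95.0% and 97.5% on the planar and tree datasets, respectively. Furthermore, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.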
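
The abstract does not say how rewards are turned into a training signal. As a rough illustration only, the sketch below shows the group-relative advantage that GRPO-style methods commonly use: each sampled output is scored against the mean and standard deviation of the rewards in its own group. Whether Graph-GRPO uses exactly this form is an assumption; the function name, the `eps` constant, the choice of population standard deviation, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style sketch).

    Assumption: population standard deviation is used; real
    implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifiable rewards for four graphs sampled in one group,
# e.g. 1.0 if a generated graph passes a validity check and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Graphs that score above their group's average receive a positive advantage and the rest a negative one, so no separately learned value function is needed in this formulation.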

Related papers