Fugu-MT Paper Translation (Summary): Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization

Paper summary: Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization

arxiv url: http://arxiv.org/abs/2604.00977v1
Date: Wed, 01 Apr 2026 14:47:41 GMT
Status: translation complete
Last updated in system: 2026-04-02 16:44:32.039938
Title: Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization
Authors: Ruijie Hao, Longfei Zhang, Yang Dai, Yang Ma, Xingxing Liang, Guangquan Cheng
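Abstract summary: Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. We propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy using flow matching, which offers both computational efficiency and the capacity to fit complex distributions. By modeling and optimizing the entire return distribution with a distributional RL approach, it guides multimodal policy updates more effectively and improves agent performance.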
Reference score (independently computed attention score): 8.371088557371236
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution. This prevents the policy from capturing multimodal distributions, making it difficult to cover the full range of optimal solutions in multi-solution problems. In addition, the return is reduced to a mean value, which loses its multimodal nature and thus provides insufficient guidance for policy updates. In response to these problems, we propose an RL algorithm termed flow-based policy with distributional RL (FP-DRL). This algorithm models the policy using flow matching, which offers both computational efficiency and the capacity to fit complex distributions. Additionally, it employs a distributional RL approach to model and optimize the entire return distribution, thereby more effectively guiding multimodal policy updates and improving agent performance. Experimental trials on MuJoCo benchmarks demonstrate that the FP-DRL algorithm achieves state-of-the-art (SOTA) performance in most MuJoCo control tasks while exhibiting superior representation capability of the flow policy.
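
The abstract does not spell out how the flow policy is trained or sampled, so the following is only a minimal sketch of a generic flow-matching policy, not the authors' implementation. It assumes a straight-line probability path between a noise sample and a target action, and Euler integration for sampling. The names `interpolate`, `flow_matching_loss`, `sample_action`, and `toy_velocity` are hypothetical, and `toy_velocity` is a hand-written stand-in for a learned network that pulls each action coordinate toward one of two modes at -1 and +1.

```python
def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 to target action x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, state, x0, x1, t):
    """Mean squared error between the predicted velocity and the
    straight-line target velocity x1 - x0 (an assumed path choice)."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(state, xt, t)
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def sample_action(velocity_fn, state, x0, n_steps=10):
    """Euler-integrate dx/dt = v(state, x, t) from t=0 to t=1, starting at noise x0."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(state, x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(state, x, t):
    """Hypothetical hand-written velocity field (not a trained model):
    each coordinate flows toward the nearer of two modes, -1 or +1."""
    return [((1.0 if xi >= 0 else -1.0) - xi) / (1.0 - t) for xi in x]
```

With this toy field, noise samples on opposite sides of zero end up at different modes, which is the kind of multimodal action distribution a single diagonal Gaussian cannot represent. In a real agent the velocity field would be a state-conditioned neural network trained by minimizing a loss such as `flow_matching_loss` over sampled `(x0, x1, t)` triples.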

Related papers