Fugu-MT Paper Translation (Abstract): Better Training of GFlowNets with Local Credit and Incomplete Trajectories

Paper abstract: Better Training of GFlowNets with Local Credit and Incomplete Trajectories

arxiv url: http://arxiv.org/abs/2302.01687v1
Date: Fri, 3 Feb 2023 12:19:42 GMT
Status: Translation complete
Last updated in system: 2023-02-06 16:27:38.433102
Title: Better Training of GFlowNets with Local Credit and Incomplete Trajectories
Title (reference translation): Better training of GFlowNets using local credit and incomplete trajectories
Authors: Ling Pan, Nikolay Malkin, Dinghuai Zhang, Yoshua Bengio
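Abstract summary: We consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is achieved, for example, when the energy function is additive, with terms available along the trajectory. This enables a training objective that can be applied to update parameters even with incomplete trajectories.
Reference score (independently computed attention measure): 81.14310509871935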
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative Flow Networks or GFlowNets are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object $x$ through a sequence of steps with probability proportional to some reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$ with $\mathcal{E}(x)$ denoting the energy function), given at the end of the generative trajectory. Like for other RL settings where the reward is only given at the end, the efficiency of training and credit assignment may suffer when those trajectories are longer. With previous GFlowNet work, no learning was possible from incomplete trajectories (lacking a terminal state and the computation of the associated reward). In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is for example achieved when the energy function is additive, with terms available along the trajectory. We show how to reparameterize the GFlowNet state flow function to take advantage of the partial reward already accrued at each state. This enables a training objective that can be applied to update parameters even with incomplete trajectories. Even when complete trajectories are available, being able to obtain more localized credit and gradients is found to speed up training convergence, as demonstrated across many simulations.
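Abstract (reference translation): Generative Flow Networks, or GFlowNets, are related to Monte-Carlo Markov chain methods (as they sample from a distribution specified by an energy function), reinforcement learning (as they learn a policy to sample composed objects through a sequence of steps), generative models (as they learn to represent and sample from a distribution) and amortized variational methods (as they can be used to learn to approximate and sample from an otherwise intractable posterior, given a prior and a likelihood). They are trained to generate an object $x$ through a sequence of steps with probability proportional to some reward function $R(x)$ (or $\exp(-\mathcal{E}(x))$, with $\mathcal{E}(x)$ denoting the energy function), which is given at the end of the generative trajectory. As in other RL settings where the reward is given only at the end, the efficiency of training and credit assignment may suffer as these trajectories get longer. In previous GFlowNet work, learning from incomplete trajectories (those lacking a terminal state and the computation of the associated reward) was not possible. In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. This is achieved, for example, when the energy function is additive, with terms available along the trajectory. We show how to reparameterize the GFlowNet state flow function to take advantage of the partial reward already accrued at each state. This enables a training objective that can be applied to update parameters even with incomplete trajectories. Even when complete trajectories are available, obtaining more localized credit and gradients speeds up training convergence, as shown across many simulations.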
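
The abstract does not spell out the training objective, so the following is only a minimal sketch of how such a reparameterization could look. It assumes the standard detailed-balance condition $F(s) P_F(s'|s) = F(s') P_B(s|s')$ and assumes the state flow is written as $F(s) = \exp(-\mathcal{E}(s)) \tilde{F}(s)$, so that the learned quantity $\tilde{F}$ only has to account for what is not already explained by the energy accrued at $s$. Taking logs gives a per-transition residual that needs only $\mathcal{E}(s)$ and $\mathcal{E}(s')$, not a terminal reward. The function name, argument names, and the toy chain below are hypothetical and are not taken from the paper.

```python
def forward_looking_db_loss(log_f_tilde_s, log_f_tilde_next,
                            log_pf, log_pb, energy_s, energy_next):
    """Squared detailed-balance residual for one transition s -> s'.

    Assumption (not stated in the abstract): F(s) = exp(-E(s)) * F_tilde(s),
    so log F(s) = log F_tilde(s) - E(s). Detailed balance in log space is
        log F(s) + log P_F(s'|s) = log F(s') + log P_B(s|s').
    """
    lhs = log_f_tilde_s - energy_s + log_pf
    rhs = log_f_tilde_next - energy_next + log_pb
    return (lhs - rhs) ** 2


# Toy chain s0 -> s1 -> x with a single path, so P_F = P_B = 1 (log-prob 0).
# Hypothetical energies available at every state, including intermediate ones.
energy = {"s0": 0.0, "s1": 0.5, "x": 1.2}

# With one path, the true flow through every state is R(x) = exp(-E(x)),
# hence log F_tilde(s) = E(s) - E(x); at the terminal state it is 0.
log_f_tilde = {s: e - energy["x"] for s, e in energy.items()}

# The first transition can be scored before the trajectory reaches x.
loss_partial = forward_looking_db_loss(
    log_f_tilde["s0"], log_f_tilde["s1"], 0.0, 0.0, energy["s0"], energy["s1"])
loss_final = forward_looking_db_loss(
    log_f_tilde["s1"], log_f_tilde["x"], 0.0, 0.0, energy["s1"], energy["x"])
```

In this toy setting both losses vanish at the exact flows, and `loss_partial` is computed from the transition `s0 -> s1` alone, which is the sense in which an incomplete trajectory can still contribute an update.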


Related papers