Fugu-MT Paper Translation (Abstract): Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

Paper abstract: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

arxiv url: http://arxiv.org/abs/2106.04399v1
Date: Tue, 8 Jun 2021 14:21:10 GMT
Status: Translation complete
Last updated in system: 2021-06-09 22:52:42.701519
Title: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
Title (reference translation): Flow-Network-Based Generative Models for Non-Iterative Diverse Candidate Generation
Authors: Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio
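Abstract summary: This paper addresses the problem of learning a policy for generating objects from a sequence of actions. It proposes GFlowNet, which views the generative process as a flow network. It proves that any global minimum of the proposed objectives yields a policy that samples from the desired distribution.
Reference score (independently computed attention score): 110.09855163856326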
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where there are many modes to the reward function, and on a molecule synthesis task.
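Abstract (reference translation): This paper addresses the problem of learning a stochastic policy that generates an object (such as a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Standard return maximization tends to converge to a single return-maximizing sequence, but in some cases we want to sample a diverse set of high-return solutions. This arises, for example, in black-box function optimization when only a few rounds are possible, each with a large batch of queries, and the batches should be diverse, as in the design of new molecules. The problem can also be seen as approximately converting an energy function into a generative distribution. MCMC methods can achieve this, but they are expensive and generally perform only local exploration. Training a generative policy instead amortizes the cost of search during training and leads to fast generation. Using insights from temporal difference learning, the paper proposes GFlowNet, which views the generative process as a flow network. This makes it possible to handle the tricky case where different trajectories lead to the same final state (for example, there are many ways to add atoms sequentially to build a given molecular graph). The set of trajectories is cast as a flow, and the flow consistency equations are converted into a learning objective, analogous to casting the Bellman equations into temporal difference methods. The paper proves that any global minimum of the proposed objectives yields a policy that samples from the desired distribution, and it demonstrates the improved performance and diversity of GFlowNet on a simple domain whose reward function has many modes and on a molecule synthesis task.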
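The abstract does not spell out the flow consistency equations, so the following is only a minimal sketch under stated assumptions: it takes the consistency condition to be "flow into a state equals its reward plus the flow out of it" for every non-root state, and it uses a squared mismatch as a simplified stand-in for the learning objective, not the paper's exact loss. The toy graph, the flow values, and the names `FLOWS`, `REWARDS`, `flow_mismatch`, and `terminal_distribution` are all hypothetical. The object `x2` is reachable through two trajectories, which is the case the abstract calls tricky.

```python
from collections import defaultdict

# Hypothetical toy DAG: (parent, child) -> flow on that edge.
# "x2" can be reached through both "a" and "b".
FLOWS = {
    ("s0", "a"): 2.0,
    ("s0", "b"): 2.0,
    ("a", "x1"): 1.0,
    ("a", "x2"): 1.0,
    ("b", "x2"): 1.0,
    ("b", "x3"): 1.0,
}
# Positive rewards for the terminal objects (illustrative values).
REWARDS = {"x1": 1.0, "x2": 2.0, "x3": 1.0}
ROOT = "s0"


def flow_mismatch(flows, rewards, root):
    """Squared gap between in-flow and (reward + out-flow) per non-root state."""
    in_flow = defaultdict(float)
    out_flow = defaultdict(float)
    for (parent, child), f in flows.items():
        out_flow[parent] += f
        in_flow[child] += f
    states = (set(in_flow) | set(out_flow)) - {root}
    return {
        s: (in_flow[s] - rewards.get(s, 0.0) - out_flow[s]) ** 2
        for s in states
    }


def terminal_distribution(flows, root):
    """Probability of ending at each terminal state when every step
    picks a child with probability proportional to the edge flow."""
    children = defaultdict(list)
    for (parent, child), f in flows.items():
        children[parent].append((child, f))
    probs = defaultdict(float)

    def walk(state, mass):
        if state not in children:  # no outgoing edges: terminal object
            probs[state] += mass
            return
        total = sum(f for _, f in children[state])
        for child, f in children[state]:
            walk(child, mass * f / total)

    walk(root, 1.0)
    return dict(probs)


if __name__ == "__main__":
    print(flow_mismatch(FLOWS, REWARDS, ROOT))
    print(terminal_distribution(FLOWS, ROOT))
```

With these hand-picked flows every mismatch is zero, and the induced policy reaches each object with probability equal to its reward divided by the total reward, even though `x2` is reached along two different paths. In a learned setting the edge flows would be produced by a trained model rather than written by hand.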

Related papers