Fugu-MT Paper Translation (Summary): Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images

arxiv url: http://arxiv.org/abs/2203.01810v1
Date: Wed, 2 Mar 2022 14:39:17 GMT
Status: Translation complete
Last updated in system: 2022-03-05 12:51:11.979969
Title: Integrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images
Title (reference translation): Integrating contrastive learning with dynamic models for reinforcement learning from images
Authors: Bang You, Oleg Arenz, Youping Chen, Jan Peters
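Abstract summary: We argue that explicitly improving the Markovianity of the learned embedding is desirable. We propose a self-supervised representation learning method that integrates contrastive learning with dynamic models.
Reference score (attention score computed independently by this site): 31.413588478694496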
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent methods for reinforcement learning from images use auxiliary tasks to learn image features that are used by the agent's policy or Q-function. In particular, methods based on contrastive learning that induce linearity of the latent dynamics or invariance to data augmentation have been shown to greatly improve the sample efficiency of the reinforcement learning algorithm and the generalizability of the learned embedding. We further argue that explicitly improving Markovianity of the learned embedding is desirable and propose a self-supervised representation learning method which integrates contrastive learning with dynamic models to synergistically combine these three objectives: (1) we maximize the InfoNCE bound on the mutual information between the state- and action-embedding and the embedding of the next state to induce a linearly predictive embedding without explicitly learning a linear transition model, (2) we further improve Markovianity of the learned embedding by explicitly learning a non-linear transition model using regression, and (3) we maximize the mutual information between the two nonlinear predictions of the next embeddings based on the current action and two independent augmentations of the current state, which naturally induces transformation invariance not only for the state embedding, but also for the nonlinear transition model. Experimental evaluation on the DeepMind Control Suite shows that our proposed method achieves higher sample efficiency and better generalization than state-of-the-art methods based on contrastive learning or reconstruction.
Abstract (reference translation): Methods for reinforcement learning from images use auxiliary tasks to learn image features that are used by the agent's policy or Q-function. In particular, methods based on contrastive learning that induce linearity of the latent dynamics or invariance to data augmentation have been shown to greatly improve the sample efficiency of the reinforcement learning algorithm and the generalizability of the learned embedding. We further argue that explicitly improving Markovianity of the learned embedding is desirable and propose a self-supervised representation learning method which integrates contrastive learning with dynamic models to synergistically combine these three objectives: (1) we maximize the InfoNCE bound on the mutual information between the state- and action-embedding and the embedding of the next state to induce a linearly predictive embedding without explicitly learning a linear transition model, (2) we further improve Markovianity of the learned embedding by explicitly learning a non-linear transition model using regression, and (3) we maximize the mutual information between the two nonlinear predictions of the next embeddings based on the current action and two independent augmentations of the current state, which naturally induces transformation invariance not only for the state embedding, but also for the nonlinear transition model. Experiments on the DeepMind Control Suite show that the proposed method achieves higher sample efficiency and better generalization than state-of-the-art methods based on contrastive learning or reconstruction.
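
The three objectives listed in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dot-product score, the `temperature` parameter, the equal default weights, and the function names `info_nce`, `mse`, and `combined_loss` are all hypothetical, and the embeddings are assumed to be precomputed lists of floats. Minimizing the InfoNCE loss over a batch of N pairs maximizes the lower bound log N minus the loss on the mutual information.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries, keys, temperature=1.0):
    """Mean InfoNCE loss over a batch.

    keys[i] is the positive for queries[i]; every other key in the
    batch acts as a negative. The dot-product score is an assumption.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


def mse(pred, target):
    """Mean squared error over all elements of two batches of vectors."""
    n = sum(len(p) for p in pred)
    return sum((a - b) ** 2
               for p, t in zip(pred, target)
               for a, b in zip(p, t)) / n


def combined_loss(sa_emb, next_emb, pred_a, pred_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives (weights are hypothetical).

    sa_emb:   state-action embeddings
    next_emb: embeddings of the next states
    pred_a, pred_b: nonlinear transition-model predictions of the next
        embedding from two independent augmentations of the current state
    """
    l_contrastive = info_nce(sa_emb, next_emb)            # objective (1)
    l_regression = 0.5 * (mse(pred_a, next_emb)
                          + mse(pred_b, next_emb))        # objective (2)
    l_invariance = info_nce(pred_a, pred_b)               # objective (3)
    w1, w2, w3 = weights
    return w1 * l_contrastive + w2 * l_regression + w3 * l_invariance
```

In a training loop, the embeddings and predictions would come from the encoder and transition networks, and the combined loss would be minimized jointly with the reinforcement learning objective; how the terms are actually weighted is not stated in the abstract.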

Related papers

Enhancing knowledge retention for continual learning with domain-specific adapters and features gating [4.637185817866919]
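Continual learning empowers models to learn from a continuous stream of data while retaining previously acquired knowledge. This paper proposes a method that improves knowledge retention when datasets from different domains are added sequentially, by incorporating adapters into the self-attention mechanism of vision transformers.
Paper reference translation (metadata) (2025-04-11T15:20:08Z)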
Differentiable Information Enhanced Model-Based Reinforcement Learning [48.820039382764]
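Differentiable environments open new possibilities for learning control policies by providing rich differentiable information. Model-based reinforcement learning (MBRL) methods show potential to exploit differentiable information effectively to recover the underlying physical dynamics. However, two main challenges remain: 1) constructing more accurate dynamics prediction models and 2) improving the stability of policy training.
Paper reference translation (metadata) (2025-03-03T04:51:40Z)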
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction [3.7003845808210594]
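This paper proposes integrating an image reconstruction task as an auxiliary component in augmentation-based self-supervised learning algorithms. The proposed method implements a cross-attention mechanism that blends the features learned from two augmented views and reconstructs one of them. The results show substantial improvements over standard augmentation-based self-supervised learning methods.
Paper reference translation (metadata) (2024-12-04T13:47:37Z)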
Robustness Reprogramming for Representation Learning [18.466637575445024]
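Given a well-trained deep learning model, can it be reprogrammed to enhance its robustness against adversarial or noisy input perturbations without altering its parameters? This paper proposes a novel nonlinear robust pattern matching technique.
Paper reference translation (metadata) (2024-10-06T18:19:02Z)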
Novel Saliency Analysis for the Forward Forward Algorithm [0.0]
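We introduce the Forward Forward algorithm for neural network training. The method executes two forward passes with actual data, promoting positive reinforcement. To overcome the limitations inherent in traditional saliency techniques, we develop a bespoke saliency algorithm tailored specifically to the Forward Forward framework.
Paper reference translation (metadata) (2024-09-18T17:21:59Z)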
MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning [8.61492882526007]
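In visual reinforcement learning (RL), learning from pixel-based observations poses significant challenges for sample efficiency. We introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking. On multiple continuous and discrete control benchmarks, we show that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency.
Paper reference translation (metadata) (2024-09-02T18:57:53Z)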
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
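How latent variable models can facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning remains unclear, both theoretically and empirically. We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration that incorporates kernel embeddings of latent variable models.
Paper reference translation (metadata) (2022-12-17T00:26:31Z)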
Imposing Consistency for Optical Flow Estimation [73.53204596544472]
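Imposing consistency through proxy tasks has been shown to enhance data-driven learning. This paper proposes novel and effective consistency strategies for optical flow estimation.
Paper reference translation (metadata) (2022-04-14T22:58:30Z)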
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
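Offline reinforcement learning seeks to use offline/historical data to optimize sequential decision-making strategies. We study the statistical limits of offline reinforcement learning with linear model representations.
Paper reference translation (metadata) (2022-03-11T09:00:12Z)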
Refining Self-Supervised Learning in Imaging: Beyond Linear Metric [25.96406219707398]
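We introduce a new statistical perspective that exploits the Jaccard similarity metric as a measure-based metric. Specifically, the proposed metric can be interpreted as a dependence measure between two adapted projections obtained from the so-called latent representations. To the best of our knowledge, this effectively nonlinearly fused information, embedded in the Jaccard similarity, is novel to self-supervised learning and yields promising results.
Paper reference translation (metadata) (2022-02-25T19:25:05Z)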
Consistency and Monotonicity Regularization for Neural Knowledge Tracing [50.92661409499299]
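Knowledge tracing (KT), which tracks human knowledge acquisition, is a central component of AI in online learning and education. This paper proposes three types of novel data augmentation (replacement, insertion, and deletion) along with corresponding regularization losses. Extensive experiments on various KT benchmarks show that our regularization schemes consistently improve model performance.
Paper reference translation (metadata) (2021-05-03T02:36:29Z)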
Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning [0.0]
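We propose a gradient matching algorithm that improves sample efficiency by using target gradient information from the dynamics to aid the model-free learner. We propose a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space.
Paper reference translation (metadata) (2020-05-28T05:02:47Z)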
Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
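Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator. We show how to make more effective use of the model by exploiting its differentiability.
Paper reference translation (metadata) (2020-05-16T19:18:10Z)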
Guided Variational Autoencoder for Disentanglement Learning [79.02010588207416]
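We propose Guided-VAE, an algorithm that can learn a controllable generative model by performing latent representation disentanglement learning. We design an unsupervised strategy and a supervised strategy in Guided-VAE and observe enhanced modeling and controlling capabilities over the vanilla VAE.
Paper reference translation (metadata) (2020-04-02T20:49:15Z)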

The related papers list is generated automatically from the titles and abstracts of papers on this site.
