Fugu-MT Paper Translation (Abstract): Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

Paper summary: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

arxiv url: http://arxiv.org/abs/2401.08632v1
Date: Sun, 10 Dec 2023 19:53:15 GMT
Status: Translation complete
Last updated in system: 2024-01-22 09:51:18.355073
Title: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
Authors: Maxence Faldor, Félix Chalumeau, Manon Flageat, Antoine Cully
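Abstract summary: Quality-Diversity optimization is a family of evolutionary algorithms that generates collections of diverse, high-performing solutions. MAP-Elites is a prominent example that has been applied to a variety of domains, including evolutionary robotics. This work presents three contributions: (1) enhancing the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) leveraging actor-critic training to learn a descriptor-conditioned policy at no additional cost, and (3) injecting the descriptor-conditioned actor into the population despite network architecture differences.
Reference score (attention measure computed by this site): 4.787389127632926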
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms and is thus limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it into the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.
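The abstract takes plain MAP-Elites (an archive of elites evolved with random mutations) as its starting point and reports results as QD score and coverage. The sketch below is a minimal illustration of that baseline loop and those two metrics only. It is not DCG-MAP-Elites and not the authors' code. The toy fitness function, the 10x10 grid, the mutation scale, and all function names are assumptions made for the example.

```python
import random

GRID = 10  # cells per descriptor dimension (hypothetical resolution)

def evaluate(x):
    """Toy problem (assumption): fitness is a negated sphere function,
    the descriptor is the first two genes, each in [0, 1]."""
    fitness = -sum((g - 0.5) ** 2 for g in x)
    descriptor = (x[0], x[1])
    return fitness, descriptor

def cell_of(descriptor):
    """Map a descriptor in [0, 1]^2 to a grid cell index."""
    return tuple(min(int(d * GRID), GRID - 1) for d in descriptor)

def mutate(x, sigma=0.1):
    """Gaussian mutation clipped to [0, 1]: the random variation operator."""
    return [min(1.0, max(0.0, g + random.gauss(0.0, sigma))) for g in x]

def try_insert(archive, x):
    """Keep x if its cell is empty or it beats the current elite of that cell."""
    fitness, descriptor = evaluate(x)
    cell = cell_of(descriptor)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, x)

def map_elites(dim=6, n_init=50, n_iters=2000, seed=0):
    """Run the basic loop: random initialisation, then select, mutate, insert."""
    random.seed(seed)
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(n_init):
        try_insert(archive, [random.random() for _ in range(dim)])
    for _ in range(n_iters):
        _, parent = random.choice(list(archive.values()))  # uniform selection
        try_insert(archive, mutate(parent))
    return archive

def coverage(archive):
    """Fraction of grid cells that hold an elite."""
    return len(archive) / GRID ** 2

def qd_score(archive, offset=0.0):
    """Sum of elite fitnesses; offset shifts fitnesses to be non-negative."""
    return sum(f + offset for f, _ in archive.values())

archive = map_elites()
print(f"coverage={coverage(archive):.2f}, "
      f"qd_score={qd_score(archive, offset=1.5):.2f}")
```

In this sketch, `mutate` plays the role of the variation operator. As the abstract describes, PGA-MAP-Elites and DCG-MAP-Elites add a gradient-based variation operator alongside this kind of random mutation, while the archive insertion rule stays the same. The offset of 1.5 is chosen because the toy fitness with six genes is never below -1.5.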

Related papers

AlphaEvolve: A coding agent for scientific and algorithmic discovery [63.13852052551106]
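We propose AlphaEvolve, an evolutionary coding agent that substantially enhances the capabilities of state-of-the-art LLMs. AlphaEvolve orchestrates an autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to its code. We demonstrate the broad applicability of this approach by applying it to a number of important computational problems.
Paper reference translation (metadata) (2025-06-16T06:37:18Z)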
Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization [25.633698252033756]
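We propose an Evolutionary Augmentation Mechanism (EAM) that synergizes the learning efficiency of DRL with the global search capability of GAs. EAM works by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation. EAM can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO.
Paper reference translation (metadata) (2025-06-11T05:17:30Z)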
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models [52.8949080772873]
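We propose an evolution-based region adversarial prompt tuning method called ER-APT. In each training iteration, adversarial examples (AEs) are first generated with a conventional gradient-based method. A genetic evolution mechanism involving selection, mutation, and crossover is then applied to optimize the AEs. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization in place of conventional single-point adversarial prompt tuning.
Paper reference translation (metadata) (2025-03-17T07:08:47Z)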
Exploring the Generalization Capabilities of AID-based Bi-level Optimization [50.3142765099442]
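This paper addresses two classes of bi-level optimization methods: approximate implicit differentiation (AID) and iterative differentiation (ITD). AID-based methods cannot be easily transformed and must remain in the two-level structure. The effectiveness and potential applications of these methods are demonstrated on real-world tasks.
Paper reference translation (metadata) (2024-11-25T04:22:17Z)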
Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
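This paper focuses on the efficiency of existing MTL methods. It conducts extensive experiments on methods with smaller backbones, using the MetaGraspNet dataset as a new test ground. It also proposes a feature disentanglement measure as a novel and efficient identifier of the challenges in MTL.
Paper reference translation (metadata) (2024-02-05T22:15:55Z)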
GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model [69.71629949747884]
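Adversarial generative models such as Generative Adversarial Networks (GANs) are widely applied to generate various types of data. This work proposes a novel algorithm named GE-AdvGAN.
Paper reference translation (metadata) (2024-01-11T16:43:16Z)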
Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
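Reinforcement learning integrated as a component of evolutionary algorithms has shown excellent performance in recent years. This paper discusses RL-EA integration methods, the RL-assisted strategies adopted by RL-EA, and their applications according to the existing literature. The section on RL-EA applications demonstrates the strong performance of RL-EA on several benchmarks and a range of public datasets.
Paper reference translation (metadata) (2023-08-25T15:06:05Z)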
A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
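We propose a reinforcement learning-assisted genetic programming algorithm (RL-GP) to improve solution quality. The hyper-heuristic rules obtained through efficient learning can be used as decision support when forming project teams.
Paper reference translation (metadata) (2023-04-08T14:32:12Z)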
MAP-Elites with Descriptor-Conditioned Gradients and Archive Distillation into a Single Policy [1.376408511310322]
Our algorithm, DCG-MAP-Elites, improves the QD score of PGA-MAP-Elites by 82% on average.
Paper reference translation (metadata) (2023-03-07T11:58:01Z)
Empirical analysis of PGA-MAP-Elites for Neuroevolution in Uncertain Domains [1.376408511310322]
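PGA-MAP-Elites performs well in both deterministic and uncertain high-dimensional environments. The collection of solutions generated by PGA-MAP-Elites not only outperforms all considered baselines but is also highly reproducible in uncertain environments.
Paper reference translation (metadata) (2022-10-24T12:17:18Z)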
Self-Referential Quality Diversity Through Differential Map-Elites [5.2508303190856624]
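Differential MAP-Elites is a novel algorithm that combines the illumination capacity of CVT-MAP-Elites with the continuous-space optimization capacity of Differential Evolution. The basic Differential MAP-Elites algorithm, introduced here for the first time, is relatively simple in that it combines the operators of Differential Evolution with the map structure of CVT-MAP-Elites.
Paper reference translation (metadata) (2021-07-11T04:31:10Z)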
Adam revisited: a weighted past gradients perspective [57.54752290924522]
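We propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issue. We prove that WADA can achieve a weighted data-dependent regret bound.
Paper reference translation (metadata) (2021-01-01T14:01:52Z)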
Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations [1.827510863075184]
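We show that the Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) can achieve better performance than a state-of-the-art RL method. The paper shows that EAs combined with modern computational resources display promising characteristics.
Paper reference translation (metadata) (2020-09-17T17:41:46Z)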
Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters [1.827510863075184]
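We introduce Multi-Emitter MAP-Elites (ME-MAP-Elites), an algorithm that directly extends CMA-ME and improves its quality, diversity, and data efficiency. A bandit algorithm dynamically finds the best selection of emitters depending on the current situation. We evaluate ME-MAP-Elites on six tasks, ranging from standard optimization problems (in 100 dimensions) to complex locomotion tasks in robotics.
Paper reference translation (metadata) (2020-07-10T12:45:02Z)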

The related papers list is generated automatically from the titles and abstracts of papers on this site.

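This page shows information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).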