Fugu-MT Paper Translation (Abstract): RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

Paper overview: RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

arxiv url: http://arxiv.org/abs/2306.03530v2
Date: Tue, 14 Nov 2023 20:35:09 GMT
Status: Translation complete
Last updated in system: 2023-11-16 20:10:39.570080
Title: RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
Title (reference translation): RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
Authors: Jonas Eschmann, Dario Albani, Giuseppe Loianno
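Abstract summary: RLtools is a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. RLtools can solve popular RL problems such as Pendulum-v1. To the best of our knowledge, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller.
Reference score (independently computed attention score): 8.159171440455824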
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep Reinforcement Learning (RL) has been demonstrated to yield capable agents and control policies in several domains but is commonly plagued by prohibitively long training times. Additionally, in the case of continuous control problems, the applicability of learned policies on real-world embedded devices is limited due to the lack of real-time guarantees and portability of existing deep learning libraries. To address these challenges, we present RLtools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. Leveraging the template meta-programming capabilities of recent C++ standards, we provide composable components that can be tightly integrated by the compiler. Its novel architecture allows RLtools to be used seamlessly on a heterogeneous set of platforms, from HPC clusters over workstations and laptops to smartphones, smartwatches, and microcontrollers. Specifically, due to the tight integration of the RL algorithms with simulation environments, RLtools can solve popular RL problems like the Pendulum-v1 swing-up about 7 to 15 times faster in terms of wall-clock training time compared to other popular RL frameworks when using TD3. We also provide a low-overhead and parallelized interface to the MuJoCo simulator, showing that our PPO implementation achieves state of the art returns in the Ant-v4 environment while being 25%-30% faster in terms of wall-clock training time. Finally, we also benchmark the policy inference on a diverse set of microcontrollers and show that in most cases our optimized inference implementation is much faster than even the manufacturer's DSP libraries. To the best of our knowledge, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of TinyRL. The source code is available through our project page at https://rl.tools.
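Abstract (reference translation): Deep Reinforcement Learning (RL) has been shown to yield capable agents and control policies in several domains, but it is commonly plagued by prohibitively long training times. In addition, for continuous control problems, the applicability of learned policies on real-world embedded devices is limited by the lack of real-time guarantees and portability in existing deep learning libraries. To address these challenges, we present RLtools, a dependency-free, header-only, pure C++ library for deep supervised and reinforcement learning. By leveraging the template meta-programming capabilities of recent C++ standards, we provide composable components that the compiler can tightly integrate. Its novel architecture allows RLtools to be used seamlessly on a heterogeneous set of platforms, from HPC clusters through workstations and laptops to smartphones, smartwatches, and microcontrollers. Specifically, owing to the tight integration of the RL algorithms with simulation environments, RLtools can solve popular RL problems such as the Pendulum-v1 swing-up about 7 to 15 times faster in wall-clock training time than other popular RL frameworks when using TD3. We also provide a low-overhead, parallelized interface to the MuJoCo simulator and show that our PPO implementation achieves state-of-the-art returns in the Ant-v4 environment while being 25%-30% faster in wall-clock training time. Finally, we benchmark policy inference on a diverse set of microcontrollers and show that in most cases our optimized inference implementation is much faster than even the manufacturer's DSP libraries. To the best of our knowledge, RLtools enables the first-ever demonstration of training a deep RL algorithm directly on a microcontroller, giving rise to the field of TinyRL. The source code is available through our project page at https://rl.tools.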

Related papers