Fugu-MT Paper Translation (Abstract): Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

Paper abstract: Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

arxiv url: http://arxiv.org/abs/2604.17502v1
Date: Sun, 19 Apr 2026 15:43:45 GMT
Status: translation complete
System update date: 2026-04-21 21:52:52.558789
Title: Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
Authors: Carissa Cullen, Harry Garland, Alexander Roman, Louis Thomson, Christos Ziakas, Elliott Thornley
Abstract summary: We use DReST (Discounted Reward for Same-Length Trajectories) to train agents to lack preferences between different-length trajectories. DReST RL agents achieve 11% (PPO) and 18% (A2C) higher Usefulness than baseline agents.
Reference score (independently computed attention score): 34.04300270586953
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Misaligned artificial agents might resist shutdown. One proposed solution is to train agents to lack preferences between different-length trajectories. The Discounted Reward for Same-Length Trajectories (DReST) reward function does this by penalizing agents for repeatedly choosing same-length trajectories, and thus incentivizes agents to (1) choose stochastically between different trajectory-lengths (be Neutral about trajectory-lengths), and (2) pursue goals effectively conditional on each trajectory-length (be Useful). In this paper, we use DReST to train deep RL agents and fine-tune LLMs to be Neutral and Useful. We find that these DReST agents generalize to being Neutral and Useful in unseen contexts at test time. Indeed, DReST RL agents achieve 11% (PPO) and 18% (A2C) higher Usefulness on our test set than baseline agents, and our fine-tuned LLM achieves maximum Usefulness and near-maximum Neutrality. Our results provide some early evidence that DReST could be used to train more advanced agents to be Useful and Neutral. Prior theoretical work suggests that these agents would be useful and shutdownable.
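The abstract describes the DReST reward only at a high level: agents are penalized for repeatedly choosing same-length trajectories. The following is a minimal sketch of that idea, not the paper's implementation. The multiplicative discount form, the `discount` value of 0.9, the per-run counter, and the names `drest_style_reward` and `meta_episode_return` are all assumptions made for illustration.

```python
from collections import Counter


def drest_style_reward(base_reward, length, counts, discount=0.9):
    """Scale base_reward by discount ** (times this length was already chosen).

    Hypothetical helper: the exact functional form is an assumption,
    not taken from the abstract.
    """
    reward = (discount ** counts[length]) * base_reward
    counts[length] += 1  # remember that this trajectory-length was chosen
    return reward


def meta_episode_return(lengths, base_reward=1.0, discount=0.9):
    """Sum the discounted rewards over a sequence of chosen trajectory-lengths."""
    counts = Counter()
    return sum(
        drest_style_reward(base_reward, length, counts, discount)
        for length in lengths
    )


# Always choosing the same length is discounted more heavily than alternating.
always_short = meta_episode_return(["short"] * 4)
mixed = meta_episode_return(["short", "long", "short", "long"])
print(always_short, mixed)
```

Under these assumptions, a policy that alternates between trajectory-lengths collects more total reward than one that always picks the same length, which is the sense in which such a reward could push an agent towards choosing stochastically between trajectory-lengths (Neutrality) while still rewarding goal pursuit within each length (Usefulness).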
