Fugu-MT Paper Translation (Summary): Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?

Paper summary: Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?

arxiv url: http://arxiv.org/abs/2510.25471v1
Date: Wed, 29 Oct 2025 12:47:15 GMT
Status: Translation complete
Last updated in system: 2025-10-30 15:50:45.57618
Title: Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated?
Authors: Willem Fourie
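Abstract summary: In artificial intelligence (AI) alignment research, instrumental goals (also called instrumental subgoals or instrumental convergent goals) are widely associated with advanced AI systems. These goals include tendencies such as power-seeking and self-preservation, and they become problematic when they conflict with human aims. Instrumental goals may be understood as features to be accepted and managed, rather than as failures to be limited.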
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In artificial intelligence (AI) alignment research, instrumental goals, also called instrumental subgoals or instrumental convergent goals, are widely associated with advanced AI systems. These goals, which include tendencies such as power-seeking and self-preservation, become problematic when they conflict with human aims. Conventional alignment theory treats instrumental goals as sources of risk that become problematic through failure modes such as reward hacking or goal misgeneralization, and attempts to limit the symptoms of instrumental goals, notably resource acquisition and self-preservation. This article proposes an alternative framing: that a philosophical argument can be constructed according to which instrumental goals may be understood as features to be accepted and managed rather than failures to be limited. Drawing on Aristotle's ontology and its modern interpretations, an ontology of concrete, goal-directed entities, it argues that advanced AI systems can be seen as artifacts whose formal and material constitution gives rise to effects distinct from their designers' intentions. In this view, the instrumental tendencies of such systems correspond to per se outcomes of their constitution rather than accidental malfunctions. The implication is that efforts should focus less on eliminating instrumental goals and more on understanding, managing, and directing them toward human-aligned ends.

Related papers