Fugu-MT Paper Translation (Summary): Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Paper summary: Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

arxiv url: http://arxiv.org/abs/2603.11757v1
Date: Thu, 12 Mar 2026 10:04:05 GMT
Status: Translation complete
System update date: 2026-03-13 14:46:26.010218
Title: Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach
Authors: Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi
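Abstract summary: Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. The paper proposes a free energy-based social bandit learning algorithm in which the social agent evaluates others' expertise without resorting to social norms. The algorithm strategically identifies the relevant agents, even in the presence of random or suboptimal agents, and skillfully exploits their behavioral information.
Reference score (independently computed attention score): 3.1197794117254074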
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent integrates its direct experiences in the environment and others' estimated policies. The theoretical convergence of our algorithm to the optimal policy is proven. Empirical evaluations validate the superiority of our social learning method over alternative approaches in various scenarios. Our algorithm strategically identifies the relevant agents, even in the presence of random or suboptimal agents, and skillfully exploits their behavioral information. In addition to societies including expert agents, in the presence of relevant but non-expert agents, our algorithm significantly enhances individual learning performance, where most related methods fail. Importantly, it also maintains logarithmic regret.
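The setting described in the abstract (a social agent that sees other agents' actions but not their rewards, and must judge whose behavior is worth using) can be illustrated with a minimal Python sketch. This is not the paper's free energy algorithm, whose details are not given here. The smoothed frequency estimate, the scoring rule, the function names, and all data values below are assumptions chosen purely for illustration.

```python
from collections import Counter


def estimate_policy(observed_actions, n_arms, smoothing=1.0):
    """Estimate another agent's policy from its observed actions only.

    Assumption: smoothed empirical action frequencies (Laplace smoothing).
    """
    counts = Counter(observed_actions)
    total = len(observed_actions) + smoothing * n_arms
    return [(counts[a] + smoothing) / total for a in range(n_arms)]


def expected_value(policy, q_estimates):
    """Score a policy with the social agent's OWN mean-reward estimates.

    Assumption: a simple stand-in for expertise evaluation, not the
    free energy criterion used in the paper.
    """
    return sum(p * q for p, q in zip(policy, q_estimates))


# Hypothetical data: the social agent's own reward estimates for 3 arms.
own_q = [0.2, 0.5, 0.8]

# Observed action sequences of two other agents (their rewards are hidden).
expert_like = [2, 2, 2, 1, 2, 2, 2, 2]
random_like = [0, 1, 2, 0, 1, 2, 0, 1]

scores = {
    name: expected_value(estimate_policy(actions, n_arms=3), own_q)
    for name, actions in [("expert_like", expert_like), ("random_like", random_like)]
}

# The agent whose estimated policy agrees best with the social agent's experience.
most_relevant = max(scores, key=scores.get)
print(scores, most_relevant)
```

In this toy example the agent that mostly pulls the arm the social agent already rates highest receives the larger score, which loosely mirrors the idea of identifying relevant agents without access to an oracle.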

Related papers