Fugu-MT Paper Translation (Abstract): PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

Paper overview: PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models

arxiv url: http://arxiv.org/abs/2601.03531v1
Date: Wed, 07 Jan 2026 02:44:38 GMT
Status: Translation complete
System update date: 2026-01-09 02:15:23.175617
Title: PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
Title (reference translation): PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
Authors: Yuwen Wang, Xinyuan Qian, Tian-Hao Zhang, Jiaran Gao, Yuchen Pan, Xin Wang, Zhou Pan, Chen Wei, Yiming Wang
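Abstract summary: Large Audio-Language Models (LALMs) demonstrate strong performance in audio understanding and generation. The authors formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies, while yielding improvements, remain limited in modeling personalized knowledge.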
Reference score (independently computed attention score): 21.26163892337167
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Audio-Language Models (LALMs) have demonstrated strong performance in audio understanding and generation. Yet, our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and fails to adequately support personalized question answering (e.g., summarizing what my best friend says). In contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM) for recognizing personal concepts and reasoning within personal context. Moreover, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and enable structured evaluation on several tasks across multi-speaker scenarios. Our extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies, while yielding improvements, remain limited in modeling personalized knowledge and transferring it robustly across tasks. Data and code will be released.
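Abstract (reference translation): Large Audio-Language Models (LALMs) show strong performance in audio understanding and generation. However, our extensive benchmarking reveals that their behavior is largely generic (e.g., summarizing spoken content) and cannot adequately support personalized question answering (e.g., summarizing what my best friend says). By contrast, humans condition their interpretation and decision-making on each individual's personal context. To bridge this gap, we formalize the task of Personalized LALMs (PALM): recognizing personal concepts and reasoning within personal context. In addition, we create the first benchmark (PALM-Bench) to foster methodological advances in PALM and to enable structured evaluation of multiple tasks in multi-speaker scenarios. Extensive experiments on representative open-source LALMs show that existing training-free prompting and supervised fine-tuning strategies bring improvements but remain limited in modeling personalized knowledge and transferring it robustly across tasks. Data and code will be released.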

List of related papers