Fugu-MT Paper Translation (Summary): PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark

Paper summary: PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark

arxiv url: http://arxiv.org/abs/2603.14456v1
Date: Sun, 15 Mar 2026 16:06:24 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.81703
Title: PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
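Title (reference translation): PARSA-Bench: A Comprehensive Audio-Language Model Benchmark for Persian
Authors: Mohammad Javad Ranjbar Kalahroodi, Mohammad Amini, Parmis Bathayan, Heshaam Faili, Azadeh Shakery
Abstract summary: PARSA-Bench is the first benchmark for evaluating large audio-language models on Persian language and culture. It comprises 16 tasks and over 8,000 samples spanning speech understanding, paralinguistic analysis, and cultural audio understanding. Ten tasks are newly introduced, including poetry meter and style detection, traditional Persian music understanding, and code-switching detection.
Reference score (independently computed attention score): 4.352747055546777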
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching - none captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for evaluating large audio-language models on Persian language and culture, comprising 16 tasks and over 8,000 samples across speech understanding, paralinguistic analysis, and cultural audio understanding. Ten tasks are newly introduced, including poetry meter and style detection, traditional Persian music understanding, and code-switching detection. Text-only baselines consistently outperform audio counterparts, suggesting models may not leverage audio-specific information beyond what transcription alone provides. Culturally-grounded tasks expose a qualitatively distinct failure mode: all models perform near random chance on vazn detection regardless of scale, suggesting prosodic perception remains beyond the reach of current models. The dataset is publicly available at https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench
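Abstract (reference translation): Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for evaluating large audio-language models on Persian language and culture. Ten tasks are newly introduced, including poetry meter and style detection, traditional Persian music understanding, and code-switching detection. Text-only baselines consistently outperform their audio counterparts, suggesting that models may not leverage audio-specific information beyond what transcription alone provides. All models perform near random chance on vazn detection regardless of scale, suggesting that prosodic perception remains beyond the reach of current models. The dataset is publicly available at https://huggingface.co/datasets/MohammadJRanjbar/PARSA-Bench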

Related papers