Fugu-MT Paper Translation (Abstract): SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

Paper abstract: SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

arxiv url: http://arxiv.org/abs/2605.13672v1
Date: Wed, 13 May 2026 15:32:57 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.134831
Title: SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification
Title (reference translation): SpurAudio: A Benchmark for Shortcut Learning in Few-Shot Audio Classification
Authors: Giries Abu Ayoub, Morad Tukan, Loay Mualem
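Abstract summary: Few-shot classification (FSC) is widely used for learning from limited labeled data, yet most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, examples often appear within rich contexts, allowing models to exploit spurious correlations between foreground content and background signals. SpurAudio is a benchmark that leverages the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of contextual shifts across support and query sets.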
Reference score (independently computed attention score): 4.791940743080381
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Few-shot classification (FSC) is widely used for learning from limited labeled data, yet most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, however, examples often appear within rich contexts, allowing models to exploit spurious correlations between foreground content and background signals. While such effects have been studied in few-shot image classification, their role in few-shot audio classification remains largely unexplored, and existing audio benchmarks offer limited control over contextual structure. We introduce SpurAudio, a benchmark that leverages the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of contextual shifts across support and query sets. Using this benchmark, we show that many state-of-the-art few-shot methods suffer severe performance degradation when background correlations are disrupted, despite achieving similar accuracy under standard evaluation protocols. Crucially, this vulnerability persists even in large pretrained audio foundation models, ruling out limited backbone capacity as an explanation. Moreover, methods that appear comparable under conventional benchmarks can exhibit markedly different sensitivity to spurious correlations, revealing systematic algorithmic strengths and vulnerabilities tied to how feature representations interact with classifier heads at inference time. These findings provide new insight into the behavior of few-shot methods in audio and highlight the need for benchmarks that explicitly probe context dependence when evaluating FSC models.
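Abstract (reference translation): Few-shot classification (FSC) is widely used to learn from limited labeled data, but most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, however, samples often appear in rich contexts, which lets models exploit spurious correlations between foreground content and background signals. Such effects have been studied in few-shot image classification, but their role in few-shot audio classification remains largely unexplored, and existing audio benchmarks offer only limited control over contextual structure. SpurAudio is a benchmark that uses the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of context shifts between support and query sets. Using this benchmark, the authors show that many state-of-the-art few-shot methods suffer severe performance degradation when background correlations are disrupted, even though they reach similar accuracy under standard evaluation protocols. Importantly, this vulnerability persists even in large pretrained audio foundation models, which rules out limited backbone capacity as an explanation. Furthermore, methods that look comparable on conventional benchmarks can differ markedly in their sensitivity to spurious correlations, revealing systematic algorithmic strengths and vulnerabilities tied to how feature representations interact with classifier heads at inference time. These results highlight the need for benchmarks that explicitly probe context dependence when evaluating FSC models.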
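
The context shift described in the abstract can be illustrated with a toy episode. The sketch below is not the SpurAudio pipeline or any method evaluated in the paper. It assumes hand-made 2-D features (one foreground value, one background value), a generic nearest-prototype classifier, and hypothetical names such as `make_set` and `run_demo`. All numeric values are invented for illustration.

```python
import random

# Toy 2-way episode. Every number below is an illustrative assumption,
# not a value taken from SpurAudio.
FOREGROUND = {0: 0.0, 1: 1.0}   # class -> foreground feature value
BACKGROUND = {0: 0.0, 1: 3.0}   # background id -> background feature value
NOISE_STD = 0.05


def make_example(label, background, rng):
    """Return a 2-D feature [foreground, background] with small Gaussian noise."""
    return [
        FOREGROUND[label] + rng.gauss(0.0, NOISE_STD),
        BACKGROUND[background] + rng.gauss(0.0, NOISE_STD),
    ]


def make_set(pairing, n_per_class, rng):
    """Build (feature, label) pairs; `pairing` maps each class to a background id."""
    return [
        (make_example(label, background, rng), label)
        for label, background in pairing.items()
        for _ in range(n_per_class)
    ]


def prototypes(support, dims):
    """Average the support features of each class over the selected dimensions."""
    protos = {}
    for label in {lbl for _, lbl in support}:
        rows = [feat for feat, lbl in support if lbl == label]
        protos[label] = [sum(r[d] for r in rows) / len(rows) for d in dims]
    return protos


def accuracy(support, query, dims):
    """Nearest-prototype accuracy using squared Euclidean distance on `dims`."""
    protos = prototypes(support, dims)
    correct = 0
    for feat, label in query:
        pred = min(
            protos,
            key=lambda c: sum((feat[d] - p) ** 2 for d, p in zip(dims, protos[c])),
        )
        correct += pred == label
    return correct / len(query)


def run_demo(seed=0):
    rng = random.Random(seed)
    support = make_set({0: 0, 1: 1}, n_per_class=5, rng=rng)   # class k on background k
    aligned = make_set({0: 0, 1: 1}, n_per_class=20, rng=rng)  # same pairing as support
    shifted = make_set({0: 1, 1: 0}, n_per_class=20, rng=rng)  # backgrounds swapped
    return {
        "full/aligned": accuracy(support, aligned, dims=(0, 1)),
        "full/shifted": accuracy(support, shifted, dims=(0, 1)),
        "foreground/aligned": accuracy(support, aligned, dims=(0,)),
        "foreground/shifted": accuracy(support, shifted, dims=(0,)),
    }


if __name__ == "__main__":
    for name, acc in run_demo().items():
        print(f"{name}: {acc:.2f}")
```

In this toy setup the background dimension has the larger scale, so the full-feature classifier is accurate when query backgrounds match the support set and fails once they are swapped, while the foreground-only variant is unaffected. Real audio embeddings do not separate foreground and background this cleanly, so the sketch only conveys the shape of the evaluation, not its expected magnitudes.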

Related papers