Fugu-MT Paper Translation (Abstract): Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

Paper overview: Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

arxiv url: http://arxiv.org/abs/2605.02752v2
Date: Wed, 13 May 2026 14:26:36 GMT
Status: Translation complete
Last updated in system: 2026-05-14 17:13:58.751909
Title: Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Authors: Giacomo Pacini, Luca Ciampi, Nicola Messina, Nicola Tonellotto, Giuseppe Amato, Fabrizio Falchi
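Abstract summary: Open-world text-guided class-agnostic counting (CAC) has emerged as a flexible paradigm for counting arbitrary object classes using natural language prompts. Several state-of-the-art CAC models struggle to determine which object class should be counted based on the given prompt. The paper proposes a new evaluation framework focused on model robustness and trustworthiness.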
Reference score (independently computed attention score): 17.927293384172003
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-world text-guided class-agnostic counting (CAC) has emerged as a flexible paradigm for counting arbitrary object classes by using natural language prompts. However, current evaluation protocols primarily focus on standard counting errors within single-category images, overlooking a fundamental requirement: the ability to correctly ground the textual prompt in the visual scene. In this paper, we show that several state-of-the-art CAC models often struggle to determine which object class should be counted based on the given prompt, revealing a misalignment between textual semantics and visual object representations. This limitation leads to spurious counting responses and reduced reliability in real-world scenarios. To systematically address these limitations, we propose a new evaluation framework focused on model robustness and trustworthiness. Our contribution is two-fold: (i) we introduce PrACo++ (Prompt-Aware Counting++), a novel test suite featuring two dedicated evaluation protocols -- the negative-label test and the distractor test -- paired with new specialized metrics; and (ii) we present the MUCCA (MUlti-Category Class-Agnostic counting) evaluation dataset, a new collection of real-world images featuring multiple annotated object categories per scene, unlike existing CAC benchmarks that typically include a single category per image. Our extensive experimental evaluation of 10 state-of-the-art methods shows that, despite strong performance under standard counting metrics, current models exhibit significant weaknesses in understanding and grounding object class descriptions. Finally, we provide a quantitative analysis of how semantic similarity between prompts influences these failures. Overall, our results underscore the need for more semantically grounded architectures and offer a reliable framework for future assessment in open-world text-guided CAC methods.

Related papers