Fugu-MT Paper Translation (Summary): Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting

Paper summary: Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting

arxiv url: http://arxiv.org/abs/2606.17601v1
Date: Tue, 16 Jun 2026 07:08:02 GMT
Status: Translation complete
Last updated in system: 2026-06-17 17:15:32.322045
Title: Test-Time Training for Robust Text-Guided Open-Vocabulary Object Counting
Authors: Hao-Yuan Ma, Yuda Zou, Li Zhang, Yongchao Xu
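Abstract summary: Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts. Existing TOOC methods are developed and evaluated primarily on ideal images. The authors introduce Robust-TOOC, the first benchmark for evaluating TOOC under diverse corruption conditions, and propose Dual-TTT, a dual-architecture test-time training framework for TOOC.
Reference score (independently computed attention metric): 12.871212510225604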
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-guided Open-vocabulary Object Counting (TOOC) enables counting arbitrary object categories specified by text prompts, offering substantially greater flexibility than conventional closed-set counting. However, existing TOOC methods are developed and evaluated primarily on ideal images, while real-world scenes often suffer from adverse conditions such as rain, fog, darkness, and sensor noise, which severely degrade visual quality and impair vision-language alignment. To bridge this gap, we introduce Robust-TOOC, the first benchmark for evaluating TOOC under diverse corruption conditions, which covers six representative degradation types: rain, fog, darkness, Gaussian noise, salt-and-pepper noise, and mixed corruption. To improve robustness while preserving the original counting architecture, we propose Dual-TTT, a dual-architecture test-time training framework for TOOC. Specifically, during test-time training, Dual-TTT updates only the Text-guided Lightweight Denoising module (TL-Denoiser), while keeping the original counting network frozen. Inspired by diffusion models, the TL-Denoiser is optimized to remove corruption-aware noise from image representations under degraded conditions. Since only the TL-Denoiser is trained at test time, Dual-TTT is annotation-free and can be seamlessly integrated into existing TOOC models without modifying their original architecture. Extensive experiments on multiple recent TOOC baselines demonstrate the effectiveness of our method.
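The update pattern described in the abstract (train only a small denoising module at test time, keep the counting network frozen, use no annotations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify the TL-Denoiser architecture, its text guidance, or its loss. Here the denoiser is a per-feature affine map, the frozen counter is a fixed linear readout, and the self-supervised objective is to undo synthetic Gaussian noise added to the test features. The names `count_head`, `denoise`, and `ttt_step` are hypothetical.

```python
import random


def count_head(features, weights):
    """Frozen counting head (hypothetical): a fixed linear readout."""
    return sum(w * f for w, f in zip(weights, features))


def denoise(features, gain, bias):
    """Toy stand-in for a lightweight denoiser: per-feature affine map."""
    return [g * f + b for g, f, b in zip(gain, features, bias)]


def ttt_step(features, gain, bias, rng, sigma=0.1, lr=0.05):
    """One label-free update of the denoiser parameters only.

    Assumed objective: add synthetic Gaussian noise to the test features
    and train the denoiser to reconstruct the un-noised features.
    """
    noisy = [f + rng.gauss(0.0, sigma) for f in features]
    restored = denoise(noisy, gain, bias)
    n = len(features)
    loss = sum((r - f) ** 2 for r, f in zip(restored, features)) / n
    for i in range(n):
        # Gradient of the mean squared error w.r.t. gain[i] and bias[i].
        grad = 2.0 * (restored[i] - features[i]) / n
        gain[i] -= lr * grad * noisy[i]
        bias[i] -= lr * grad
    return loss


rng = random.Random(0)
frozen_weights = (0.5, -1.0, 2.0)   # never touched by ttt_step
gain, bias = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
test_features = [0.2, 1.5, -0.7]    # made-up features of one test image

losses = [ttt_step(test_features, gain, bias, rng) for _ in range(20)]
estimate = count_head(denoise(test_features, gain, bias), frozen_weights)
```

Because `ttt_step` never receives `frozen_weights`, the counting head cannot be modified during adaptation, which mirrors the claim that the original counting architecture stays unchanged. No ground-truth count appears anywhere in the update.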

Related papers