Fugu-MT Paper Translation (Abstract): Heuristic-inspired Reasoning Priors Facilitate Data-Efficient Referring Object Detection

Paper abstract: Heuristic-inspired Reasoning Priors Facilitate Data-Efficient Referring Object Detection

arxiv url: http://arxiv.org/abs/2603.24166v1
Date: Wed, 25 Mar 2026 10:33:22 GMT
Status: translation complete
Last updated in system: 2026-03-26 21:06:11.248464
Title: Heuristic-inspired Reasoning Priors Facilitate Data-Efficient Referring Object Detection
Title (reference translation): Heuristic-inspired reasoning priors enable data-efficient referring object detection
Authors: Xu Zhang, Zhe Chen, Jing Zhang, Dacheng Tao
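Abstract summary: HeROD (Heuristic-inspired ROD) is a lightweight, model-agnostic framework that injects explicit, interpretable spatial and semantic reasoning priors. HeROD consistently outperforms strong grounding baselines in scarce-label regimes.
Reference score (independently computed attention score): 53.988759250627425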
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most referring object detection (ROD) models, especially modern grounding detectors, are designed for data-rich conditions, yet many practical deployments, such as robotics, augmented reality, and other specialized domains, face severe label scarcity. In such regimes, end-to-end grounding detectors need to learn spatial and semantic structure from scratch, wasting precious samples. We ask a simple question: Can explicit reasoning priors help models learn more efficiently when data is scarce? To explore this, we first introduce a Data-efficient Referring Object Detection (De-ROD) task, which is a benchmark protocol for measuring ROD performance in low-data and few-shot settings. We then propose HeROD (Heuristic-inspired ROD), a lightweight, model-agnostic framework that injects explicit, heuristic-inspired spatial and semantic reasoning priors, which are interpretable signals derived from the referring phrase, into three stages of a modern DETR-style pipeline: proposal ranking, prediction fusion, and Hungarian matching. By biasing both training and inference toward plausible candidates, these priors promise to improve label efficiency and convergence performance. On RefCOCO, RefCOCO+, and RefCOCOg, HeROD consistently outperforms strong grounding baselines in scarce-label regimes. More broadly, our results suggest that integrating simple, interpretable reasoning priors provides a practical and extensible path toward better data-efficient vision-language understanding.
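Abstract (reference translation): Most referring object detection (ROD) models, particularly modern grounding detectors, are designed for data-rich conditions, yet many practical deployments, such as robotics, augmented reality, and other specialized domains, face severe label scarcity. In such regimes, end-to-end grounding detectors must learn spatial and semantic structure from scratch, wasting precious samples. Can explicit reasoning priors help models learn more efficiently when data is scarce? To explore this, we first introduce the Data-efficient Referring Object Detection (De-ROD) task, a benchmark protocol for measuring ROD performance in low-data and few-shot settings. We then propose HeROD (Heuristic-inspired ROD), a lightweight, model-agnostic framework that injects explicit, heuristic-inspired spatial and semantic reasoning priors, interpretable signals derived from the referring phrase, into three stages of a DETR-style pipeline: proposal ranking, prediction fusion, and Hungarian matching. By biasing both training and inference toward plausible candidates, these priors promise to improve label efficiency and convergence performance. On RefCOCO, RefCOCO+, and RefCOCOg, HeROD consistently outperforms strong grounding baselines in scarce-label regimes. More broadly, our results suggest that integrating simple, interpretable reasoning priors offers a practical and extensible path toward data-efficient vision-language understanding.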
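
Illustrative sketch (not the authors' implementation): the abstract says that priors derived from the referring phrase bias proposal ranking toward plausible candidates. The Python below shows one way such a prior could work, under loud assumptions: the `Proposal` type, the left/right keyword rule in `spatial_prior`, and the fusion weight `alpha` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    score: float  # detector confidence in [0, 1]


def spatial_prior(phrase, box, image_width):
    """Hypothetical heuristic: how well a box fits a left/right cue, in [0, 1]."""
    x_center = (box[0] + box[2]) / 2.0 / image_width  # normalized to [0, 1]
    words = phrase.lower().split()
    if "left" in words:
        return 1.0 - x_center  # boxes nearer the left edge score higher
    if "right" in words:
        return x_center        # boxes nearer the right edge score higher
    return 0.5                 # no spatial cue: neutral prior


def rank_proposals(phrase, proposals, image_width, alpha=0.5):
    """Rank by a convex combination of detector score and prior.

    alpha is an assumed fusion weight, not a value from the paper.
    """
    def fused(p):
        prior = spatial_prior(phrase, p.box, image_width)
        return (1.0 - alpha) * p.score + alpha * prior

    return sorted(proposals, key=fused, reverse=True)


proposals = [
    Proposal(box=(400, 50, 600, 300), score=0.80),  # right side of the image
    Proposal(box=(20, 60, 220, 310), score=0.70),   # left side of the image
]
best = rank_proposals("the dog on the left", proposals, image_width=640)[0]
print(best.box)
```

In this toy case the left-hand box is ranked first even though its detector score is lower, because the phrase contains "left". With a phrase that carries no spatial cue, the prior is neutral and the ranking falls back to detector confidence.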

Related papers