Fugu-MT Paper Translation (Summary): GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection

Paper Summary: GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection

arxiv url: http://arxiv.org/abs/2603.27014v1
Date: Fri, 27 Mar 2026 22:08:11 GMT
Status: Translation complete
Last updated (in system): 2026-03-31 23:18:44.736292
Title: GUIDED: Granular Understanding via Identification, Detection, and Discrimination for Fine-Grained Open-Vocabulary Object Detection
Authors: Jiaming Li, Zhijia Liang, Weikai Chen, Lin Ma, Guanbin Li,
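Abstract summary: Fine-grained open-vocabulary object detection (FG-OVD) aims to detect novel object categories described by attribute-rich text. GUIDED achieves new state-of-the-art results, demonstrating the benefits of disentangled modeling and modular optimization.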
Reference score (in-house attention score): 54.19989440021701
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-grained open-vocabulary object detection (FG-OVD) aims to detect novel object categories described by attribute-rich texts. While existing open-vocabulary detectors show promise at the base-category level, they underperform in fine-grained settings due to the semantic entanglement of subjects and attributes in pretrained vision-language model (VLM) embeddings, leading to over-representation of attributes, mislocalization, and semantic drift in embedding space. We propose GUIDED, a decomposition framework specifically designed to address the semantic entanglement between subjects and attributes in fine-grained prompts. By separating object localization and fine-grained recognition into distinct pathways, GUIDED aligns each subtask with the module best suited to its role. Specifically, given a fine-grained class name, we first use a language model to extract a coarse-grained subject and its descriptive attributes. The detector is then guided solely by the subject embedding, ensuring stable localization unaffected by irrelevant or over-represented attributes. To selectively retain helpful attributes, we introduce an attribute embedding fusion module that incorporates attribute information into detection queries via attention. This mitigates over-representation while preserving discriminative power. Finally, a region-level attribute discrimination module compares each detected region against the full fine-grained class names using a refined vision-language model with a projection head for improved alignment. Extensive experiments on the FG-OVD and 3F-OVD benchmarks show that GUIDED achieves new state-of-the-art results, demonstrating the benefits of disentangled modeling and modular optimization. Our code will be released at https://github.com/lijm48/GUIDED.
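The abstract says only that the attribute embedding fusion module folds attribute information into detection queries "in an attention-based manner." The following is a minimal sketch of one way that could work. It is not the authors' code, which is to be released at the GitHub URL above. It assumes plain scaled dot-product attention with no learned projections and a residual update scaled by a hypothetical `alpha`. The function names and the toy 2-D embeddings are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Vector, attributes: List[Vector]) -> List[float]:
    """Scaled dot-product attention of one query over attribute embeddings."""
    d = len(query)
    scores = [sum(q * a for q, a in zip(query, attr)) / math.sqrt(d)
              for attr in attributes]
    return softmax(scores)


def fuse_attributes(queries: List[Vector], attributes: List[Vector],
                    alpha: float = 1.0) -> List[Vector]:
    """Hypothetical attribute-embedding fusion (not the authors' code).

    Each detection query attends over the attribute embeddings, and the
    attention-weighted attribute vector is added back to the query as a
    residual scaled by `alpha` (an assumed hyperparameter).
    """
    if not attributes:
        return [list(q) for q in queries]  # subject-only queries stay unchanged
    fused = []
    for q in queries:
        w = attention_weights(q, attributes)
        # The context is a convex combination of the attribute embeddings,
        # so adding more attributes cannot push its norm past the largest
        # single attribute norm.
        context = [sum(wi * attr[k] for wi, attr in zip(w, attributes))
                   for k in range(len(q))]
        fused.append([qk + alpha * ck for qk, ck in zip(q, context)])
    return fused


if __name__ == "__main__":
    # Toy embeddings; real ones would come from the VLM text encoder.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    attrs = [[1.0, 0.0], [0.0, 1.0]]  # e.g. hypothetical "red", "wooden"
    print(attention_weights(queries[0], attrs))
    print(fuse_attributes(queries, attrs, alpha=0.5))
```

The softmax weighting is what makes this a *selective* retention. Each query takes a weighted mix of attributes rather than their sum, which is one plausible reading of how attention "mitigates over-representation" while still letting relevant attributes shape the query.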
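The region-level attribute discrimination step compares each detected region against the full fine-grained class names through a projection head. A minimal sketch of that comparison follows, assuming three things. First, the region's visual feature and the class-name text embeddings are already computed. Second, the projection head is a single linear layer. Third, scoring is cosine similarity followed by a temperature-scaled softmax; the value 0.07 is borrowed from CLIP-style training, not from the paper. All names here (`project`, `discriminate_region`, the example class names) are hypothetical, and this is not the released GUIDED implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(weight: Matrix, x: Vector) -> Vector:
    """Linear projection head: returns weight @ x (weight is out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def discriminate_region(region_feat: Vector, proj: Matrix,
                        class_embs: List[Vector],
                        temperature: float = 0.07) -> Tuple[int, List[float]]:
    """Hypothetical region-level discrimination (not the authors' code).

    Projects a detected region's visual feature, scores it against the text
    embedding of every full fine-grained class name by cosine similarity,
    and returns (best class index, softmax probabilities over classes).
    """
    z = project(proj, region_feat)
    logits = [cosine(z, t) / temperature for t in class_embs]
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy example: identity projection and two hypothetical class-name embeddings
    # that share a subject ("chair") but differ in attributes.
    names = ["red wooden chair", "blue metal chair"]
    embs = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    idx, probs = discriminate_region([0.9, 0.1], identity, embs)
    print(names[idx], probs)
```

Scoring against the full class names, rather than the subject alone, is what lets this stage separate candidates that share a subject but differ in attributes. The projection head gives the visual features a trainable mapping into the text-embedding space, which matches the abstract's stated goal of improved alignment.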

Related Papers