Fugu-MT Paper Translation (Abstract): GaTector+: A Unified Head-free Framework for Gaze Object and Gaze Following Prediction

Paper abstract: GaTector+: A Unified Head-free Framework for Gaze Object and Gaze Following Prediction

arxiv url: http://arxiv.org/abs/2510.25301v1
Date: Wed, 29 Oct 2025 09:14:07 GMT
Status: Translation complete
Last updated in system: 2025-10-30 15:50:45.294415
Title: GaTector+: A Unified Head-free Framework for Gaze Object and Gaze Following Prediction
Authors: Yang Jin, Guangyu Guo, Binglu Wang,
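Abstract summary: GaTector+ is a unified framework for gaze object detection and gaze following. It first embeds a head detection branch to predict the head of each person. Then, before regressing the gaze point, a head-based attention mechanism fuses the sense feature and the gaze feature with the help of the head location.
Reference score (independently computed attention metric): 25.92263916002385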
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gaze object detection and gaze following are fundamental tasks for interpreting human gaze behavior or intent. However, most previous methods solve these two tasks separately, and their predictions of gaze objects and gaze following typically depend on head-related prior knowledge during both training and real-world deployment. This dependency requires an auxiliary network to extract the head location, which precludes joint optimization of the entire system and limits practical applicability. To this end, we propose GaTector+, a unified framework for gaze object detection and gaze following that eliminates the dependence on head-related priors during inference. Specifically, GaTector+ uses an expanded specific-general-specific feature extractor: a shared backbone extracts general features for both gaze following and object detection, while specific blocks before and after the shared backbone account for the specificity of each sub-task. To obtain head-related knowledge without prior information, we first embed a head detection branch to predict the head of each person. Then, before regressing the gaze point, we propose a head-based attention mechanism that fuses the sense feature and the gaze feature with the help of the head location. Because suboptimal learning of the gaze point heatmap creates a performance bottleneck, we propose an attention supervision mechanism to accelerate the learning of the gaze heatmap. Finally, we propose a novel evaluation metric for gaze object detection, mean Similarity over Candidates (mSoC), which is more sensitive to variations between bounding boxes. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our model on both gaze object detection and gaze following.
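
The abstract refers to a gaze point heatmap without describing its form. As a rough illustration only, the sketch below assumes the common convention of a 2D Gaussian target centered on the annotated gaze point, decoded by taking the location of the maximum. The function names, the 64x64 grid, and the value of sigma are hypothetical choices for this example and are not taken from GaTector+.

```python
import math


def gaussian_heatmap(height, width, center_xy, sigma=3.0):
    """Build a height x width heatmap (list of rows) peaked at center_xy = (x, y).

    Assumption: a Gaussian target is a common choice for gaze heatmaps;
    the abstract does not state which form GaTector+ uses.
    """
    cx, cy = center_xy
    denom = 2.0 * sigma ** 2
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / denom) for x in range(width)]
        for y in range(height)
    ]


def heatmap_to_point(heatmap):
    """Decode the gaze point as the (x, y) position of the largest heatmap value."""
    best_value = -1.0
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy


# Hypothetical gaze annotation at pixel (x=40, y=22) on a 64x64 grid.
target = gaussian_heatmap(64, 64, center_xy=(40, 22))
decoded = heatmap_to_point(target)
```

Decoding the target it was built from returns the original point, which is a quick sanity check for any heatmap encoder and decoder pair.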

Related papers