Fugu-MT Paper Translation (Summary): RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

Paper summary: RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

arxiv url: http://arxiv.org/abs/2604.00507v1
Date: Wed, 01 Apr 2026 05:47:35 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.852167
Title: RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection
Title (reference translation): RegFormer: Transferable Relational Grounding for Efficient Human-Object Interaction Detection
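Authors: Jihwan Park, Chanhyeong Yang, Jinyoung Park, Taehoon Song, Hyunwoo J. Kim
Abstract summary: Weakly-supervised Human-Object Interaction (HOI) detection is essential for scene understanding. RegFormer is a versatile interaction recognition module for instance-level HOI reasoning. Experiments and analyses show that RegFormer effectively learns spatial cues for instance-level interaction reasoning.
Reference score (independently computed attention score): 38.362111975504696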
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weakly-supervised Human-Object Interaction (HOI) detection is essential for scalable scene understanding, as it learns interactions from only image-level annotations. Due to the lack of localization signals, prior works typically rely on an external object detector to generate candidate pairs and then infer their interactions through pairwise reasoning. However, this framework often struggles to scale due to the substantial computational cost incurred by enumerating numerous instance pairs. In addition, it suffers from false positives arising from non-interactive combinations, which hinder accurate instance-level HOI reasoning. To address these issues, we introduce Relational Grounding Transformer (RegFormer), a versatile interaction recognition module for efficient and accurate HOI reasoning. Under image-level supervision, RegFormer leverages spatially grounded signals as guidance for the reasoning process and promotes locality-aware interaction learning. By learning localized interaction cues, our module distinguishes humans, objects, and their interactions, enabling direct transfer from image-level interaction reasoning to precise and efficient instance-level reasoning without additional training. Our extensive experiments and analyses demonstrate that RegFormer effectively learns spatial cues for instance-level interaction reasoning, operates with high efficiency, and even achieves performance comparable to fully supervised models. Our code is available at https://github.com/mlvlab/RegFormer.
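Abstract (reference translation): Weakly-supervised Human-Object Interaction (HOI) detection is essential for scalable scene understanding, since it learns interactions from image-level annotations alone. Because localization signals are unavailable, prior work typically uses an external object detector to generate candidate pairs and then infers their interactions through pairwise reasoning. However, this framework often struggles to scale, as enumerating a large number of instance pairs incurs substantial computational cost. It also suffers from false positives caused by non-interactive combinations, which hinder accurate instance-level HOI reasoning. To address these problems, we introduce Relational Grounding Transformer (RegFormer), a versatile interaction recognition module for efficient and accurate HOI reasoning. Under image-level supervision, RegFormer uses spatially grounded signals as guidance for the reasoning process and promotes locality-aware interaction learning. By learning localized interaction cues, the module distinguishes humans, objects, and their interactions, which allows direct transfer from image-level interaction reasoning to precise and efficient instance-level reasoning without additional training. Extensive experiments and analyses show that RegFormer effectively learns spatial cues for instance-level interaction reasoning, operates with high efficiency, and achieves performance comparable to fully supervised models. The code is available at https://github.com/mlvlab/RegFormer.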
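
The abstract attributes the scaling problem of prior work to enumerating instance pairs produced by an external detector. The sketch below only illustrates that counting argument; it is not taken from the RegFormer code. The function names, the plain label-list input, and the choice to allow any other detection (including another person) as the object are assumptions made for illustration.

```python
def enumerate_pairs(detections):
    """Return every (human_index, object_index) candidate pair.

    `detections` is a list of class-label strings, standing in for the
    output of a hypothetical external object detector.
    """
    humans = [i for i, label in enumerate(detections) if label == "person"]
    # Assumption: any detection other than the human itself may be the object.
    return [(h, o) for h in humans for o in range(len(detections)) if o != h]


def pair_count(num_humans, num_others):
    """Number of candidate pairs for the enumeration rule used above."""
    total = num_humans + num_others
    return num_humans * (total - 1)


if __name__ == "__main__":
    labels = ["person", "person", "cup", "chair"]
    print(len(enumerate_pairs(labels)))  # 6 candidate pairs for 4 detections
    print(pair_count(10, 30))            # 390 candidate pairs for 40 detections
```

Under this rule the number of pairs grows with the product of the human count and the total detection count, so a pairwise interaction classifier is evaluated many times per image even when most pairs are non-interactive.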


Related papers