Fugu-MT Paper Translation (Summary): A Study of Failure Modes in Two-Stage Human-Object Interaction Detection

Paper Summary: A Study of Failure Modes in Two-Stage Human-Object Interaction Detection

arXiv URL: http://arxiv.org/abs/2604.13448v1
Date: Wed, 15 Apr 2026 04:01:23 GMT
Status: Translation complete
Last updated (in system): 2026-04-16 20:38:32.371252
Title: A Study of Failure Modes in Two-Stage Human-Object Interaction Detection
Title (reference translation): An Examination of Failure Modes in Two-Stage Human-Object Interaction Detection
Authors: Lemeng Wang, Qinqian Lei, Vidhi Bakshi, Daniel Yi, Yifan Liu, Jiacheng Hou, Asher Seng Hao, Zheda Mai, Wei-Lun Chao, Robby T. Tan, Bo Wang
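Abstract summary: This paper presents a study aimed at better understanding the failure modes of two-stage HOI models. It decomposes HOI detection into multiple interpretable perspectives in order to study different types of failure patterns.
Reference score (in-house attention score): 49.37675694881915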
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human-object interaction (HOI) detection aims to detect interactions between humans and objects in images. While recent advances have improved performance on existing benchmarks, their evaluations mainly focus on overall prediction accuracy and provide limited insight into the underlying causes of model failures. In particular, modern models often struggle in complex scenes involving multiple people and rare interaction combinations. In this work, we present a study to better understand the failure modes of two-stage HOI models, which form the basis of many current HOI detection approaches. Rather than constructing a large-scale benchmark, we instead decompose HOI detection into multiple interpretable perspectives and analyze model behavior across these dimensions to study different types of failure patterns. We curate a subset of images from an existing HOI dataset organized by human-object-interaction configurations (e.g., multi-person interactions and object sharing), and analyze model behavior under these configurations to examine different failure modes. This design allows us to analyze how these HOI models behave under different scene compositions and why their predictions fail. Importantly, high overall benchmark performance does not necessarily reflect robust visual reasoning about human-object relationships. We hope that this study can provide useful insights into the limitations of HOI models and offer observations for future research in this area.
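Abstract (reference translation): Human-object interaction (HOI) detection aims to detect interactions between humans and objects in images. Although recent advances have improved performance on existing benchmarks, their evaluations focus mainly on overall prediction accuracy and offer limited insight into the root causes of model failures. In particular, modern models often struggle in complex scenes involving multiple people and rare interaction combinations. In this work, we conduct a study to better understand the failure modes of two-stage HOI models, which underlie many current HOI detection methods. Instead of building a large-scale benchmark, we decompose HOI detection into multiple interpretable perspectives and analyze model behavior along these dimensions to study different types of failure patterns. We curate a subset of images from an existing HOI dataset, organized by human-object-interaction configuration (e.g., multi-person interactions and object sharing), and analyze model behavior under these configurations to examine different failure modes. This design lets us analyze how these HOI models behave under different scene compositions and why their predictions fail. Importantly, high benchmark performance does not necessarily reflect robust visual reasoning about human-object relationships. We hope this study provides useful insights into the limitations of HOI models and offers perspectives for future research.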
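The abstract describes grouping images by human-object-interaction configuration (e.g., multi-person interactions and object sharing) and comparing model behavior across those groups instead of reporting a single pooled score. The Python sketch below illustrates that idea under clearly hypothetical assumptions: the `HOI` triplet format, the tag names `single_person`, `multi_person`, and `object_sharing`, and the exact-match rule (a stand-in for the usual IoU-based box matching) are all invented for illustration. They are not the authors' curation or evaluation procedure, which the abstract does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HOI:
    """One human-object-interaction triplet (hypothetical format).

    Boxes are referenced by integer ids local to one image; a real
    pipeline would carry box coordinates and scores instead.
    """
    human_id: int
    object_id: int
    verb: str


def configuration_tags(triplets):
    """Tag one image's ground-truth triplets with illustrative configuration labels."""
    humans = {t.human_id for t in triplets}
    # Collect which humans interact with each object id.
    humans_per_object = defaultdict(set)
    for t in triplets:
        humans_per_object[t.object_id].add(t.human_id)

    tags = set()
    if len(humans) > 1:
        tags.add("multi_person")
    if any(len(hs) > 1 for hs in humans_per_object.values()):
        tags.add("object_sharing")
    if not tags:
        tags.add("single_person")
    return tags


def recall_by_configuration(dataset):
    """Triplet recall per configuration tag.

    dataset: list of (ground_truth, predictions) pairs, one per image.
    A ground-truth triplet counts as recalled only if an identical
    triplet was predicted (a stand-in for IoU-based matching).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gt, pred in dataset:
        if not gt:  # images without annotations carry no configuration
            continue
        pred_set = set(pred)
        matched = sum(1 for t in gt if t in pred_set)
        # An image may carry several tags, so the groups can overlap.
        for tag in configuration_tags(gt):
            hits[tag] += matched
            totals[tag] += len(gt)
    return {tag: hits[tag] / totals[tag] for tag in sorted(totals)}


# Toy data, invented for illustration only.
EXAMPLE_IMAGES = [
    # One person riding a bicycle; predicted correctly.
    ([HOI(0, 1, "ride")], [HOI(0, 1, "ride")]),
    # Two people sitting at the same table; the model misses one of them.
    ([HOI(0, 2, "sit_at"), HOI(1, 2, "sit_at")], [HOI(0, 2, "sit_at")]),
]

if __name__ == "__main__":
    print(recall_by_configuration(EXAMPLE_IMAGES))
    # {'multi_person': 0.5, 'object_sharing': 0.5, 'single_person': 1.0}
```

On the toy data, pooled recall is 2/3, while the per-configuration breakdown shows that the miss comes entirely from the shared-object scene. Separating the groups in this way is the kind of visibility the configuration-based analysis is meant to provide.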
