Egocentric Human-Object Interaction Detection: A New Benchmark and Method
- URL: http://arxiv.org/abs/2506.14189v1
- Date: Tue, 17 Jun 2025 05:03:42 GMT
- Title: Egocentric Human-Object Interaction Detection: A New Benchmark and Method
- Authors: Kunyuan Deng, Yi Wang, Lap-Pui Chau
- Abstract summary: Ego-HOIBench is a new dataset to promote the benchmarking and development of Ego-HOI detection. Our approach is lightweight and effective, and it can be easily applied to HOI baselines in a plug-and-play manner.
- Score: 14.765419467710812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the interaction between humans and objects has gained much attention in recent years. Existing human-object interaction (HOI) detection methods mainly focus on third-person perspectives, overlooking the more intuitive egocentric view of HOI, namely Ego-HOI. This paper introduces Ego-HOIBench, a new dataset to promote the benchmarking and development of Ego-HOI detection. Ego-HOIBench comprises more than 27K egocentric images with high-quality hand-verb-object triplet annotations across 123 fine-grained interaction categories and locations, covering a rich diversity of scenarios, object types, and hand configurations in daily activities. In addition, we explore and adapt third-person HOI detection methods to Ego-HOIBench and illustrate the challenges posed by hand-occluded objects and the complexity of single- and two-hand interactions. To build a new baseline, we propose a Hand Geometry and Interactivity Refinement (HGIR) scheme, which leverages hand pose and geometric information as valuable cues for interpreting interactions. Specifically, the HGIR scheme explicitly extracts global hand geometric features from the estimated hand pose proposals and refines the interaction-specific features using pose-interaction attention. This scheme enables the model to obtain a robust and powerful interaction representation, significantly improving Ego-HOI detection capability. Our approach is lightweight and effective, and it can be easily applied to HOI baselines in a plug-and-play manner to achieve state-of-the-art results on Ego-HOIBench. Our project is available at: https://dengkunyuan.github.io/EgoHOIBench/
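The pose-interaction attention described in the abstract can be pictured as a cross-attention step in which interaction-specific query features attend to tokens encoding the estimated hand poses. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name PoseInteractionRefiner, the joint count, the dimensions, and the residual fusion are all assumptions, since the abstract does not specify the HGIR architecture at this level of detail.

```python
# Minimal sketch (assumptions throughout): interaction queries attend to
# hand-geometry tokens derived from estimated hand pose proposals.
import torch
import torch.nn as nn

class PoseInteractionRefiner(nn.Module):
    def __init__(self, num_joints=21, dim=256, heads=8):
        super().__init__()
        # Encode each hand pose proposal (2D joints) into one geometric token.
        self.pose_encoder = nn.Sequential(
            nn.Linear(num_joints * 2, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # Cross-attention: interaction queries attend to hand-geometry tokens.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, interaction_feats, hand_poses):
        # interaction_feats: (B, Q, dim) interaction-specific query features
        # hand_poses:        (B, H, num_joints, 2) estimated hand pose proposals
        geo = self.pose_encoder(hand_poses.flatten(2))       # (B, H, dim)
        refined, _ = self.attn(interaction_feats, geo, geo)  # pose-informed
        return self.norm(interaction_feats + refined)        # residual refine

# Usage: refine 16 interaction queries with 2 hand proposals per image.
refiner = PoseInteractionRefiner()
feats = torch.randn(4, 16, 256)
poses = torch.randn(4, 2, 21, 2)
out = refiner(feats, poses)  # (4, 16, 256)
```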
Related papers
- ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction [16.338872733140832]
This paper presents a novel task named Egocentric Interaction Reasoning and pixel Grounding (Ego-IRG).
Taking an egocentric image with the query as input, Ego-IRG is the first task that aims to resolve the interactions through three crucial steps: analyzing, answering, and pixel grounding.
The Ego-IRGBench dataset includes over 20k egocentric images with 1.6 million queries and corresponding multimodal responses about interactions.
arXiv Detail & Related papers (2025-04-02T08:24:35Z)
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in terms of objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Grounding 3D Scene Affordance From Egocentric Interactions [52.5827242925951]
Grounding 3D scene affordance aims to locate interactive regions in 3D environments.
We introduce a novel task: grounding 3D scene affordance from egocentric interactions.
arXiv Detail & Related papers (2024-09-29T10:46:19Z)
- CaRe-Ego: Contact-aware Relationship Modeling for Egocentric Interactive Hand-object Segmentation [14.765419467710812]
Egocentric interactive hand-object segmentation (EgoIHOS) is crucial for understanding human behavior in assistive systems.
Previous methods recognize hands and interacting objects as distinct semantic categories based solely on visual features.
We propose CaRe-Ego, which emphasizes the contact between hands and objects from two aspects.
arXiv Detail & Related papers (2024-07-08T03:17:10Z)
- EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215]
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
We present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance.
arXiv Detail & Related papers (2024-05-22T14:03:48Z)
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects [89.95728475983263]
Holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition, and motion generation.
We design the HANDS23 challenge based on the AssemblyHands and ARCTIC datasets with carefully designed training and testing splits.
Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis on 3D hand(-object) reconstruction tasks.
arXiv Detail & Related papers (2024-03-25T05:12:21Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding [99.904140768186]
This paper proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA).
We contribute comprehensive pre-training sets, balanced test sets, and a new baseline, complete with a training-finetuning strategy.
We believe our data and the findings will pave a new way for Ego-HOI understanding.
arXiv Detail & Related papers (2023-09-05T17:51:16Z)
- Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos [19.64072251418535]
We argue for combining the benefits of both visual and geometric features in HOI recognition.
We propose a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN); a minimal sketch of the underlying visual-geometric fusion idea follows this entry.
To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72).
arXiv Detail & Related papers (2022-07-19T17:36:55Z)
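As a rough illustration of the visual-geometric fusion idea referenced above, here is a minimal, hypothetical PyTorch sketch. The learnable adjacency, node count, dimensions, and concatenation-based fusion are assumptions for illustration only and do not reproduce the actual 2G-GCN architecture.

```python
# Minimal sketch (assumptions throughout): fuse geometric (keypoint) features
# processed by one graph-convolution step with a global visual feature.
import torch
import torch.nn as nn

class GeometricVisualFusion(nn.Module):
    def __init__(self, num_nodes=17, geo_dim=2, vis_dim=512, hid=128):
        super().__init__()
        # Learnable adjacency over skeleton keypoints (fully learned here).
        self.adj = nn.Parameter(torch.eye(num_nodes))
        self.gcn = nn.Linear(geo_dim, hid)  # per-node feature transform
        self.fuse = nn.Linear(num_nodes * hid + vis_dim, hid)

    def forward(self, keypoints, vis_feat):
        # keypoints: (B, N, 2) 2D joints; vis_feat: (B, vis_dim) visual feature
        A = torch.softmax(self.adj, dim=-1)        # normalized adjacency
        geo = torch.relu(A @ self.gcn(keypoints))  # one graph-conv step: A X W
        geo = geo.flatten(1)                       # (B, N * hid)
        return torch.relu(self.fuse(torch.cat([geo, vis_feat], dim=-1)))

# Usage: fuse 17 body keypoints with a 512-d visual feature.
model = GeometricVisualFusion()
kps = torch.randn(8, 17, 2)
vis = torch.randn(8, 512)
out = model(kps, vis)  # (8, 128)
```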