InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild
- URL: http://arxiv.org/abs/2308.03061v2
- Date: Mon, 14 Aug 2023 12:44:57 GMT
- Title: InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild
- Authors: Yanyan Shao and Qi Ye and Wenhan Luo and Kaihao Zhang and Jiming Chen
- Abstract summary: Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding human interaction with objects is an important research topic
for embodied Artificial Intelligence and identifying the objects that humans
are interacting with is a primary problem for interaction understanding.
Existing methods rely on frame-based detectors to locate interacting objects.
However, this approach is susceptible to heavy occlusion, background clutter,
and distracting objects. To address these limitations, in this paper, we propose
to leverage spatio-temporal information of hand-object interaction to track
interactive objects under these challenging cases. Unlike standard object-tracking
problems, we assume no prior knowledge of the objects to be tracked; instead, we
first utilize the spatial relation between hands and objects to adaptively discover
the interacting objects from the scene. Second, the consistency and continuity
of the appearance of objects between successive frames are exploited to track
the objects. With this tracking formulation, our method also benefits from
training on large-scale general object-tracking datasets. We further curate a
video-level hand-object interaction dataset for testing and evaluation from
100DOH. The quantitative results demonstrate that our proposed method
outperforms the state-of-the-art methods. Specifically, in scenes with
continuous interaction with different objects, we achieve an improvement of
about 10% in the Average Precision (AP) metric.
Our qualitative findings also illustrate that our method can produce more
continuous trajectories for interacting objects.
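As a reading aid, the sketch below illustrates how an IoU-based Average Precision score of the kind cited above is typically computed; the greedy matching scheme and the 0.5 IoU threshold are common conventions assumed here for illustration, not the paper's exact evaluation protocol.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(predictions, ground_truths, iou_thr=0.5):
    """predictions: list of (score, box); ground_truths: list of boxes.
    Greedily matches detections to ground truth in descending score order,
    allowing at most one match per ground-truth box."""
    preds = sorted(predictions, key=lambda p: p[0], reverse=True)
    matched = [False] * len(ground_truths)
    tp, fp = 0, 0
    precisions, recalls = [], []
    for _score, box in preds:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            if not matched[j]:
                o = iou(box, gt)
                if o > best_iou:
                    best_iou, best_j = o, j
        if best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(ground_truths))
    # Step approximation of the area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```

Two perfectly localized detections over two ground-truth boxes yield an AP of 1.0; replacing one detection with a box that misses its target halves the score.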
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686]
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
arXiv Detail & Related papers (2024-04-18T05:06:12Z)
- Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings [6.371828910727037]
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks.
We address the problem of affordance categorization for class-agnostic objects with an open set of interactions.
A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs.
arXiv Detail & Related papers (2023-03-30T15:04:04Z)
- Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
- Discovering a Variety of Objects in Spatio-Temporal Human-Object Interactions [45.92485321148352]
In daily HOIs, humans often interact with a variety of objects, e.g., holding and touching dozens of household items in cleaning.
Here, we introduce a new benchmark based on AVA: Discovering Interacted Objects (DIO), including 51 interactions and 1,000+ objects.
An ST-HOI learning task is proposed expecting vision systems to track human actors, detect interactions and simultaneously discover objects.
arXiv Detail & Related papers (2022-11-14T16:33:54Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.