Fine-grained Affordance Annotation for Egocentric Hand-Object
Interaction Videos
- URL: http://arxiv.org/abs/2302.03292v1
- Date: Tue, 7 Feb 2023 07:05:00 GMT
- Title: Fine-grained Affordance Annotation for Egocentric Hand-Object
Interaction Videos
- Authors: Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu,
Yoichi Sato
- Abstract summary: Object affordance provides information on action possibilities based on human motor capacity and objects' physical properties.
This paper proposes an efficient annotation scheme to address these issues by combining goal-irrelevant motor actions and grasp types as affordance labels.
We provide new annotations by applying this scheme to the EPIC-KITCHENS dataset and test our annotation with tasks such as affordance recognition, hand-object interaction hotspots prediction, and cross-domain evaluation of affordance.
- Score: 27.90643693526274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object affordance is an important concept in hand-object interaction,
providing information on action possibilities based on human motor capacity and
objects' physical properties, thus benefiting tasks such as action anticipation
and robot imitation learning. However, the definitions of affordance in existing
datasets often 1) mix up affordance with object functionality, 2) confuse
affordance with goal-related actions, and 3) ignore human motor capacity. This
paper proposes an efficient annotation scheme to address these issues by
combining goal-irrelevant motor actions and grasp types as affordance labels
and introducing the concept of mechanical action to represent the action
possibilities between two objects. We provide new annotations by applying this
scheme to the EPIC-KITCHENS dataset and test our annotation with tasks such as
affordance recognition, hand-object interaction hotspots prediction, and
cross-domain evaluation of affordance. The results show that models trained
with our annotation can distinguish affordance from other concepts, predict
fine-grained interaction possibilities on objects, and generalize through
different domains.
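To make the annotation scheme concrete: each label combines a goal-irrelevant motor action with a grasp type, while mechanical actions (object-to-object action possibilities) are recorded separately. The sketch below shows one hypothetical way such an annotation record could be structured in Python; the class name, field names, and example values are illustrative assumptions, not the authors' released annotation format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AffordanceAnnotation:
    """One hypothetical annotation record (illustrative only)."""
    video_id: str                            # e.g. an EPIC-KITCHENS clip identifier
    frame_range: Tuple[int, int]             # start and end frame of the interaction
    object_category: str                     # the object being interacted with
    motor_action: str                        # goal-irrelevant motor action, e.g. "pull"
    grasp_type: str                          # hand grasp type, e.g. "power-cylindrical"
    mechanical_action: Optional[str] = None  # object-to-object action possibility, e.g. "cut"

    @property
    def affordance_label(self) -> str:
        """Affordance label = motor action combined with grasp type."""
        return f"{self.motor_action}+{self.grasp_type}"


# Example usage with made-up values:
ann = AffordanceAnnotation(
    video_id="P01_01",
    frame_range=(1200, 1310),
    object_category="knife",
    motor_action="hold",
    grasp_type="power-cylindrical",
    mechanical_action="cut",
)
print(ann.affordance_label)  # -> "hold+power-cylindrical"
```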
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models in terms of both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Cross-Embodied Affordance Transfer through Learning Affordance Equivalences [6.828097734917722]
We propose a deep neural network model that unifies objects, actions, and effects into a single latent vector in a common latent space that we call the affordance space.
Our model does not learn the behavior of individual objects acted upon by a single agent.
Affordance Equivalence facilitates not only action generalization over objects but also Cross-Embodiment transfer, linking actions of different robots.
arXiv Detail & Related papers (2024-04-24T05:07:36Z)
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686]
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
arXiv Detail & Related papers (2024-04-18T05:06:12Z)
- Controllable Human-Object Interaction Synthesis [77.56877961681462]
We propose Controllable Human-Object Interaction Synthesis (CHOIS) to generate synchronized object motion and human motion in 3D scenes.
Here, language descriptions inform style and intent, and waypoints, which can be effectively extracted from high-level planning, ground the motion in the scene.
Our module seamlessly integrates with a path planning module, enabling the generation of long-term interactions in 3D environments.
arXiv Detail & Related papers (2023-12-06T21:14:20Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos [31.620555223890626]
We study the problem of Short-Term Object interaction anticipation (STA).
We propose NAOGAT, a multi-modal end-to-end transformer network, to guide the model to predict context-aware future actions.
Our model outperforms existing methods on two separate datasets.
arXiv Detail & Related papers (2023-08-16T12:07:02Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network consuming dynamic descriptors achieves state-of-the-art prediction results, and that these descriptors help the network generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Precise Affordance Annotation for Egocentric Action Video Datasets [27.90643693526274]
Object affordance is an important concept in human-object interaction.
Existing datasets often mix up affordance with object functionality.
We introduce the concept of mechanical action to represent the action possibilities between two objects.
arXiv Detail & Related papers (2022-06-11T05:13:19Z)
- A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
By using a soft-attention mechanism, our model eliminates the need for object labels and bounding boxes.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.