Learning Affordance Grounding from Exocentric Images
- URL: http://arxiv.org/abs/2203.09905v1
- Date: Fri, 18 Mar 2022 12:29:06 GMT
- Title: Learning Affordance Grounding from Exocentric Images
- Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao
- Abstract summary: Affordance grounding is the task of grounding (i.e., localizing) action-possibility regions in objects.
Humans have the ability to transform various exocentric interactions into invariant egocentric affordances.
This paper proposes the task of affordance grounding from the exocentric view: given exocentric human-object interaction images and egocentric object images, learn the object's affordance knowledge and transfer it to the egocentric image.
- Score: 79.64064711636975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Affordance grounding, the task of grounding (i.e., localizing)
action-possibility regions in objects, faces the challenge of establishing an
explicit link with object parts due to the diversity of interactive
affordances. Humans have the ability to transform various exocentric
interactions into invariant egocentric affordances, countering the impact of
this interactive diversity. To empower an agent with such an ability, this
paper proposes the task of affordance grounding from the exocentric view:
given exocentric human-object interaction images and egocentric object images,
learn the affordance knowledge of the object and transfer it to the egocentric
image using only the affordance label as supervision. To this end, we devise a
cross-view knowledge transfer framework that extracts affordance-specific
features from exocentric interactions and enhances the perception of
affordance regions by preserving affordance correlations. Specifically, an
Affordance Invariance Mining module is devised to extract affordance-specific
clues by minimizing the intra-class differences that originate from
interaction habits in exocentric images. In addition, an Affordance
Co-relation Preserving strategy is presented to perceive and localize
affordances by aligning the co-relation matrices of the predicted results
between the two views. In particular, an affordance grounding dataset named
AGD20K is constructed by collecting and labeling over 20K images from 36
affordance categories. Experimental results demonstrate that our method
outperforms representative models in terms of objective metrics and visual
quality. Code: github.com/lhc1224/Cross-View-AG.
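The abstract names two concrete mechanisms: an Affordance Invariance Mining (AIM) module that minimizes intra-class feature differences across exocentric interactions, and an Affordance Co-relation Preserving (ACP) strategy that aligns the co-relation matrices of the two views' predictions. The PyTorch sketch below illustrates one plausible form of these two losses; the function names, tensor shapes, prototype-based variance term, and MSE alignment are assumptions made for illustration, not the authors' implementation (see github.com/lhc1224/Cross-View-AG for the actual code).

import torch
import torch.nn.functional as F


def invariance_mining_loss(exo_feats: torch.Tensor) -> torch.Tensor:
    """Assumed form of Affordance Invariance Mining: pull the features of N
    exocentric interaction images sharing one affordance label toward their
    mean, suppressing intra-class variation caused by interaction habits.

    exo_feats: (N, D) features of N exocentric images of one affordance class.
    """
    prototype = exo_feats.mean(dim=0, keepdim=True)          # (1, D) class prototype
    return ((exo_feats - prototype) ** 2).sum(dim=1).mean()  # mean squared deviation


def corelation_preserving_loss(exo_logits: torch.Tensor,
                               ego_logits: torch.Tensor) -> torch.Tensor:
    """Assumed form of Affordance Co-relation Preserving: align the class
    co-relation matrices of the predictions from the two views.

    exo_logits, ego_logits: (B, C) affordance predictions, B images, C classes.
    """
    def corelation(logits: torch.Tensor) -> torch.Tensor:
        z = F.normalize(logits, dim=0)  # L2-normalize each class column
        return z.t() @ z                # (C, C) class-by-class co-relation
    return F.mse_loss(corelation(exo_logits), corelation(ego_logits))


# Toy usage with random tensors (3 exocentric views with 512-d features;
# a batch of 8 images over 36 affordance classes, as in AGD20K):
loss_aim = invariance_mining_loss(torch.randn(3, 512))
loss_acp = corelation_preserving_loss(torch.randn(8, 36), torch.randn(8, 36))
total = loss_aim + loss_acp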
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding [10.787807888885888]
We propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA).
Unlike prior arts, INTRA recasts this problem as representation learning to identify unique features of interactions through contrastive learning with exocentric images only.
Our method outperformed prior arts on diverse datasets such as AGD20K, IIT-AFF, CAD, and UMD.
arXiv Detail & Related papers (2024-09-10T04:31:51Z)
- HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping [29.678756772610797]
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision.
Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features.
To address these problems, we introduce the Hierarchical mErging framework via contrAstive grouPing (HEAP).
arXiv Detail & Related papers (2023-12-29T06:46:37Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate objects' "action possibility" regions in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region features of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- Grounded Affordance from Exocentric View [79.64064711636975]
Affordance grounding aims to locate objects' "action possibility" regions.
Due to the diversity of interactive affordances, the uniqueness of different individuals leads to diverse interactions.
Humans have the ability to transform various exocentric interactions into invariant egocentric affordances.
arXiv Detail & Related papers (2022-08-28T10:32:47Z)
- Kinship Verification Based on Cross-Generation Feature Interaction Learning [53.62256887837659]
Kinship verification from facial images has been recognized as an emerging yet challenging technique in computer vision applications.
We propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.
arXiv Detail & Related papers (2021-09-07T01:50:50Z)