INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
- URL: http://arxiv.org/abs/2409.06210v1
- Date: Tue, 10 Sep 2024 04:31:51 GMT
- Title: INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
- Authors: Ji Ha Jang, Hoigi Seo, Se Young Chun
- Abstract summary: We propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA).
Unlike prior arts, INTRA recasts this problem as representation learning to identify unique features of interactions through contrastive learning with exocentric images only.
Our method outperformed prior arts on diverse datasets such as AGD20K, IIT-AFF, CAD and UMD.
- Score: 10.787807888885888
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Affordance denotes the potential interactions inherent in objects. The perception of affordance can enable intelligent agents to navigate and interact with new environments efficiently. Weakly supervised affordance grounding teaches agents the concept of affordance without costly pixel-level annotations, using exocentric images instead. Although recent advances in weakly supervised affordance grounding have yielded promising results, challenges remain, including the requirement for paired exocentric and egocentric image datasets and the complexity of grounding diverse affordances for a single object. To address them, we propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA). Unlike prior arts, INTRA recasts this problem as representation learning to identify unique features of interactions through contrastive learning with exocentric images only, eliminating the need for paired datasets. Moreover, we leverage vision-language model embeddings to perform affordance grounding flexibly with any text, designing text-conditioned affordance map generation that reflects interaction relationships for contrastive learning, and enhancing robustness with our text synonym augmentation. Our method outperformed prior arts on diverse datasets such as AGD20K, IIT-AFF, CAD and UMD. Additionally, experimental results demonstrate that our method has remarkable domain scalability for synthesized images/illustrations and is capable of performing affordance grounding for novel interactions and objects.
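The abstract names three concrete mechanisms: text-conditioned affordance maps built from vision-language embeddings, contrastive learning over exocentric images only, and text synonym augmentation. The sketch below shows one minimal way these pieces could fit together, assuming a CLIP-like backbone; every name here (`affordance_map`, `encode_text`, the synonym table, the exact loss form) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
# Toy sketch of the three ingredients named in the abstract; NOT the
# authors' code. A CLIP-like backbone is assumed, and all shapes and
# names are hypothetical.
import torch
import torch.nn.functional as F

def affordance_map(patch_feats, text_emb):
    """Text-conditioned affordance map: cosine similarity between each
    image patch embedding and an affordance-text embedding.
    patch_feats: (B, N, D) patch embeddings from a vision encoder
    text_emb:    (B, D)    embedding of an affordance phrase, e.g. "hold"
    returns:     (B, N)    per-patch affordance scores in [-1, 1]
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return torch.einsum("bnd,bd->bn", patch_feats, text_emb)

def interaction_contrastive_loss(feats, interaction_ids, temperature=0.07):
    """InfoNCE-style loss over exocentric images only: images depicting
    the same interaction are pulled together, different interactions are
    pushed apart. No ego/exo image pairing is needed.
    feats:           (B, D) pooled features of exocentric images
    interaction_ids: (B,)   integer label of the depicted interaction
    """
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.T / temperature                      # (B, B)
    self_mask = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float("-inf"))          # drop self-pairs
    pos = interaction_ids.unsqueeze(0) == interaction_ids.unsqueeze(1)
    pos = pos & ~self_mask                                   # positive pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood of positives, skipping rows with no positive
    pos_log_prob = log_prob.masked_fill(~pos, 0.0).sum(1)
    has_pos = pos.any(dim=1)
    return -(pos_log_prob[has_pos] / pos.sum(1)[has_pos]).mean()

# Text synonym augmentation: embed several phrasings of the same
# affordance and average them, so grounding is robust to wording.
SYNONYMS = {"hold": ["hold", "grasp", "grip"]}  # hypothetical table

def augmented_text_embedding(encode_text, affordance):
    embs = torch.stack([encode_text(w) for w in SYNONYMS[affordance]])
    return F.normalize(embs.mean(dim=0), dim=-1)
```

In this reading, no egocentric image ever enters the loss: supervision comes entirely from which exocentric images depict the same interaction, which is what removes the paired-dataset requirement the abstract mentions.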
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings [6.371828910727037]
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks.
We address the problem of affordance categorization for class-agnostic objects with an open set of interactions.
A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs (a toy sketch of such a representation appears after this list).
arXiv Detail & Related papers (2023-03-30T15:04:04Z)
- Grounded Affordance from Exocentric View [79.64064711636975]
Affordance grounding aims to locate the "action possibility" regions of objects.
Due to the diversity of interactive affordances, different individuals interact with the same object in diverse ways.
Humans have the ability to transform these various exocentric interactions into invariant egocentric affordances.
arXiv Detail & Related papers (2022-08-28T10:32:47Z)
- Learning Affordance Grounding from Exocentric Images [79.64064711636975]
Affordance grounding is the task of grounding (i.e., localizing) action possibility regions in objects.
Humans have the ability to transform various exocentric interactions into invariant egocentric affordances.
This paper proposes the task of affordance grounding from the exocentric view, i.e., localizing affordance regions in an egocentric object image given exocentric human-object interaction images.
arXiv Detail & Related papers (2022-03-18T12:29:06Z)
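For the Activity Graphs mentioned in the "Object-agnostic Affordance Categorization" entry above, a depth-informed qualitative spatial representation can be illustrated with a toy example. The relations, thresholds, and entity names below are assumptions made for illustration, not the paper's actual scheme.

```python
# Toy Activity Graph sketch: nodes are tracked entities, edges carry
# qualitative spatial relations discretized from depth-informed metric
# distances. An illustrative assumption, NOT the paper's implementation.
import networkx as nx

def qualitative_relation(dist_m):
    """Discretize a metric distance into a qualitative spatial relation;
    the thresholds here are arbitrary."""
    if dist_m < 0.05:
        return "touching"
    if dist_m < 0.5:
        return "near"
    return "far"

def build_activity_graph(entities, distances):
    """entities:  list of tracked entity names, e.g. ["hand", "cup"]
    distances: dict mapping (a, b) pairs to metric distances in meters."""
    g = nx.Graph()
    g.add_nodes_from(entities)
    for (a, b), d in distances.items():
        g.add_edge(a, b, relation=qualitative_relation(d))
    return g

# Example frame: a hand touching a cup that rests on a table.
g = build_activity_graph(
    ["hand", "cup", "table"],
    {("hand", "cup"): 0.02, ("cup", "table"): 0.0, ("hand", "table"): 0.4},
)
print(g.edges(data=True))
```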