Closed-Loop Transfer for Weakly-supervised Affordance Grounding
- URL: http://arxiv.org/abs/2510.17384v1
- Date: Mon, 20 Oct 2025 10:21:35 GMT
- Title: Closed-Loop Transfer for Weakly-supervised Affordance Grounding
- Authors: Jiajin Tang, Zhengxuan Wei, Ge Zheng, Sibei Yang
- Abstract summary: LoopTrans is a novel closed-loop framework that transfers knowledge from exocentric to egocentric views. Within LoopTrans, several innovative mechanisms are introduced, including unified cross-modal localization and denoising knowledge distillation. Experiments show that LoopTrans achieves consistent improvements across all metrics on image and video benchmarks.
- Score: 35.34120640245943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans can perform previously unexperienced interactions with novel objects simply by observing others engage with them. Weakly-supervised affordance grounding mimics this process by learning to locate object regions that enable actions on egocentric images, using exocentric interaction images with image-level annotations. However, extracting affordance knowledge solely from exocentric images and transferring it one-way to egocentric images limits the applicability of previous works in complex interaction scenarios. Instead, this study introduces LoopTrans, a novel closed-loop framework that not only transfers knowledge from exocentric to egocentric but also transfers back to enhance exocentric knowledge extraction. Within LoopTrans, several innovative mechanisms are introduced, including unified cross-modal localization and denoising knowledge distillation, to bridge domain gaps between object-centered egocentric and interaction-centered exocentric images while enhancing knowledge transfer. Experiments show that LoopTrans achieves consistent improvements across all metrics on image and video benchmarks, even handling challenging scenarios where object interaction regions are fully occluded by the human body.
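The abstract describes the closed-loop mechanism only at a high level. Below is a minimal, hypothetical PyTorch-style sketch of what one such training step could look like: exocentric-to-egocentric distillation of affordance maps, followed by a confidence-masked feedback term standing in for "denoising knowledge distillation". The branch interfaces, the masking heuristic, and the loss weights are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a closed-loop exo <-> ego transfer step.
# Module interfaces, the confidence-based "denoising" mask, and loss
# weights are assumptions, not taken from the LoopTrans code.
import torch
import torch.nn.functional as F

def closed_loop_step(exo_branch, ego_branch, exo_imgs, ego_imgs, action_labels,
                     optimizer, conf_thresh=0.6, alpha=0.5):
    """One training step: exo -> ego distillation, then ego -> exo feedback."""
    # 1) Weak supervision: both branches see only image-level action labels.
    exo_logits, exo_maps = exo_branch(exo_imgs)   # maps: (B, A, H, W) affordance heatmaps
    ego_logits, ego_maps = ego_branch(ego_imgs)   # assumed to share spatial size with exo_maps
    cls_loss = F.cross_entropy(exo_logits, action_labels) + \
               F.cross_entropy(ego_logits, action_labels)

    # 2) Forward transfer (exo -> ego): distill the exocentric affordance map
    #    of the labeled action into the egocentric branch.
    idx = action_labels.view(-1, 1, 1, 1).expand(-1, 1, *exo_maps.shape[-2:])
    exo_target = torch.gather(exo_maps, 1, idx).detach()
    ego_pred = torch.gather(ego_maps, 1, idx)
    forward_kd = F.mse_loss(ego_pred, exo_target)

    # 3) Backward transfer (ego -> exo): keep only confident egocentric
    #    activations as a "denoised" target and feed them back to refine
    #    exocentric knowledge extraction, closing the loop.
    keep = (ego_pred.detach() > conf_thresh).float()
    backward_kd = (keep * (torch.gather(exo_maps, 1, idx) - ego_pred.detach()) ** 2).mean()

    loss = cls_loss + alpha * (forward_kd + backward_kd)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```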
Related papers
- Cognition Transferring and Decoupling for Text-supervised Egocentric Semantic Segmentation [17.35953923039954]
The Text-supervised Egocentric Semantic Segmentation (TESS) task aims to assign pixel-level categories to egocentric images, weakly supervised by texts from image-level labels. We propose a Cognition Transferring and Decoupling Network (CTDN) that first learns the egocentric wearer-object relations by correlating the image and text.
arXiv Detail & Related papers (2024-10-02T08:58:34Z) - INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding [10.787807888885888]
We propose INTeraction Relationship-aware weakly supervised Affordance grounding (INTRA)
Unlike prior arts, INTRA recasts this problem as representation learning to identify unique features of interactions through contrastive learning with exocentric images only.
Our method outperformed prior arts on diverse datasets such as AGD20K, IIT-AFF, CAD and UMD.
arXiv Detail & Related papers (2024-09-10T04:31:51Z) - EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views [51.53089073920215]
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception.
Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view.
We present EgoChoir, which links object structures with interaction contexts inherent in appearance and head motion to reveal object affordance.
arXiv Detail & Related papers (2024-05-22T14:03:48Z) - Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos [66.46812056962567]
Exocentric-to-egocentric cross-view translation aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective.
We propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation and pixel-level hallucination.
arXiv Detail & Related papers (2024-03-11T01:00:00Z) - Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective [13.776455033015216]
We introduce a novel cross-view learning approach to action recognition.
First, we present a novel geometric-based constraint into the self-attention mechanism in Transformer.
Then, we propose a new cross-view self-attention loss learned on unpaired cross-view data to enforce the self-attention mechanism learning to transfer knowledge across views.
arXiv Detail & Related papers (2023-05-25T04:14:49Z) - Grounded Affordance from Exocentric View [79.64064711636975]
Affordance grounding aims to locate objects' "action possibilities" regions.
Due to the diversity of interactive affordances, the uniqueness of different individuals leads to diverse interactions.
Humans have the ability to transform various exocentric interactions into invariant egocentric affordances.
arXiv Detail & Related papers (2022-08-28T10:32:47Z) - Learning Affordance Grounding from Exocentric Images [79.64064711636975]
Affordance grounding is the task of grounding (i.e., localizing) action possibility regions in objects.
Humans have the ability to transform various exocentric interactions into invariant egocentric affordances.
This paper proposes the task of affordance grounding from the exocentric view, i.e., learning egocentric affordance regions given exocentric human-object interaction images and egocentric object images (a minimal sketch of this one-way transfer setup appears after this list).
arXiv Detail & Related papers (2022-03-18T12:29:06Z) - Image-to-image Mapping with Many Domains by Sparse Attribute Transfer [71.28847881318013]
Unsupervised image-to-image translation consists of learning a pair of mappings between two domains without known pairwise correspondences between points.
The current convention is to approach this task with cycle-consistent GANs.
We propose an alternate approach that directly restricts the generator to performing a simple sparse transformation in a latent layer.
arXiv Detail & Related papers (2020-06-23T19:52:23Z)
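For context, most prior weakly-supervised pipelines listed above (e.g., "Learning Affordance Grounding from Exocentric Images") transfer affordance knowledge one way, from exocentric interaction images to egocentric object images. The following is a minimal sketch of that baseline setup, assuming a simple CAM-style affordance map and an MSE alignment loss; the function names and normalization are illustrative, not taken from any listed paper.

```python
# Sketch of the standard one-way exo -> ego transfer baseline that the
# closed-loop framework above extends. The CAM computation and transfer
# loss are generic illustrations, not code from any listed paper.
import torch
import torch.nn.functional as F

def affordance_cam(features, classifier_weight, action_id):
    """Class-activation map for one action: weight the feature maps by that
    action's classifier weights and sum over channels."""
    # features: (B, C, H, W); classifier_weight: (num_actions, C)
    w = classifier_weight[action_id].view(1, -1, 1, 1)           # (1, C, 1, 1)
    cam = F.relu((features * w).sum(dim=1, keepdim=True))        # (B, 1, H, W)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-6)      # normalize to [0, 1]
    return cam

def one_way_transfer_loss(exo_features, ego_features, classifier_weight, action_id):
    """Align egocentric affordance maps with (detached) exocentric ones."""
    exo_cam = affordance_cam(exo_features, classifier_weight, action_id).detach()
    ego_cam = affordance_cam(ego_features, classifier_weight, action_id)
    return F.mse_loss(ego_cam, exo_cam)
```

The closed-loop framework of the main paper differs from this baseline in that it also feeds refined egocentric predictions back to improve exocentric knowledge extraction, rather than stopping at the one-way alignment shown here.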