Object-agnostic Affordance Categorization via Unsupervised Learning of
Graph Embeddings
- URL: http://arxiv.org/abs/2304.05989v1
- Date: Thu, 30 Mar 2023 15:04:04 GMT
- Title: Object-agnostic Affordance Categorization via Unsupervised Learning of
Graph Embeddings
- Authors: Alexia Toumpa and Anthony G. Cohn
- Abstract summary: Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks.
We address the problem of affordance categorization for class-agnostic objects with an open set of interactions.
A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs.
- Score: 6.371828910727037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acquiring knowledge about object interactions and affordances can facilitate
scene understanding and human-robot collaboration tasks. As humans tend to use
objects in many different ways depending on the scene and the objects'
availability, learning object affordances in everyday-life scenarios is a
challenging task, particularly in the presence of an open set of interactions
and objects. We address the problem of affordance categorization for
class-agnostic objects with an open set of interactions; we achieve this by
learning similarities between object interactions in an unsupervised way and
thus inducing clusters of object affordances. A novel depth-informed
qualitative spatial representation is proposed for the construction of Activity
Graphs (AGs), which abstract from the continuous representation of
spatio-temporal interactions in RGB-D videos. These AGs are clustered to obtain
groups of objects with similar affordances. Our experiments in a real-world
scenario demonstrate that our method learns to create object affordance
clusters with a high V-measure even in cluttered scenes. The proposed approach
handles object occlusions by effectively capturing possible interactions, without
imposing any object or scene constraints.
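
The abstract describes a pipeline in which Activity Graphs built from depth-informed qualitative spatial relations are embedded, clustered without labels, and the induced clusters are scored against ground-truth affordances with the V-measure. The sketch below only illustrates that flow under toy assumptions: the relation vocabulary, the histogram-style `embed_activity_graph` helper, and the synthetic graphs are stand-ins for the paper's depth-informed representation and learned graph embeddings, not the authors' implementation.

```python
# Minimal sketch: cluster embeddings of toy Activity Graphs and score with V-measure.
# All names and data below are illustrative assumptions, not the authors' code.
import networkx as nx
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import v_measure_score

def embed_activity_graph(ag: nx.Graph, relations: list[str]) -> np.ndarray:
    """Embed an Activity Graph as a normalized histogram of its qualitative
    spatial relations (a stand-in for a learned graph embedding)."""
    counts = {r: 0 for r in relations}
    for _, _, data in ag.edges(data=True):
        counts[data["relation"]] += 1
    total = max(sum(counts.values()), 1)
    return np.array([counts[r] / total for r in relations])

# Toy Activity Graphs: nodes are tracked entities, edges carry a qualitative
# spatial relation observed during an interaction (assumed vocabulary).
relations = ["touching", "above", "inside"]
ags, true_affordances = [], []
rng = np.random.default_rng(0)
for label, dominant in [("containable", "inside"), ("supportable", "above")]:
    for _ in range(20):
        g = nx.Graph()
        g.add_edge("hand", "object", relation=dominant)
        g.add_edge("object", "surface", relation=rng.choice(relations))
        ags.append(g)
        true_affordances.append(label)

X = np.stack([embed_activity_graph(g, relations) for g in ags])

# Unsupervised clustering of the embeddings induces affordance categories.
pred = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# Compare induced clusters against ground-truth affordance labels.
print("V-measure:", v_measure_score(true_affordances, pred))
```

The V-measure used for evaluation is the harmonic mean of cluster homogeneity and completeness, so it rewards clusterings that are both pure and complete with respect to the ground-truth affordance labels.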
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange [50.45953583802282]
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding.
Our approach leverages both object patterns and contextual cues to produce robust features.
Our experiments demonstrate the superiority of our method over existing SSL techniques.
arXiv Detail & Related papers (2024-04-11T06:39:53Z)
- Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions [9.400505355134728]
We propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints.
We introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations.
arXiv Detail & Related papers (2023-09-14T08:24:32Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate objects' "action possibilities" regions in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- Discovering a Variety of Objects in Spatio-Temporal Human-Object Interactions [45.92485321148352]
In daily HOIs, humans often interact with a variety of objects, e.g., holding and touching dozens of household items while cleaning.
Here, we introduce a new benchmark based on AVA: Discovering Interacted Objects (DIO), which includes 51 interactions and 1,000+ objects.
An ST-HOI learning task is proposed in which vision systems are expected to track human actors, detect interactions, and simultaneously discover objects.
arXiv Detail & Related papers (2022-11-14T16:33:54Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.