Multi-label affordance mapping from egocentric vision
- URL: http://arxiv.org/abs/2309.02120v1
- Date: Tue, 5 Sep 2023 10:56:23 GMT
- Title: Multi-label affordance mapping from egocentric vision
- Authors: Lorenzo Mur-Labadia, Jose J. Guerrero and Ruben Martinez-Cantin
- Abstract summary: We present a new approach to affordance perception which enables accurate multi-label segmentation.
Our approach can be used to automatically extract grounded affordances from first-person videos.
We show how our metric representation can be exploited to build a map of interaction hotspots.
- Score: 3.683202928838613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate affordance detection and segmentation with pixel precision is an
important piece in many complex systems based on interactions, such as robots
and assistive devices. We present a new approach to affordance perception which
enables accurate multi-label segmentation. Our approach can be used to
automatically extract grounded affordances from first-person videos of
interactions using a 3D map of the environment, providing pixel-level precision
for the affordance location. We use this method to build the largest and most
complete dataset on affordances based on the EPIC-Kitchens dataset, EPIC-Aff,
which provides interaction-grounded, multi-label, metric and spatial affordance
annotations. Then, we propose a new approach to affordance segmentation based
on multi-label detection, which enables multiple affordances to co-exist in the
same space, for example if they are associated with the same object. We present
several strategies for multi-label detection using different segmentation
architectures. The experimental results highlight the importance of
multi-label detection. Finally, we show how our metric representation can be
exploited to build a map of interaction hotspots in spatial, action-centric
zones and use that representation to perform task-oriented navigation.
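As a rough illustration of what multi-label segmentation means here (a minimal PyTorch sketch with invented names, shapes, and class count, not the authors' code): each affordance class gets an independent sigmoid and binary cross-entropy term, so several affordances can be active at the same pixel.

```python
import torch
import torch.nn as nn

class MultiLabelSegHead(nn.Module):
    """Per-pixel multi-label classifier: one independent sigmoid per
    affordance class, so several affordances can be active at one pixel."""

    def __init__(self, in_channels: int, num_affordances: int):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, num_affordances, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, num_affordances, H, W) logits
        return self.classifier(features)

# Binary cross-entropy per class, instead of a softmax that would force
# exactly one label per pixel.
head = MultiLabelSegHead(in_channels=256, num_affordances=20)
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(2, 256, 64, 64)                  # dummy encoder output
targets = torch.randint(0, 2, (2, 20, 64, 64)).float()  # multi-hot masks
loss = criterion(head(features), targets)
```

A softmax head would instead make the classes compete, allowing only one affordance per pixel.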
Related papers
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
In this paper, we first design a collaborative optimization strategy to simultaneously model the local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
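A hedged sketch of the global-local idea (illustrative only, not GLCONet's actual blocks): a convolutional branch preserves local detail while a self-attention branch models long-range relationships, and the two are fused by summation.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Toy global-local fusion: a conv branch keeps local detail while a
    self-attention branch models long-range context; outputs are summed."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C)
        global_ctx, _ = self.attn(tokens, tokens, tokens)  # long-range context
        global_ctx = global_ctx.transpose(1, 2).reshape(b, c, h, w)
        return local + global_ctx

out = GlobalLocalBlock(64)(torch.randn(1, 64, 16, 16))    # dummy feature map
```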
arXiv Detail & Related papers (2024-09-15T02:26:17Z) - SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals providing additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
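One simple way to picture box-to-pixel supervision (a sketch of the general idea, not SegVG's exact mechanism) is rasterizing each box into a binary mask that a segmentation loss can consume:

```python
import torch

def box_to_mask(box, height, width):
    """Rasterize an (x1, y1, x2, y2) box into a binary mask so the box
    annotation can also drive a pixel-level segmentation loss."""
    x1, y1, x2, y2 = (int(v) for v in box)
    mask = torch.zeros(height, width)
    mask[y1:y2, x1:x2] = 1.0
    return mask

mask = box_to_mask((10, 20, 50, 60), 128, 128)  # dummy annotation
```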
arXiv Detail & Related papers (2024-07-03T15:30:45Z) - Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
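A toy version of pixel-level affinity learning (illustrative; the function and iteration count are invented, not AuxSegNet+'s mechanism): pairwise similarities derived from the feature map define an affinity matrix that smooths a coarse prediction such as a CAM.

```python
import torch
import torch.nn.functional as F

def affinity_refine(features, coarse_map, num_iters=2):
    """Toy affinity propagation: pixel-pair similarities derived from the
    feature map are used to smooth a coarse prediction such as a CAM."""
    b, c, h, w = features.shape
    feats = F.normalize(features.flatten(2), dim=1)                  # (B, C, H*W)
    affinity = torch.softmax(feats.transpose(1, 2) @ feats, dim=-1)  # (B, HW, HW)
    pred = coarse_map.flatten(2)                                     # (B, K, H*W)
    for _ in range(num_iters):
        # each pixel becomes an affinity-weighted average over all pixels
        pred = pred @ affinity.transpose(1, 2)
    return pred.reshape(b, -1, h, w)

refined = affinity_refine(torch.randn(1, 32, 16, 16), torch.rand(1, 5, 16, 16))
```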
arXiv Detail & Related papers (2024-03-02T10:03:21Z) - M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient
Object Detection [22.60675416709486]
M$^3$Net is an attention network for Salient Object Detection.
It uses a cross-attention approach to achieve interaction between multilevel features.
A Mixed Attention Block models context at both global and local levels.
A multilevel supervision strategy optimizes the aggregated features stage by stage.
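A minimal sketch of cross-attention between two feature levels (assumed shapes and names; not M$^3$Net's actual module): tokens from a fine level query a coarse level so fine-scale features absorb coarse-scale context.

```python
import torch
import torch.nn as nn

class LevelCrossAttention(nn.Module):
    """Toy cross-attention: fine-level tokens query a coarse level, letting
    fine-scale features absorb coarse-scale context."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine: (B, N_fine, C) queries; coarse: (B, N_coarse, C) keys/values
        out, _ = self.attn(fine, coarse, coarse)
        return fine + out  # residual keeps the original fine features

fused = LevelCrossAttention(64)(torch.randn(1, 196, 64), torch.randn(1, 49, 64))
```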
arXiv Detail & Related papers (2023-09-15T12:46:14Z) - Diverse Instance Discovery: Vision-Transformer for Instance-Aware
Multi-Label Image Recognition [24.406654146411682]
Vision Transformer (ViT) serves as the research foundation of this paper.
Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images.
We propose a weakly supervised object localization-based approach to extract multi-scale local features.
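To illustrate how ViT self-attention can localize image content (a generic recipe, not this paper's exact method), the CLS token's attention over patch tokens can be reshaped into a coarse heatmap:

```python
import torch

def cls_attention_map(attn, grid_h, grid_w):
    """Turn one ViT layer's attention into a coarse localization map: the CLS
    token's attention over patch tokens highlights salient regions.
    `attn` has shape (B, heads, 1 + N, 1 + N) with N = grid_h * grid_w."""
    cls_to_patches = attn[:, :, 0, 1:]                       # (B, heads, N)
    heat = cls_to_patches.mean(dim=1)                        # average the heads
    heat = heat / heat.amax(dim=-1, keepdim=True).clamp_min(1e-8)
    return heat.reshape(-1, grid_h, grid_w)

attn = torch.rand(2, 6, 197, 197)   # dummy attention for a 14x14 patch grid
heatmap = cls_attention_map(attn, 14, 14)
```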
arXiv Detail & Related papers (2022-04-22T14:38:40Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
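A hedged sketch of hierarchical, adaptively focused segmentation (the threshold and blending rule are invented for illustration, not AF$_2$'s procedure): keep confident coarse predictions and fall back to the fine scale only where the coarse pass is uncertain.

```python
import torch
import torch.nn.functional as F

def adaptive_focus(coarse_logits, fine_logits, tau=0.7):
    """Toy hierarchical refinement: keep confident coarse predictions and use
    the fine-scale prediction only where the coarse pass is uncertain."""
    coarse_up = F.interpolate(coarse_logits, size=fine_logits.shape[-2:],
                              mode="bilinear", align_corners=False)
    conf = coarse_up.softmax(dim=1).amax(dim=1, keepdim=True)  # per-pixel confidence
    use_fine = (conf < tau).float()                            # 1 where uncertain
    return use_fine * fine_logits + (1.0 - use_fine) * coarse_up

out = adaptive_focus(torch.randn(1, 6, 32, 32), torch.randn(1, 6, 128, 128))
```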
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
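The score-and-threshold selection the summary describes can be sketched directly (label names and the threshold are placeholders):

```python
import torch

def select_intents(relevance_logits, labels, threshold=0.5):
    """Score every label's relevance to the utterance, then keep all labels
    whose score clears the threshold -- a multi-label selection."""
    scores = torch.sigmoid(relevance_logits)
    return [label for label, s in zip(labels, scores) if s >= threshold]

logits = torch.tensor([2.1, -0.3, 1.4])  # dummy label-instance relevance scores
print(select_intents(logits, ["play_music", "set_alarm", "adjust_volume"]))
# -> ['play_music', 'adjust_volume']
```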
arXiv Detail & Related papers (2020-10-11T14:42:18Z) - Few-shot 3D Point Cloud Semantic Segmentation [138.80825169240302]
We propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method.
Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings.
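A stripped-down view of prototype-based labeling (cosine similarity to one prototype per class; the paper's attention-aware, multi-prototype, transductive machinery is omitted):

```python
import torch
import torch.nn.functional as F

def prototype_labels(query_feats, prototypes):
    """Each query point takes the label of its most similar prototype
    (cosine similarity); one prototype per class in this toy version."""
    q = F.normalize(query_feats, dim=-1)   # (N, D) per-point embeddings
    p = F.normalize(prototypes, dim=-1)    # (K, D) class prototypes
    return (q @ p.T).argmax(dim=-1)        # (N,) predicted class ids

labels = prototype_labels(torch.randn(1024, 64), torch.randn(5, 64))
```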
arXiv Detail & Related papers (2020-06-22T08:05:25Z) - Pixel-Semantic Revise of Position Learning A One-Stage Object Detector
with A Shared Encoder-Decoder [5.371825910267909]
We observe that different methods detect objects adaptively.
Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information.
This work addresses this with an anchor-free detector that uses a shared encoder-decoder with an attention mechanism.
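For context, a minimal anchor-free head in the FCOS spirit (a generic sketch, not this paper's architecture): each feature-map location directly predicts class scores and distances to the four box sides, with no anchor boxes.

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Minimal anchor-free head: every feature-map location predicts class
    scores plus distances to the four box sides directly."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=1)
        self.reg = nn.Conv2d(in_channels, 4, kernel_size=3, padding=1)

    def forward(self, x):
        # exp() keeps the predicted (left, top, right, bottom) distances positive
        return self.cls(x), self.reg(x).exp()

cls_logits, box_dists = AnchorFreeHead(256, 80)(torch.randn(1, 256, 32, 32))
```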
arXiv Detail & Related papers (2020-01-04T08:55:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.