One-Shot Object Affordance Detection in the Wild
- URL: http://arxiv.org/abs/2108.03658v1
- Date: Sun, 8 Aug 2021 14:53:10 GMT
- Title: One-Shot Object Affordance Detection in the Wild
- Authors: Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, Dacheng Tao
- Abstract summary: Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
- Score: 76.46484684007706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Affordance detection refers to identifying the potential action possibilities
of objects in an image, which is a crucial ability for robot perception and
manipulation. To empower robots with this ability in unseen scenarios, we first
study the challenging one-shot affordance detection problem in this paper,
i.e., given a support image that depicts the action purpose, all objects in a
scene with the common affordance should be detected. To this end, we devise a
One-Shot Affordance Detection Network (OSAD-Net) that first estimates the
human action purpose and then transfers it to help detect the common affordance
from all candidate images. Through collaboration learning, OSAD-Net can capture
the common characteristics between objects having the same underlying
affordance and learn a good adaptation capability for perceiving unseen
affordances. Besides, we build a large-scale Purpose-driven Affordance Dataset
v2 (PADv2) by collecting and labeling 30k images from 39 affordance and 103
object categories. With complex scenes and rich annotations, our PADv2 dataset
can be used as a test bed to benchmark affordance detection methods and may
also facilitate downstream vision tasks, such as scene understanding, action
recognition, and robot manipulation. Specifically, we conducted comprehensive
experiments on the PADv2 dataset by including 11 advanced models from several
related research fields. Experimental results demonstrate the superiority of
our model over previous representative ones in terms of both objective metrics
and visual quality. The benchmark suite is available at
https://github.com/lhc1224/OSAD_Net.
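The one-shot setting described in the abstract (estimate the action purpose from a single support image, then transfer it to every candidate image) can be illustrated with a toy sketch. This is not OSAD-Net: it assumes features are already extracted as small vectors, swaps the learned purpose estimation for a masked average, and swaps the learned transfer for a cosine-similarity threshold. All names (`estimate_purpose`, `detect_affordance`, the `threshold` value) are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_purpose(support_features, support_mask):
    """Toy stand-in for purpose estimation: average the support features
    at locations flagged by the mask (assumed to mark the interaction region)."""
    selected = [f for f, m in zip(support_features, support_mask) if m]
    if not selected:
        raise ValueError("support mask selects no features")
    dim = len(selected[0])
    return [sum(f[i] for f in selected) / len(selected) for i in range(dim)]


def detect_affordance(query_features, purpose, threshold=0.8):
    """Toy stand-in for purpose transfer: mark each query location whose
    feature is similar enough to the purpose vector."""
    return [1 if cosine(f, purpose) >= threshold else 0 for f in query_features]


# One support image (three locations, two flagged) and two candidate images.
support_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]
purpose = estimate_purpose(support_features, support_mask)

queries = {
    "img_a": [[1.0, 0.1], [0.0, 1.0]],
    "img_b": [[0.1, 0.9], [0.8, 0.0]],
}
# The same purpose vector is reused for every candidate image.
masks = {name: detect_affordance(feats, purpose) for name, feats in queries.items()}
print(masks)
```

The point of the sketch is the data flow: a single purpose vector is computed once from the support image and then applied unchanged to all candidate images, which is what lets the setting extend to affordances not seen during training.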
Related papers
- Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images [11.217630579076237]
Few-shot object detection (FSOD) has garnered significant research attention in the field of remote sensing.
We propose a novel FSOD method for remote sensing images called Few-shot Oriented object detection with Memorable Contrastive learning (FOMC).
Specifically, we employ oriented bounding boxes instead of traditional horizontal bounding boxes to learn a better feature representation for arbitrary-oriented aerial objects.
arXiv Detail & Related papers (2024-03-20T08:15:18Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z)
- End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z)
- Phrase-Based Affordance Detection via Cyclic Bilateral Interaction [17.022853987801877]
We explore perceiving affordance from a vision-language perspective and consider the challenging phrase-based affordance detection problem.
We propose a cyclic bilateral consistency enhancement network (CBCE-Net) to align language and vision features progressively.
Specifically, the presented CBCE-Net consists of a mutual guided vision-language module that updates the common features of vision and language in a progressive manner, and a cyclic interaction module (CIM) that facilitates the perception of possible interaction with objects in a cyclic manner.
arXiv Detail & Related papers (2022-02-24T13:02:27Z)
- One-Shot Affordance Detection [0.0]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem.
We devise a One-Shot Affordance Detection (OS-AD) network that first estimates the purpose and then transfers it to help detect the common affordance.
arXiv Detail & Related papers (2021-06-28T14:22:52Z)
- Uncertainty-aware Joint Salient Object and Camouflaged Object Detection [43.01556978979627]
We propose a paradigm of leveraging the contradictory information to enhance the detection ability of both salient object detection and camouflaged object detection.
We introduce a similarity measure module to explicitly model the contradicting attributes of these two tasks.
Considering the uncertainty of labeling in both tasks' datasets, we propose an adversarial learning network to achieve both higher order similarity measure and network confidence estimation.
arXiv Detail & Related papers (2021-04-06T16:05:10Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
- Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation.
Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.