One-Shot Affordance Detection
- URL: http://arxiv.org/abs/2106.14747v1
- Date: Mon, 28 Jun 2021 14:22:52 GMT
- Title: One-Shot Affordance Detection
- Authors: Hongchen Luo (1), Wei Zhai (1 and 3), Jing Zhang (2), Yang Cao (1) and
Dacheng Tao (3) ((1) University of Science and Technology of China, China,
(2) The University of Sydney, Australia, (3) JD Explore Academy, JD.com,
China)
- Abstract summary: Affordance detection refers to identifying the potential action possibilities of objects in an image.
To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem.
We devise a One-Shot Affordance Detection (OS-AD) network that first estimates the purpose and then transfers it to help detect the common affordance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Affordance detection refers to identifying the potential action possibilities
of objects in an image, which is an important ability for robot perception and
manipulation. To empower robots with this ability in unseen scenarios, we
consider the challenging one-shot affordance detection problem in this paper,
i.e., given a support image that depicts the action purpose, all objects in a
scene with the common affordance should be detected. To this end, we devise a
One-Shot Affordance Detection (OS-AD) network that first estimates the
purpose and then transfers it to help detect the common affordance from all
candidate images. Through collaboration learning, OS-AD can capture the common
characteristics between objects having the same underlying affordance and learn
a good adaptation capability for perceiving unseen affordances. Besides, we
build a Purpose-driven Affordance Dataset (PAD) by collecting and labeling 4k
images from 31 affordance and 72 object categories. Experimental results
demonstrate the superiority of our model over previous representative ones in
terms of both objective metrics and visual quality. The benchmark suite is at
ProjectPage.
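Read as an architecture, the abstract suggests a conditional encoder-decoder: encode the support image, distill a purpose embedding from it, and use that embedding to condition per-pixel affordance prediction on each candidate image. The sketch below is only a minimal PyTorch-style illustration of that reading; every module name, dimension, and conditioning choice is an assumption, not the authors' OS-AD implementation.

```python
# Hypothetical sketch of a one-shot affordance detection forward pass.
# Module names and shapes are illustrative; see the paper/ProjectPage
# for the actual OS-AD architecture.
import torch
import torch.nn as nn


class OneShotAffordanceSketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared image encoder (a real model would use a pretrained backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Estimate a global "purpose" vector from the support image.
        self.purpose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, feat_dim)
        )
        # Decode a per-pixel affordance mask from purpose-conditioned features.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, 1, 1),
        )

    def forward(self, support_img, query_imgs):
        # support_img: (1, 3, H, W); query_imgs: (N, 3, H, W)
        purpose = self.purpose_head(self.encoder(support_img))   # (1, C)
        feats = self.encoder(query_imgs)                          # (N, C, h, w)
        # Condition every query feature map on the estimated purpose.
        conditioned = feats * purpose.unsqueeze(-1).unsqueeze(-1)
        return torch.sigmoid(self.decoder(conditioned))           # (N, 1, h, w)


masks = OneShotAffordanceSketch()(torch.randn(1, 3, 128, 128),
                                  torch.randn(4, 3, 128, 128))
```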
Related papers
- Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z)
- CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection [42.2847114428716]
Task driven object detection aims to detect object instances suitable for affording a task in an image.
Its challenge is that the object categories suitable for the task are too diverse to be covered by the closed object vocabulary of traditional object detection.
We propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task.
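The attribute-centric idea can be pictured with a toy lookup: a task maps to the affordance attributes it requires, and any object carrying those attributes qualifies, whatever its category name. The tasks, attributes, and objects below are invented for illustration and are not data from the CoTDet paper.

```python
# Toy illustration: retrieve object candidates for a task via shared
# affordance attributes instead of a closed object vocabulary.
# All entries are made-up examples for exposition.
TASK_TO_AFFORDANCES = {
    "contain liquid": {"hollow", "watertight"},
    "step on to reach high places": {"flat top", "rigid", "stable"},
}

OBJECT_AFFORDANCES = {
    "mug": {"hollow", "watertight", "graspable"},
    "bowl": {"hollow", "watertight"},
    "chair": {"flat top", "rigid", "stable"},
    "pillow": {"flat top"},
}


def candidates_for(task: str) -> list[str]:
    required = TASK_TO_AFFORDANCES[task]
    # An object qualifies if it carries every attribute the task requires,
    # regardless of its category name.
    return [obj for obj, attrs in OBJECT_AFFORDANCES.items() if required <= attrs]


print(candidates_for("contain liquid"))                # ['mug', 'bowl']
print(candidates_for("step on to reach high places"))  # ['chair']
```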
arXiv Detail & Related papers (2023-09-03T06:18:39Z)
- Phrase-Based Affordance Detection via Cyclic Bilateral Interaction [17.022853987801877]
We explore to perceive affordance from a vision-language perspective and consider the challenging phrase-based affordance detection problem.
We propose a cyclic bilateral consistency enhancement network (CBCE-Net) to align language and vision features progressively.
Specifically, the presented CBCE-Net consists of a mutual guided vision-language module that updates the common features of vision and language in a progressive manner, and a cyclic interaction module (CIM) that facilitates the perception of possible interaction with objects in a cyclic manner.
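As a rough analogue of the mutual guidance described above, one can alternate cross-attention updates between language and vision tokens over a few rounds so that each modality progressively refines the other. The following is a generic sketch under that assumption, not the published CBCE-Net code.

```python
# Generic sketch: progressively update vision and language features with
# alternating cross-attention. Illustrative only; not the CBCE-Net code.
import torch
import torch.nn as nn


class MutualGuidedFusion(nn.Module):
    def __init__(self, dim=256, heads=4, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.vis_from_lang = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lang_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis_tokens, lang_tokens):
        # vis_tokens: (B, HW, C) flattened image features
        # lang_tokens: (B, L, C) phrase embeddings
        for _ in range(self.rounds):
            # Vision attends to language, then language attends to the
            # updated vision features, so both streams refine each other.
            vis_tokens = vis_tokens + self.vis_from_lang(
                vis_tokens, lang_tokens, lang_tokens)[0]
            lang_tokens = lang_tokens + self.lang_from_vis(
                lang_tokens, vis_tokens, vis_tokens)[0]
        return vis_tokens, lang_tokens


v, l = MutualGuidedFusion()(torch.randn(2, 64, 256), torch.randn(2, 6, 256))
```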
arXiv Detail & Related papers (2022-02-24T13:02:27Z)
- Coordinate-Aligned Multi-Camera Collaboration for Active Multi-Object Tracking [114.16306938870055]
We propose a coordinate-aligned multi-camera collaboration system for AMOT.
In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution.
Our system achieves a coverage of 71.88%, outperforming the baseline method by 8.9%.
arXiv Detail & Related papers (2022-02-22T13:28:40Z)
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching [18.529563816600607]
This study investigates how text-driven object affordance affects image-based grasp-type recognition in robot teaching.
The authors created labeled datasets of first-person hand images to examine the impact of object affordance on recognition performance.
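One simple way to picture a text-driven affordance prior guiding image-based recognition is as a per-object reweighting of the image classifier's grasp-type probabilities. The sketch below takes that reading; the grasp types and prior values are invented and are not from the paper's datasets.

```python
# Toy sketch: combine an image-based grasp-type distribution with a
# text-derived affordance prior for the named object. Grasp types and
# prior values are invented for illustration.
import numpy as np

GRASP_TYPES = ["power", "precision", "lateral"]

# Hypothetical prior: which grasp types the named object affords.
TEXT_AFFORDANCE_PRIOR = {
    "hammer": np.array([0.80, 0.10, 0.10]),
    "coin":   np.array([0.05, 0.55, 0.40]),
}


def fuse(image_probs: np.ndarray, object_name: str) -> np.ndarray:
    """Reweight the image-based prediction by the object's affordance prior."""
    prior = TEXT_AFFORDANCE_PRIOR[object_name]
    fused = image_probs * prior
    return fused / fused.sum()


image_probs = np.array([0.4, 0.4, 0.2])  # from a hypothetical image classifier
print(GRASP_TYPES[int(np.argmax(fuse(image_probs, "coin")))])  # 'precision'
```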
arXiv Detail & Related papers (2021-02-27T17:03:32Z)
- Few-shot Object Detection with Self-adaptive Attention Network for Remote Sensing Images [11.938537194408669]
We propose a few-shot object detector which is designed for detecting novel objects provided with only a few examples.
To fit the object detection setting, our proposed few-shot detector concentrates on relations at the object level rather than over the full image.
The experiments demonstrate the effectiveness of the proposed method in few-shot scenes.
arXiv Detail & Related papers (2020-09-26T13:44:58Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
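Prediction consistency regularization of this kind can be pictured as penalizing disagreement between two multi-label category predictions for the same image, for example an image-level classifier and category scores aggregated from the detector. The loss below is a generic sketch of that idea, not the paper's exact formulation.

```python
# Generic sketch of a multi-label prediction consistency term:
# encourage two branches' per-category probabilities to agree.
# Not the exact loss used in the paper.
import torch
import torch.nn.functional as F


def consistency_loss(cls_logits: torch.Tensor, det_logits: torch.Tensor) -> torch.Tensor:
    """L1 disagreement between two multi-label predictions of shape (B, num_classes)."""
    p_cls = torch.sigmoid(cls_logits)  # image-level multi-label prediction
    p_det = torch.sigmoid(det_logits)  # detector-derived multi-label prediction
    return F.l1_loss(p_cls, p_det)


loss = consistency_loss(torch.randn(8, 20), torch.randn(8, 20))
```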
arXiv Detail & Related papers (2020-03-29T04:23:22Z)