CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
- URL: http://arxiv.org/abs/2309.01093v1
- Date: Sun, 3 Sep 2023 06:18:39 GMT
- Title: CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
- Authors: Jiajin Tang, Ge Zheng, Jingyi Yu, Sibei Yang
- Abstract summary: Task driven object detection aims to detect object instances suitable for affording a task in an image.
The challenge is that the object categories suitable for a task are too diverse to be covered by the closed vocabulary of traditional object detection.
We propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task.
- Score: 42.2847114428716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Task driven object detection aims to detect object instances suitable for
affording a task in an image. The challenge is that the object categories
suitable for a task are too diverse to be covered by the closed vocabulary of
traditional object detection; simply mapping the categories and visual
features of common objects to the task cannot address this. In
this paper, we propose to explore fundamental affordances rather than object
categories, i.e., common attributes that enable different objects to accomplish
the same task. Moreover, we propose a novel multi-level chain-of-thought
prompting (MLCoT) to extract the affordance knowledge from large language
models, which contains multi-level reasoning steps from task to object examples
to essential visual attributes with rationales. Furthermore, to fully exploit
knowledge to benefit object recognition and localization, we propose a
knowledge-conditional detection framework, namely CoTDet. It conditions the
detector on the knowledge to generate object queries and regress boxes.
Experimental results demonstrate that our CoTDet outperforms state-of-the-art
methods consistently and significantly (+15.6 box AP and +14.8 mask AP) and can
generate rationales explaining why the detected objects afford the task.
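
As a concrete illustration of the multi-level prompting described above, here is a minimal, hypothetical Python sketch of two-level chain-of-thought querying of an LLM. The `query_llm` callable, the prompt wording, and the parsing are assumptions for illustration only; the paper's actual prompts and LLM interface are not reproduced here.

```python
# A minimal sketch of multi-level chain-of-thought (MLCoT) prompting,
# assuming a generic `query_llm` callable (hypothetical); the exact
# prompts and parsing used by CoTDet differ.
from typing import Callable, List

def mlcot_affordance_knowledge(task: str,
                               query_llm: Callable[[str], str],
                               num_examples: int = 5) -> List[str]:
    """Extract visual affordance knowledge for a task in two reasoning levels."""
    # Level 1 (task -> object examples): concrete objects that afford the task.
    object_prompt = (
        f"List {num_examples} different objects a person could use to {task}. "
        "Answer with a comma-separated list only."
    )
    objects = [o.strip() for o in query_llm(object_prompt).split(",") if o.strip()]

    # Level 2 (object examples -> visual attributes with rationales):
    # ask *why* each object affords the task, in visual terms.
    knowledge = []
    for obj in objects:
        rationale_prompt = (
            f"Why can a {obj} be used to {task}? Answer with the essential "
            "visual attributes (shape, parts, material) that make it suitable."
        )
        knowledge.append(query_llm(rationale_prompt))
    return knowledge
```

For a task such as "open a bottle", level 1 might return objects like "bottle opener" or "knife", and level 2 would surface attributes such as "a rigid, thin edge that can pry a cap"; it is this attribute-level knowledge, rather than the object names, that conditions the detector described next.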
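Likewise, "conditioning the detector on the knowledge" can be pictured as injecting embedded rationales into DETR-style object queries. The sketch below is schematic and assumes precomputed knowledge embeddings; the module names, shapes, and attention design are illustrative, not CoTDet's exact architecture.

```python
# A schematic sketch of knowledge-conditional query generation in a
# DETR-style detector; illustrative only, not the paper's exact design.
import torch
import torch.nn as nn

class KnowledgeConditionalQueries(nn.Module):
    def __init__(self, d_model: int = 256, num_queries: int = 100):
        super().__init__()
        self.base_queries = nn.Embedding(num_queries, d_model)  # learnable queries
        # Cross-attention lets each query absorb task-relevant affordance knowledge.
        self.knowledge_attn = nn.MultiheadAttention(d_model, num_heads=8,
                                                    batch_first=True)

    def forward(self, knowledge_emb: torch.Tensor) -> torch.Tensor:
        # knowledge_emb: (batch, num_knowledge_units, d_model), e.g. embedded
        # visual-attribute rationales extracted via MLCoT (assumed input).
        b = knowledge_emb.size(0)
        q = self.base_queries.weight.unsqueeze(0).expand(b, -1, -1)
        # Condition object queries on the knowledge so that both recognition
        # and box regression are steered by what affords the task.
        conditioned, _ = self.knowledge_attn(q, knowledge_emb, knowledge_emb)
        return q + conditioned  # residual keeps the learnable prior
```

The conditioned queries would then enter a standard transformer decoder for classification and box regression, which is one plausible reading of "generate object queries and regress boxes" from the abstract.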
Related papers
- Leverage Task Context for Object Affordance Ranking [57.59106517732223] (2024-11-25)
We build the first large-scale task-oriented affordance ranking dataset with 25 common tasks, over 50k images and more than 661k objects.
Results demonstrate the feasibility of the task context based affordance learning paradigm and the superiority of our model over state-of-the-art models in the fields of saliency ranking and multimodal object detection.
- Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649] (2024-06-01)
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories.
Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection.
We present a novel OVD framework, termed LBP, that learns background prompts to harness implicit background knowledge.
- Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition [21.655278000690686] (2024-04-18)
We propose an end-to-end object-centric action recognition framework.
It simultaneously performs Detection And Interaction Reasoning in one stage.
We conduct experiments on two datasets, Something-Else and Ikea-Assembly.
- Cycle Consistency Driven Object Discovery [75.60399804639403] (2023-06-03)
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
- Universal Instance Perception as Object Discovery and Retrieval [90.96031157557806] (2023-03-12)
UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm.
It can flexibly perceive different types of objects by simply changing the input prompts.
UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks.
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915] (2021-12-21)
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
- Pix2seq: A Language Modeling Framework for Object Detection [12.788663431798588] (2021-09-22)
Pix2Seq is a simple and generic framework for object detection.
We train a neural net to perceive the image and generate the desired sequence.
Our approach is based on the intuition that if a neural net knows where and what the objects are, we just need to teach it how to read them out (a minimal sketch of this serialization appears after this list).
- Class-agnostic Object Detection [16.97782147401037] (2020-11-28)
We propose class-agnostic object detection as a new problem that focuses on detecting objects irrespective of their object-classes.
Specifically, the goal is to predict bounding boxes for all objects in an image but not their object-classes.
We propose training and evaluation protocols for benchmarking class-agnostic detectors to advance future research in this domain.
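As noted in the Pix2seq entry above, here is a minimal sketch of Pix2seq-style box-to-token serialization, assuming coordinates normalized to [0, 1] and a shared vocabulary whose first `num_bins` tokens encode quantized coordinates; the bin count and token layout are illustrative defaults, not the released configuration.

```python
# A minimal sketch of Pix2seq-style sequence construction: each object
# becomes 5 tokens (4 quantized coordinates + 1 class token).
from typing import List, Tuple

def boxes_to_sequence(boxes: List[Tuple[float, float, float, float]],
                      labels: List[int],
                      num_bins: int = 1000) -> List[int]:
    """Serialize normalized (ymin, xmin, ymax, xmax) boxes and labels into tokens."""
    seq: List[int] = []
    for (ymin, xmin, ymax, xmax), cls in zip(boxes, labels):
        for coord in (ymin, xmin, ymax, xmax):
            # Quantize each coordinate into one of `num_bins` discrete tokens.
            seq.append(min(int(coord * num_bins), num_bins - 1))
        # Class tokens are offset past the coordinate bins in the vocabulary.
        seq.append(num_bins + cls)
    return seq
```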
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.