Described Object Detection: Liberating Object Detection with Flexible
Expressions
- URL: http://arxiv.org/abs/2307.12813v2
- Date: Wed, 11 Oct 2023 14:35:26 GMT
- Title: Described Object Detection: Liberating Object Detection with Flexible
Expressions
- Authors: Chi Xie, Zhao Zhang, Yixuan Wu, Feng Zhu, Rui Zhao, Shuang Liang
- Abstract summary: We advance Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC) to a more practical setting called Described Object Detection (DOD) by expanding category names to flexible language expressions. The accompanying dataset features flexible language expressions, whether short category names or long descriptions, and annotates all described objects in all images without omission.
- Score: 19.392927971139652
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Detecting objects based on language information is a popular task that
includes Open-Vocabulary object Detection (OVD) and Referring Expression
Comprehension (REC). In this paper, we advance them to a more practical setting
called Described Object Detection (DOD) by expanding category names to flexible
language expressions for OVD and overcoming the limitation of REC only
grounding the pre-existing object. We establish the research foundation for DOD
by constructing a Description Detection Dataset ($D^3$). This dataset features
flexible language expressions, whether short category names or long
descriptions, and annotates all described objects in all images without
omission. By evaluating previous SOTA methods on $D^3$, we find some
troublemakers that fail current REC, OVD, and bi-functional methods. REC
methods struggle with confidence scores, rejecting negative instances, and
multi-target scenarios, while OVD methods face constraints with long and
complex descriptions. Recent bi-functional methods also do not work well on DOD
due to their separated training procedures and inference strategies for REC and
OVD tasks. Building upon the aforementioned findings, we propose a baseline
that largely improves REC methods by reconstructing the training data and
introducing a binary classification sub-task, outperforming existing methods.
Data and code are available at https://github.com/shikras/d-cube and related
works are tracked in
https://github.com/Charles-Xie/awesome-described-object-detection.
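The abstract notes that REC methods struggle to reject negative instances, and that the proposed baseline adds a binary classification sub-task. A minimal sketch of that idea, assuming a hypothetical per-(image, description) presence head gating per-box grounding scores (all names and thresholds here are illustrative, not the authors' implementation):

```python
def filter_detections(boxes, grounding_scores, presence_prob,
                      score_thresh=0.5, presence_thresh=0.5):
    """Keep boxes whose grounding score passes the threshold, but only if a
    binary presence head believes the description matches something in the
    image. This models DOD's requirement that a description may refer to
    zero, one, or many objects; REC-style models that always return their
    top box fail the zero-object case."""
    if presence_prob < presence_thresh:
        # Reject: the description refers to nothing in this image.
        return []
    return [b for b, s in zip(boxes, grounding_scores) if s >= score_thresh]
```

For example, with a confident presence head the high-scoring box survives, while a low presence probability rejects everything regardless of box scores.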
Related papers
- Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability [19.54008511592332]
In real-world applications, the target class concept is often hard to describe in text.
There is a high demand for few-shot object detection (FSOD).
Can the benefits of OVD extend to FSOD for object classes that are difficult to describe in text?
arXiv Detail & Related papers (2024-10-20T06:59:35Z)
- Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection [44.92009038111696]
Open-Vocabulary Detection (OVD) is the task of detecting all objects of interest in a given scene without predefined object classes.
We propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task.
With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference.
arXiv Detail & Related papers (2024-07-12T02:34:11Z)
- Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649]
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories.
Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection.
We present a novel OVD framework, termed LBP, that learns background prompts to harness explored implicit background knowledge.
arXiv Detail & Related papers (2024-06-01T17:32:26Z)
- Generative Region-Language Pretraining for Open-Ended Object Detection [55.42484781608621]
We propose a framework named GenerateU, which can detect dense objects and generate their names in a free-form way.
Our framework achieves comparable results to the open-vocabulary object detection method GLIP.
arXiv Detail & Related papers (2024-03-15T10:52:39Z)
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding [8.448399308205266]
We introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects.
We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol.
arXiv Detail & Related papers (2023-11-29T10:40:52Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS)
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- InstructDET: Diversifying Referring Object Detection with Generalized Instructions [39.36186258308405]
We propose a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions.
For one image, we produce a large number of instructions that refer to every single object and to different combinations of multiple objects.
arXiv Detail & Related papers (2023-10-08T12:10:44Z)
- What Makes Good Open-Vocabulary Detector: A Disassembling Perspective [6.623703413255309]
Open-vocabulary detection (OVD) is a new object detection paradigm, aiming to localize and recognize unseen objects defined by an unbounded vocabulary.
Previous works mainly focus on the open vocabulary classification part, with less attention on the localization part.
We show in this work that improving localization and cross-modal classification complement each other, and that together they compose a strong OVD detector.
arXiv Detail & Related papers (2023-09-01T03:03:50Z)
- Linear Object Detection in Document Images using Multiple Object Tracking [58.720142291102135]
Linear objects convey substantial information about document structure.
Many approaches can recover some vector representation, but the only complete solution is a closed-source technique introduced in 1994.
We propose a framework for accurate instance segmentation of linear objects in document images using Multiple Object Tracking.
arXiv Detail & Related papers (2023-05-26T14:22:03Z)
- Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug-in detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.