V3Det: Vast Vocabulary Visual Detection Dataset
- URL: http://arxiv.org/abs/2304.03752v2
- Date: Thu, 5 Oct 2023 12:18:14 GMT
- Title: V3Det: Vast Vocabulary Visual Detection Dataset
- Authors: Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin
Wang, Conghui He, Dahua Lin
- Abstract summary: V3Det is a vast vocabulary visual detection dataset with precisely annotated bounding boxes on massive images.
By offering a vast exploration space, V3Det enables extensive benchmarks on both vast and open vocabulary object detection.
- Score: 69.50942928928052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in detecting arbitrary objects in the real world are trained
and evaluated on object detection datasets with a relatively restricted
vocabulary. To facilitate the development of more general visual object
detection, we propose V3Det, a vast vocabulary visual detection dataset with
precisely annotated bounding boxes on massive images. V3Det has several
appealing properties: 1) Vast Vocabulary: it contains bounding boxes for objects
from 13,204 categories in real-world images, roughly 10 times more than the
existing largest-vocabulary object detection dataset, LVIS. 2) Hierarchical
Category Organization: The vast vocabulary of V3Det is organized by a
hierarchical category tree which annotates the inclusion relationship among
categories, encouraging the exploration of category relationships in vast and
open vocabulary object detection. 3) Rich Annotations: V3Det comprises
precisely annotated objects in 243k images and professional descriptions of
each category written by human experts and a powerful chatbot. By offering a
vast exploration space, V3Det enables extensive benchmarks on both vast and
open vocabulary object detection, leading to new observations, practices, and
insights for future research. It has the potential to serve as a cornerstone
dataset for developing more general visual perception systems. V3Det is
available at https://v3det.openxlab.org.cn/.
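The hierarchical category organization described above can be pictured as a parent-child tree in which each edge records an inclusion relation. The sketch below is a minimal illustration of how such a tree could be queried; the category names, the dict-based format, and the function names are illustrative assumptions, not V3Det's actual annotation schema.

```python
# Minimal sketch of a hierarchical category tree like the one V3Det
# describes (a parent category "includes" its children). The entries
# below are hypothetical, not taken from the dataset.

parent_of = {
    "sports car": "car",
    "car": "vehicle",
    "vehicle": "entity",  # "entity" is the assumed root
}

def ancestors(category):
    """Return the chain of enclosing categories, nearest parent first."""
    chain = []
    while category in parent_of:
        category = parent_of[category]
        chain.append(category)
    return chain

def is_ancestor(maybe_parent, category):
    """True if maybe_parent includes category via the tree."""
    return maybe_parent in ancestors(category)

print(ancestors("sports car"))               # nearest-first chain to the root
print(is_ancestor("vehicle", "sports car"))  # True
```

Such inclusion queries are one way a benchmark could, for instance, give partial credit when a detector predicts an enclosing category of the ground-truth label.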
Related papers
- OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding [43.69535335079362]
Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond the closed object classes.
Existing approaches and benchmarks primarily focus on the open vocabulary problem within the context of object classes.
We introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) to explore the open vocabulary problem beyond object classes.
arXiv Detail & Related papers (2024-08-20T17:31:48Z)
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and the generation of hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding [8.448399308205266]
We introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects.
We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol.
arXiv Detail & Related papers (2023-11-29T10:40:52Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex-scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection [24.871590175483096]
Point cloud-based open-vocabulary 3D object detection aims to detect 3D categories that do not have ground-truth annotations in the training set.
Previous approaches leverage large-scale richly-annotated image datasets as a bridge between 3D and category semantics.
We propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.
arXiv Detail & Related papers (2023-09-18T03:31:53Z)
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Learning Object-Language Alignments for Open-Vocabulary Object Detection [83.09560814244524]
We propose a novel open-vocabulary object detection framework directly learning from image-text pair data.
It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way.
arXiv Detail & Related papers (2022-11-27T14:47:31Z)
- Exploiting Unlabeled Data with Vision and Language Models for Object Detection [64.94365501586118]
Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets.
We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images.
We demonstrate the value of the generated pseudo labels in two specific tasks, open-vocabulary detection and semi-supervised object detection.
arXiv Detail & Related papers (2022-07-18T21:47:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.