Detection Hub: Unifying Object Detection Datasets via Query Adaptation
on Language Embedding
- URL: http://arxiv.org/abs/2206.03484v2
- Date: Wed, 29 Mar 2023 18:00:08 GMT
- Title: Detection Hub: Unifying Object Detection Datasets via Query Adaptation
on Language Embedding
- Authors: Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong
Chen, Mengchen Liu, Jianfeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
- Abstract summary: A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embeddings.
- Score: 137.3719377780593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Combining multiple datasets enables a performance boost on many computer
vision tasks. But a similar trend has not been witnessed in object detection
when combining multiple datasets, due to two inconsistencies among detection
datasets: taxonomy differences and the domain gap. In this paper, we address these
challenges with a new design (named Detection Hub) that is dataset-aware and
category-aligned. It not only mitigates the dataset inconsistency but also
provides coherent guidance for the detector to learn across multiple datasets.
In particular, the dataset-aware design is achieved by learning a dataset
embedding that is used to adapt object queries as well as convolutional kernels
in the detection heads. The categories across datasets are semantically aligned
into a unified space by replacing one-hot category representations with word
embeddings and leveraging the semantic coherence of language embeddings.
Detection Hub thus realizes the benefits of large data on object detection.
Experiments demonstrate that joint training on multiple datasets achieves
significant performance gains over training on each dataset alone. Detection
Hub further achieves SoTA performance on the UODB benchmark, which spans a wide
variety of datasets.
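The abstract's two ideas can be illustrated with a toy sketch: object queries shared across datasets are shifted by a learned per-dataset embedding, and classification is done by similarity to language embeddings of category names rather than one-hot logits, so identical categories from different datasets fall into one space. All names and the random stand-in embeddings below are hypothetical; in the paper the text vectors come from a pretrained language model and the adaptation is learned end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 8

# Toy stand-ins for language-model embeddings of category names.
# "person" appears in both label spaces and shares one text vector,
# which is what aligns the two taxonomies.
coco_cats = {"person": rng.normal(size=embed_dim), "car": rng.normal(size=embed_dim)}
voc_cats = {"person": coco_cats["person"], "sofa": rng.normal(size=embed_dim)}

def adapt_queries(base_queries, dataset_embedding):
    """Dataset-aware adaptation (additive sketch): shift shared object
    queries by a learned per-dataset embedding."""
    return base_queries + dataset_embedding

def classify(region_feature, cat_embeddings):
    """Category-aligned classification: pick the category whose text
    embedding has the highest cosine similarity with the region feature."""
    names = list(cat_embeddings)
    sims = [
        region_feature @ cat_embeddings[n]
        / (np.linalg.norm(region_feature) * np.linalg.norm(cat_embeddings[n]))
        for n in names
    ]
    return names[int(np.argmax(sims))]

base_queries = rng.normal(size=(4, embed_dim))
coco_queries = adapt_queries(base_queries, rng.normal(size=embed_dim))

# A region feature near the shared "person" vector is labeled "person"
# under both datasets' label spaces, since the text space aligns them.
feat = coco_cats["person"] + 0.05 * rng.normal(size=embed_dim)
print(classify(feat, coco_cats), classify(feat, voc_cats))
```

Because both label spaces reference the same text vector for "person", no manual taxonomy merging is needed; the language space does the alignment.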
Related papers
- Anno-incomplete Multi-dataset Detection [67.69438032767613]
We propose a novel problem, "Anno-incomplete Multi-dataset Detection".
We develop an end-to-end multi-task learning architecture which can accurately detect all the object categories with multiple partially annotated datasets.
arXiv Detail & Related papers (2024-08-29T03:58:21Z) - infoVerse: A Universal Framework for Dataset Characterization with
Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z) - Language-aware Multiple Datasets Detection Pretraining for DETRs [4.939595148195813]
We propose a framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR.
It converts the typical multi-classification in object detection into binary classification by introducing a pre-trained language model.
We show that METR achieves extraordinary results under both multi-task joint training and the pretrain-and-finetune paradigm.
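METR's reformulation can be sketched in a few lines: instead of one K-way softmax over a fixed label set, each (region, category-text) pair gets an independent binary match score, so categories from different datasets can share one detection head. The category names and random embeddings below are illustrative stand-ins for the pretrained language model METR actually uses.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# Hypothetical text embeddings for category names (random stand-ins for
# a pretrained language model's outputs).
cat_embeds = {"cat": rng.normal(size=dim), "dog": rng.normal(size=dim)}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_scores(region_feature, cat_embeds):
    """Binary (per-category) classification: an independent match
    probability for every region/category-text pair, rather than a
    single multi-class softmax over a fixed taxonomy."""
    return {name: float(sigmoid(region_feature @ emb))
            for name, emb in cat_embeds.items()}

# A region feature close to the "cat" text embedding.
feat = 0.9 * cat_embeds["cat"] + 0.1 * rng.normal(size=dim)
scores = binary_scores(feat, cat_embeds)
print(scores)
```

Since the scores are independent per category, adding a new dataset only adds new text embeddings; the head itself does not change shape, which is what makes multi-dataset pretraining straightforward.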
arXiv Detail & Related papers (2023-04-07T10:34:04Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Parsing with Pretrained Language Models, Multiple Datasets, and Dataset
Embeddings [13.097523786733872]
We compare two methods to embed datasets in a transformer-based multilingual dependency parser.
We confirm that performance increases are highest for small datasets and datasets with a low baseline score.
We show that training on the combination of all datasets performs similarly to designing smaller clusters based on language-relatedness.
arXiv Detail & Related papers (2021-12-07T10:47:07Z) - Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z) - Self-supervised Robust Object Detectors from Partially Labelled Datasets [3.1669406516464007]
Merging datasets allows us to train one integrated object detector instead of several separate ones.
We propose a training framework to overcome the missing-label challenge of merged datasets.
We evaluate the proposed framework for training YOLO on a simulated merged dataset, built from VOC2007 and VOC2012, with a missing-label rate of approximately 48%.
arXiv Detail & Related papers (2020-05-23T15:18:20Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.