Language-aware Multiple Datasets Detection Pretraining for DETRs
- URL: http://arxiv.org/abs/2304.03580v1
- Date: Fri, 7 Apr 2023 10:34:04 GMT
- Title: Language-aware Multiple Datasets Detection Pretraining for DETRs
- Authors: Jing Hao, Song Chen, Xiaodi Wang, Shumin Han
- Abstract summary: We propose a framework for utilizing Multiple datasets to pretrain DETR-like detectors, termed METR.
It converts the typical multi-class classification in object detection into binary classification by introducing a pre-trained language model.
We show that METR achieves extraordinary results in both multi-task joint training and the pretrain & finetune paradigm.
- Score: 4.939595148195813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretraining on large-scale datasets can boost the performance of object
detectors, but annotated datasets for object detection are hard to scale
up due to their high labor cost. What we have instead are numerous isolated
field-specific datasets, so it is appealing to jointly pretrain models
across an aggregation of datasets to enhance data volume and diversity. In this
paper, we propose a strong framework for utilizing Multiple datasets to
pretrain DETR-like detectors, termed METR, without the need for manual
label-space integration. It converts the typical multi-class classification in
object detection into binary classification by introducing a pre-trained language
model. Specifically, we design a category extraction module for extracting
potential categories present in an image and assign these categories to
different queries via language embeddings. Each query is responsible only for
predicting a class-specific object. Moreover, to fit this novel detection
paradigm, we propose a group bipartite matching strategy that restricts each
ground truth to match only queries assigned to the same category. Extensive experiments
demonstrate that METR achieves extraordinary results in both multi-task joint
training and the pretrain & finetune paradigm. Notably, our pre-trained models
show highly flexible transferability and improve the performance of various
DETR-like detectors on the COCO val2017 benchmark. Code will be released after
this paper is published.
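Taking the abstract at face value, two mechanisms stand out: queries that each carry one category's language embedding and predict only a binary object-vs-background score, and a bipartite matching restricted to same-category groups. The sketch below is a guess at both under stated assumptions: random vectors stand in for the language-model embeddings, and the matching cost is a simplified L1-plus-objectness term rather than the full DETR cost; none of the names or shapes come from the authors' code.

```python
# Hypothetical sketch of METR's two ideas as described in the abstract:
# (1) each query carries one category's language embedding and predicts a
#     binary object-vs-background score, and
# (2) group bipartite matching pairs ground truths only with queries of the
#     same category.
# Shapes, names, and the cost function are assumptions, not the released code.
import torch
from scipy.optimize import linear_sum_assignment

num_queries_per_cat, embed_dim = 10, 256
categories = ["person", "car", "dog"]  # output of the category extraction module

# Stand-in for pre-trained language-model embeddings of the category names.
cat_embeds = torch.randn(len(categories), embed_dim)

# Each query = a shared learnable content vector + its category's embedding,
# so every query is responsible for a single class.
base_queries = torch.randn(num_queries_per_cat, embed_dim)
queries = (base_queries.unsqueeze(0) + cat_embeds.unsqueeze(1)).flatten(0, 1)
query_cat = torch.arange(len(categories)).repeat_interleave(num_queries_per_cat)

# Dummy decoder outputs: one binary objectness logit and one box per query.
pred_logits = torch.randn(queries.shape[0])   # (Q,)
pred_boxes = torch.rand(queries.shape[0], 4)  # (Q, 4), e.g. cxcywh in [0, 1]

# Dummy ground truth: 5 boxes with category ids.
gt_boxes = torch.rand(5, 4)
gt_cats = torch.tensor([0, 0, 1, 2, 2])

def group_bipartite_match(pred_boxes, pred_logits, gt_boxes, gt_cats, query_cat):
    """Hungarian matching run separately inside each category group."""
    matches = []
    for c in gt_cats.unique():
        q = (query_cat == c).nonzero(as_tuple=True)[0]
        g = (gt_cats == c).nonzero(as_tuple=True)[0]
        # Simplified cost: L1 box distance minus objectness probability
        # (a real DETR cost would also include a GIoU term).
        cost = torch.cdist(pred_boxes[q], gt_boxes[g], p=1) \
               - pred_logits[q].sigmoid().unsqueeze(1)
        rows, cols = linear_sum_assignment(cost.numpy())
        matches += [(q[r].item(), g[j].item()) for r, j in zip(rows, cols)]
    return matches

print(group_bipartite_match(pred_boxes, pred_logits, gt_boxes, gt_cats, query_cat))
```

Because each group only ever contains one category, the classification head never has to disambiguate classes, which is what would let label spaces from different datasets coexist without manual merging.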
Related papers
- Meta-learning Pathologies from Radiology Reports using Variance Aware
Prototypical Networks [3.464871689508835]
We propose a simple extension of the Prototypical Networks for few-shot text classification.
Our main idea is to replace the class prototypes with Gaussians and to introduce a regularization term that encourages the examples to cluster near the appropriate class centroids.
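A toy rendering of that idea, under my own assumptions (diagonal covariance, squared-distance regularizer), could look like:

```python
# Hedged sketch: fit one diagonal Gaussian per class on the support set,
# score queries by log-density, and regularize examples toward their own
# class centroid. Details are guesses from the summary, not the paper's code.
import torch

def gaussian_proto_logits(support, support_y, query, num_classes, eps=1e-4):
    means, logvars = [], []
    for c in range(num_classes):
        x = support[support_y == c]                        # (n_c, d)
        means.append(x.mean(0))
        logvars.append((x.var(0, unbiased=False) + eps).log())
    mu, logvar = torch.stack(means), torch.stack(logvars)   # both (C, d)
    diff = query.unsqueeze(1) - mu.unsqueeze(0)              # (Q, C, d)
    logits = -0.5 * ((diff ** 2 / logvar.exp()) + logvar).sum(-1)  # (Q, C)
    # Regularizer: pull each support example toward its class centroid.
    reg = ((support - mu[support_y]) ** 2).sum(-1).mean()
    return logits, reg

support_y = torch.arange(3).repeat_interleave(4)             # 3 classes, 4 shots
logits, reg = gaussian_proto_logits(torch.randn(12, 64), support_y,
                                    torch.randn(5, 64), num_classes=3)
```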
arXiv Detail & Related papers (2022-10-22T05:22:29Z)
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation
on Language Embedding [137.3719377780593]
A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embeddings.
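A hedged sketch of what replacing one-hot heads with name embeddings could look like; the text encoder below is a random placeholder for a real one such as CLIP or BERT:

```python
# Illustrative only: class scores are dot products between region features
# and embeddings of the class *names*, so labels from different datasets
# ("person" vs. "pedestrian") can live in one semantic space. A random
# encoder stands in for a real language model here.
import torch
import torch.nn.functional as F

def embed(names, dim=512):
    return F.normalize(torch.randn(len(names), dim), dim=-1)  # placeholder

coco_classes = ["person", "car", "dog"]
other_classes = ["pedestrian", "vehicle"]

region_feats = F.normalize(torch.randn(8, 512), dim=-1)   # (8, d) proposals
coco_logits = region_feats @ embed(coco_classes).T        # (8, 3)
other_logits = region_feats @ embed(other_classes).T      # (8, 2)
# One detection head serves both datasets; only the name embeddings differ.
```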
arXiv Detail & Related papers (2022-06-07T17:59:44Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented
Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
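As a small illustration of that flattened, sequence-labeling formulation (the utterance and tag set below are invented, not from the paper):

```python
# Invented example: slot prediction reduced to per-token BIO tagging, with
# the intent predicted separately as a flat label.
tokens    = ["remind", "me", "to", "call", "mom", "at", "5", "pm"]
slot_tags = ["O", "O", "O", "B-TODO", "I-TODO", "O", "B-TIME", "I-TIME"]
intent    = "create_reminder"
# A tagger then only needs per-token classification over the BIO tag set,
# which transfers across languages and domains more easily than tree decoding.
```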
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z)
- Adaptive Prototypical Networks with Label Words and Joint Representation
Learning for Few-Shot Relation Classification [17.237331828747006]
This work focuses on few-shot relation classification (FSRC).
We propose an adaptive mixture mechanism to add label words to the representation of the class prototype.
Experiments have been conducted on FewRel under different few-shot (FS) settings.
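One plausible shape for such an adaptive mixture, with the sigmoid gate being my own assumption rather than the paper's exact mechanism:

```python
# Hypothetical gate that mixes the support-set mean with the label-word
# embedding to form the class prototype; the gating form is an assumption.
import torch

def adaptive_prototype(support_mean, label_embed, gate):
    alpha = torch.sigmoid(gate(torch.cat([support_mean, label_embed], dim=-1)))
    return alpha * support_mean + (1 - alpha) * label_embed

d = 128
gate = torch.nn.Linear(2 * d, 1)  # learns how much to trust the label words
proto = adaptive_prototype(torch.randn(d), torch.randn(d), gate)
```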
arXiv Detail & Related papers (2021-01-10T11:25:42Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and
Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Selecting Relevant Features from a Multi-domain Representation for
Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
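A minimal sketch of the select-then-classify recipe; the variance-based selection criterion is illustrative, not the paper's:

```python
# Illustrative only: keep a subset of feature dimensions, then classify
# queries by the nearest class centroid (a simple non-parametric rule).
import torch

def nearest_centroid_predict(support, support_y, query, mask, num_classes):
    s, q = support[:, mask], query[:, mask]    # selected dimensions only
    centroids = torch.stack(
        [s[support_y == c].mean(0) for c in range(num_classes)])
    return torch.cdist(q, centroids).argmin(-1)

feats = torch.randn(20, 256)
labels = torch.arange(4).repeat_interleave(5)
mask = feats.var(0) > feats.var(0).median()    # keep the most variable half
preds = nearest_centroid_predict(feats, labels, torch.randn(6, 256), mask, 4)
```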
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.