Dynamic Object Queries for Transformer-based Incremental Object Detection
- URL: http://arxiv.org/abs/2407.21687v2
- Date: Tue, 27 Aug 2024 12:03:00 GMT
- Title: Dynamic Object Queries for Transformer-based Incremental Object Detection
- Authors: Jichuan Zhang, Wei Li, Shuang Cheng, Ya-Li Li, Shengjin Wang
- Abstract summary: Incremental object detection aims to sequentially learn new classes, while maintaining the capability to locate and identify old ones.
Prior methodologies mainly tackle the forgetting issue through knowledge distillation and exemplar replay.
We propose DyQ-DETR, which incrementally expands the model representation ability to achieve the stability-plasticity tradeoff.
- Score: 45.41291377837515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incremental object detection (IOD) aims to sequentially learn new classes, while maintaining the capability to locate and identify old ones. As the training data arrives with annotations only with new classes, IOD suffers from catastrophic forgetting. Prior methodologies mainly tackle the forgetting issue through knowledge distillation and exemplar replay, ignoring the conflict between limited model capacity and increasing knowledge. In this paper, we explore dynamic object queries for incremental object detection built on Transformer architecture. We propose the Dynamic object Query-based DEtection TRansformer (DyQ-DETR), which incrementally expands the model representation ability to achieve stability-plasticity tradeoff. First, a new set of learnable object queries are fed into the decoder to represent new classes. These new object queries are aggregated with those from previous phases to adapt both old and new knowledge well. Second, we propose the isolated bipartite matching for object queries in different phases, based on disentangled self-attention. The interaction among the object queries at different phases is eliminated to reduce inter-class confusion. Thanks to the separate supervision and computation over object queries, we further present the risk-balanced partial calibration for effective exemplar replay. Extensive experiments demonstrate that DyQ-DETR significantly surpasses the state-of-the-art methods, with limited parameter overhead. Code will be made publicly available.
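The disentangled self-attention described in the abstract can be pictured as a block-diagonal attention mask over the concatenated per-phase query sets, so queries from one incremental phase never attend to queries from another. The sketch below is a minimal illustration of that idea only; `phase_isolation_mask` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def phase_isolation_mask(phase_sizes):
    """Build a block-diagonal boolean mask so object queries attend
    only to queries from their own incremental phase.
    True = attention allowed, False = blocked (cross-phase)."""
    total = sum(phase_sizes)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in phase_sizes:
        mask[start:start + n, start:start + n] = True
        start += n
    return mask

# Two phases: 3 queries for old classes, 2 new queries added later.
mask = phase_isolation_mask([3, 2])
print(mask.astype(int))
```

Such a mask would typically be passed to the decoder's self-attention so that the per-phase query groups are supervised and computed separately, as the abstract describes.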
Related papers
- A Modern Take on Visual Relationship Reasoning for Grasp Planning [10.543168383800532]
We present a modern take on visual relational reasoning for grasp planning.
We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories.
We also propose D3G, a new end-to-end transformer-based dependency graph generation model.
arXiv Detail & Related papers (2024-09-03T16:30:48Z) - DualTeacher: Bridging Coexistence of Unlabelled Classes for Semi-supervised Incremental Object Detection [53.8061502411777]
In real-world applications, an object detector often encounters object instances from new classes and needs to accommodate them effectively.
Previous work formulated this critical problem as incremental object detection (IOD), which assumes the object instances of new classes to be fully annotated in incremental data.
We consider a more realistic setting named semi-supervised IOD (SSIOD), where the object detector needs to learn new classes incrementally from a few labelled data and massive unlabelled data.
arXiv Detail & Related papers (2023-12-13T10:46:14Z) - TIDE: Test Time Few Shot Object Detection [11.036762620105383]
Few-shot object detection (FSOD) aims to extract semantic knowledge from limited object instances of novel categories within a target domain.
Recent advances in FSOD focus on fine-tuning the base model based on a few objects via meta-learning or data augmentation.
We formalize a novel FSOD task, referred to as Test TIme Few Shot DEtection (TIDE), where the model is un-tuned in the configuration procedure.
arXiv Detail & Related papers (2023-11-30T09:00:44Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefited from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the largest and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z) - Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning [60.64535309016623]
We propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z) - Multi-View Correlation Distillation for Incremental Object Detection [12.536640582318949]
We propose a novel Multi-View Correlation Distillation (MVCD) based incremental object detection method.
arXiv Detail & Related papers (2021-07-05T04:36:33Z) - End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, include a set-based global loss that forces unique predictions via bipartite matching.
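The bipartite matching behind DETR's set-based loss pairs each prediction with at most one ground-truth object so the loss is permutation-invariant. The toy sketch below illustrates the matching step only, with made-up costs; brute-force enumeration stands in for the Hungarian algorithm that DETR actually uses (`scipy.optimize.linear_sum_assignment` in practice).

```python
from itertools import permutations

def best_assignment(cost):
    """Find the one-to-one matching between predictions (rows) and
    ground-truth objects (columns) with minimal total cost.
    Exhaustive search; fine for tiny examples only."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Rows: 3 predicted boxes; columns: 3 ground-truth objects.
cost = [[0.9, 0.1, 0.8],
        [0.2, 0.7, 0.6],
        [0.5, 0.4, 0.05]]
perm, total = best_assignment(cost)
print(perm, total)  # prediction i is matched to ground truth perm[i]
```

In DETR the cost combines classification score and box overlap; unmatched predictions are supervised toward a "no object" class.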
arXiv Detail & Related papers (2020-05-26T17:06:38Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
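QuReTeC's core idea, as the summary above describes it, is term classification: decide for each term in the conversation history whether it should expand the current turn's query. The sketch below caricatures that pipeline with an arbitrary predicate standing in for the transformer-based classifier; the terms and the relevance rule are invented for illustration.

```python
def resolve_query(history, current, is_relevant):
    """Expand the current turn with history terms flagged relevant
    by a term-level binary classifier (here: a plain predicate)."""
    expanded = list(current)
    seen = set(current)
    for turn in history:
        for term in turn:
            if is_relevant(term) and term not in seen:
                expanded.append(term)
                seen.add(term)
    return expanded

history = [["who", "founded", "neurips"],
           ["when", "was", "neurips", "founded"]]
current = ["where", "is", "it", "held"]
relevant = {"neurips"}  # toy stand-in for learned relevance
print(resolve_query(history, current, relevant.__contains__))
```

The distant-supervision result in the summary means such a classifier can be trained largely from automatically derived term labels rather than fully hand-curated resolutions.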
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.