Decoupled DETR For Few-shot Object Detection
- URL: http://arxiv.org/abs/2311.11570v1
- Date: Mon, 20 Nov 2023 07:10:39 GMT
- Title: Decoupled DETR For Few-shot Object Detection
- Authors: Zeyu Shangguan, Lian Huai, Tong Liu, Xingqun Jiang
- Abstract summary: We improve the FSOD model to address the severe issue of sample imbalance and weak feature propagation.
We build a unified decoder module that could dynamically fuse the decoder layers as the output feature.
Our results indicate that our proposed module could achieve stable improvements of 5% to 10% in both fine-tuning and meta-learning paradigms.
- Score: 4.520231308678286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot object detection (FSOD), an efficient method for addressing the
severe data-hungry problem, has been extensively discussed. Current works have
significantly advanced the problem in terms of model and data. However, the
overall performance of most FSOD methods still does not fulfill the desired
accuracy. In this paper we improve the FSOD model to address the severe issue
of sample imbalance and weak feature propagation. To alleviate modeling bias
from data-sufficient base classes, we examine the effect of decoupling the
parameters for classes with sufficient data and classes with few samples in
various ways. We design a base-novel categories decoupled DETR (DeDETR) for
FSOD. We also explore various types of skip connection between the encoder and
decoder for DETR. Besides, we notice that the best outputs could come from the
intermediate layer of the decoder instead of the last layer; therefore, we
build a unified decoder module that could dynamically fuse the decoder layers
as the output feature. We evaluate our model on commonly used datasets such as
PASCAL VOC and MSCOCO. Our results indicate that our proposed module could
achieve stable improvements of 5% to 10% in both fine-tuning and meta-learning
paradigms and has outperformed the highest score in recent works.
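The abstract's idea of fusing decoder layers into a single output feature can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one scalar score per decoder layer (a hypothetical stand-in for learnable parameters), turns the scores into weights with a softmax, and takes the weighted sum of the per-layer feature vectors. The names `fuse_decoder_layers` and `layer_scores` are illustrative only, and the paper's "dynamic" fusion may instead compute input-dependent weights.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decoder_layers(
    layer_outputs: Sequence[Sequence[float]],
    layer_scores: Sequence[float],
) -> List[float]:
    """Fuse per-layer feature vectors into one vector by a weighted sum.

    layer_outputs: one feature vector per decoder layer, all of equal length.
    layer_scores: one scalar per layer (assumed learnable in a real model);
        softmax converts them into fusion weights that sum to 1.
    """
    if len(layer_outputs) != len(layer_scores):
        raise ValueError("need exactly one score per decoder layer")
    weights = softmax(layer_scores)
    dim = len(layer_outputs[0])
    fused = [0.0] * dim
    for weight, features in zip(weights, layer_outputs):
        if len(features) != dim:
            raise ValueError("all layer outputs must have the same length")
        for i, value in enumerate(features):
            fused[i] += weight * value
    return fused


# Equal scores reduce the fusion to a plain average of the layers.
average = fuse_decoder_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

With unequal scores the fused feature leans toward the higher-scored layer, which is how an intermediate layer could dominate the output when it is more useful than the last one.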
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- FissionFusion: Fast Geometric Generation and Hierarchical Souping for Medical Image Analysis [0.7751705157998379]
The scarcity of well-annotated medical datasets requires leveraging transfer learning from broader datasets like ImageNet or pre-trained models like CLIP.
Model soups average multiple fine-tuned models, aiming to improve performance on In-Domain (ID) tasks and enhance robustness against Out-of-Distribution (OOD) datasets.
We propose a hierarchical merging approach that involves local and global aggregation of models at various levels.
arXiv Detail & Related papers (2024-03-20T06:48:48Z)
- Staged Depthwise Correlation and Feature Fusion for Siamese Object Tracking [0.6827423171182154]
We propose a novel staged depthwise correlation and feature fusion network, named DCFFNet, to further optimize the feature extraction for visual tracking.
We build our deep tracker upon a siamese network architecture, which is offline trained from scratch on multiple large-scale datasets.
For comprehensive evaluations of performance, we implement our tracker on the popular benchmarks, including OTB100, VOT2018 and LaSOT.
arXiv Detail & Related papers (2023-10-15T06:04:42Z)
- LegoNet: A Fast and Exact Unlearning Architecture [59.49058450583149]
Machine unlearning aims to erase the impact of specific training samples upon deleted requests from a trained model.
We present a novel network, namely LegoNet, which adopts the framework of "fixed encoder + multiple adapters".
We show that LegoNet accomplishes fast and exact unlearning while maintaining acceptable performance, synthetically outperforming unlearning baselines.
arXiv Detail & Related papers (2022-10-28T09:53:05Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Learning Disentangled Latent Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach [33.61751393224223]
We deal with the problem of learning the underlying disentangled latent factors that are shared between the paired bi-modal data in cross-modal retrieval.
We propose a novel idea of the implicit decoder, which completely removes the ambient data decoding module from a latent variable model.
Our model is shown to identify the factors accurately, significantly outperforming conventional encoder-decoder latent variable models.
arXiv Detail & Related papers (2020-12-01T17:47:50Z)
- Point Transformer for Shape Classification and Retrieval of 3D and ALS Roof PointClouds [3.3744638598036123]
This paper proposes a fully attentional model, Point Transformer, for deriving a rich point cloud representation.
The model's shape classification and retrieval performance are evaluated on a large-scale urban dataset - RoofN3D and a standard benchmark dataset ModelNet40.
The proposed method outperforms other state-of-the-art models in the RoofN3D dataset, gives competitive results in the ModelNet40 benchmark, and showcases high robustness to various unseen point corruptions.
arXiv Detail & Related papers (2020-11-08T08:11:02Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural networks (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delay.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.