Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
- URL: http://arxiv.org/abs/2207.11169v1
- Date: Fri, 22 Jul 2022 16:13:22 GMT
- Title: Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark
- Authors: Kibok Lee, Hao Yang, Satyaki Chakraborty, Zhaowei Cai, Gurumurthy
Swaminathan, Avinash Ravichandran, Onkar Dabeer
- Abstract summary: Multi-dOmain Few-Shot Object Detection (MoFSOD) benchmark consists of 10 datasets from a wide range of domains.
We analyze the impacts of freezing layers, different architectures, and different pre-training datasets on FSOD performance.
- Score: 28.818423712485504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing works on few-shot object detection (FSOD) focus on a setting
where both pre-training and few-shot learning datasets are from a similar
domain. However, few-shot algorithms are important in multiple domains; hence
evaluation needs to reflect the broad applications. We propose a Multi-dOmain
Few-Shot Object Detection (MoFSOD) benchmark consisting of 10 datasets from a
wide range of domains to evaluate FSOD algorithms. We comprehensively analyze
the impacts of freezing layers, different architectures, and different
pre-training datasets on FSOD performance. Our empirical results show several
key factors that have not been explored in previous works: 1) contrary to
previous belief, on a multi-domain benchmark, fine-tuning (FT) is a strong
baseline for FSOD, performing on par or better than the state-of-the-art (SOTA)
algorithms; 2) utilizing FT as the baseline allows us to explore multiple
architectures, and we find that they have a significant impact on down-stream
few-shot tasks, even with similar pre-training performances; 3) by decoupling
pre-training and few-shot learning, MoFSOD allows us to explore the impact of
different pre-training datasets, and the right choice can boost the performance
of the down-stream tasks significantly. Based on these findings, we list
possible avenues of investigation for improving FSOD performance and propose
two simple modifications to existing algorithms that lead to SOTA performance
on the MoFSOD benchmark. The code is available at
https://github.com/amazon-research/few-shot-object-detection-benchmark.
Related papers
- Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
By assuming that the base dataset seen during meta-training and the novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z)
- FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis [1.606149016749251]
Multi-domain aspect-based sentiment analysis (ABSA) seeks to capture fine-grained sentiment across diverse domains.
We propose a novel framework, Feature-aware In-context Learning for Multi-domain ABSA (FaiMA).
FaiMA is a feature-aware mechanism that facilitates adaptive learning in multi-domain ABSA tasks.
arXiv Detail & Related papers (2024-03-02T02:00:51Z)
- BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video [58.71785546245467]
Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
arXiv Detail & Related papers (2022-09-25T01:27:35Z)
- Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection [8.492340530784697]
We show that finetune-free iFSD can be highly effective when a large number of base categories with abundant data are available for meta-training.
We benchmark our model on both COCO and LVIS, reporting as high as 17% AP on the long-tail rare classes on LVIS.
arXiv Detail & Related papers (2022-03-25T20:39:00Z)
- Integrated Multiscale Domain Adaptive YOLO [5.33024001730262]
We introduce a novel MultiScale Domain Adaptive YOLO (MS-DAYOLO) framework that employs multiple domain adaptation paths and corresponding domain classifiers at different scales of the recently introduced YOLOv4 object detector.
Our experiments show significant improvements in object detection performance when training YOLOv4 using the proposed MS-DAYOLO architectures and when tested on target data for autonomous driving applications.
arXiv Detail & Related papers (2022-02-07T21:30:53Z)
- Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods.
In the above experiments, we find that existing loss functions are usually specialized for some metrics but yield inferior results on the others.
We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z)
- Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
arXiv Detail & Related papers (2021-08-03T00:54:27Z)
- Real-Time Visual Object Tracking via Few-Shot Learning [107.39695680340877]
Visual Object Tracking (VOT) can be seen as an extended task of Few-Shot Learning (FSL).
We propose a two-stage framework that is capable of employing a large variety of FSL algorithms while presenting faster adaptation speed.
Experiments on the major benchmarks, VOT2018, OTB2015, NFS, UAV123, TrackingNet, and GOT-10k are conducted, demonstrating a desirable performance gain and a real-time speed.
arXiv Detail & Related papers (2021-03-18T10:02:03Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.