Few-Shot Learning for Video Object Detection in a Transfer-Learning
Scheme
- URL: http://arxiv.org/abs/2103.14724v2
- Date: Tue, 30 Mar 2021 01:35:36 GMT
- Title: Few-Shot Learning for Video Object Detection in a Transfer-Learning
Scheme
- Authors: Zhongjie Yu, Gaoang Wang, Lin Chen, Sebastian Raschka, and Jiebo Luo
- Abstract summary: We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
- Score: 70.45901040613015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different from static images, videos contain additional temporal and spatial
information for better object detection. However, it is costly to obtain a
large number of videos with bounding box annotations that are required for
supervised deep learning. Although humans can easily learn to recognize new
objects by watching only a few video clips, deep learning usually suffers from
overfitting. This leads to an important question: how to effectively learn a
video object detector from only a few labeled video clips? In this paper, we
study the new problem of few-shot learning for video object detection. We first
define the few-shot setting and create a new benchmark dataset for few-shot
video object detection derived from the widely used ImageNet VID dataset. We
employ a transfer-learning framework to effectively train the video object
detector on a large number of base-class objects and a few video clips of
novel-class objects. By analyzing the results of two methods under this
framework (Joint and Freeze) on our designed weak and strong base datasets, we
reveal insufficiency and overfitting problems. A simple but effective method,
called Thaw, is naturally developed to trade off the two problems and validate
our analysis.
Extensive experiments on our proposed benchmark datasets with different
scenarios demonstrate the effectiveness of our novel analysis in this new
few-shot video object detection problem.
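The Joint/Freeze/Thaw trade-off described in the abstract can be sketched as a parameter-freezing schedule. The following is a minimal illustrative sketch, not the authors' implementation; the function name, the two-group split (backbone vs. detection head), and the thaw epoch are assumptions made for illustration only.

```python
# Illustrative sketch of the three fine-tuning strategies (Joint, Freeze, Thaw)
# under a transfer-learning scheme. All names here are hypothetical.

def trainable_params(strategy, stage, epoch=0, thaw_epoch=5):
    """Return which parameter groups receive gradient updates.

    strategy : 'joint', 'freeze', or 'thaw'
    stage    : 'base'  -> pretraining on many base-class objects
               'novel' -> few-shot fine-tuning on a few novel-class clips
    epoch    : current fine-tuning epoch (used only by 'thaw')
    """
    if stage == "base":
        # Base training always updates the whole detector.
        return {"backbone", "detection_head"}
    if strategy == "joint":
        # Fine-tune everything on few shots -> prone to overfitting.
        return {"backbone", "detection_head"}
    if strategy == "freeze":
        # Freeze the backbone, adapt only the head -> prone to insufficiency.
        return {"detection_head"}
    if strategy == "thaw":
        # Start like Freeze, then unfreeze the backbone after thaw_epoch,
        # trading off the two failure modes above.
        if epoch < thaw_epoch:
            return {"detection_head"}
        return {"backbone", "detection_head"}
    raise ValueError(f"unknown strategy: {strategy}")
```

In an actual detector one would set `requires_grad` on the corresponding parameter groups according to this schedule at the start of each epoch.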
Related papers
- FADE: A Dataset for Detecting Falling Objects around Buildings in Video [75.48118923174712]
Falling objects from buildings can cause severe injuries to pedestrians due to the large impact forces involved.
FADE contains 1,881 videos from 18 scenes, featuring 8 falling object categories, 4 weather conditions, and 4 video resolutions.
We develop a new object detection method called FADE-Net, which effectively leverages motion information.
arXiv Detail & Related papers (2024-08-11T11:43:56Z)
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from the object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- Uncertainty Aware Active Learning for Reconfiguration of Pre-trained Deep Object-Detection Networks for New Target Domains [0.0]
Object detection is one of the most important and fundamental aspects of computer vision tasks.
To obtain training data for object detection models efficiently, many datasets opt to collect their unannotated data in video format.
Annotating every frame from a video is costly and inefficient since many frames contain very similar information for the model to learn from.
In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem.
arXiv Detail & Related papers (2023-03-22T17:14:10Z)
- Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model [0.0]
Fight detection in videos is an emerging deep learning application with today's prevalence of surveillance systems and streaming media.
Previous work has largely relied on action recognition techniques to tackle this problem.
We design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.
arXiv Detail & Related papers (2022-09-23T08:29:16Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Finding a Needle in a Haystack: Tiny Flying Object Detection in 4K Videos using a Joint Detection-and-Tracking Approach [19.59528430884104]
We present a neural network model called the Recurrent Correlational Network, where detection and tracking are jointly performed.
In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements.
Our network performs as well as state-of-the-art generic object trackers when evaluated as a tracker on a bird image dataset.
arXiv Detail & Related papers (2021-05-18T03:22:03Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
- Performance of object recognition in wearable videos [9.669942356088377]
This work studies the problem of object detection and localization in videos captured by wearable cameras.
We present a study of the well-known YOLO architecture, which offers an excellent trade-off between accuracy and speed.
arXiv Detail & Related papers (2020-09-10T15:20:17Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.