FFAVOD: Feature Fusion Architecture for Video Object Detection
- URL: http://arxiv.org/abs/2109.07298v1
- Date: Wed, 15 Sep 2021 13:53:21 GMT
- Title: FFAVOD: Feature Fusion Architecture for Video Object Detection
- Authors: Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier,
Maguelonne Héritier
- Abstract summary: We propose FFAVOD, standing for feature fusion architecture for video object detection.
We first introduce a novel video object detection architecture that allows a network to share feature maps between nearby frames.
We show that using the proposed architecture and the fusion module can improve the performance of three base object detectors on two object detection benchmarks containing sequences of moving road users.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A significant amount of redundancy exists between consecutive frames of a
video. Object detectors typically produce detections for one image at a time,
without any capabilities for taking advantage of this redundancy. Meanwhile,
many applications for object detection work with videos, including intelligent
transportation systems, advanced driver assistance systems and video
surveillance. Our work aims at taking advantage of the similarity between video
frames to produce better detections. We propose FFAVOD, standing for feature
fusion architecture for video object detection. We first introduce a novel
video object detection architecture that allows a network to share feature maps
between nearby frames. Second, we propose a feature fusion module that learns
to merge feature maps to enhance them. We show that using the proposed
architecture and the fusion module can improve the performance of three base
object detectors on two object detection benchmarks containing sequences of
moving road users. Additionally, to further increase performance, we propose an
improvement to the SpotNet attention module. Using our architecture on the
improved SpotNet detector, we obtain the state-of-the-art performance on the
UA-DETRAC public benchmark as well as on the UAVDT dataset. Code is available
at https://github.com/hu64/FFAVOD.
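The core idea of the abstract can be illustrated with a minimal sketch: feature maps extracted from nearby frames are merged by a learned fusion step into one enhanced map. The sketch below is not the authors' code; it reduces the fusion module to a single 1x1 convolution over channel-concatenated maps, implemented in plain NumPy. The function name, shapes, and the averaging weights are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def fuse_feature_maps(frame_features, weights):
    """Merge per-frame feature maps into one enhanced map.

    frame_features: list of (C, H, W) arrays from nearby frames.
    weights: (C, n_frames * C) mixing matrix, i.e. a 1x1 convolution
             expressed as a matrix over the concatenated channels.
    Returns a (C, H, W) fused map.
    """
    stacked = np.concatenate(frame_features, axis=0)  # (n*C, H, W)
    n_c, h, w = stacked.shape
    flat = stacked.reshape(n_c, h * w)                # (n*C, H*W)
    fused = weights @ flat                            # (C, H*W)
    return fused.reshape(-1, h, w)

# Toy usage: fuse three frames' 4-channel maps. With these weights the
# "learned" fusion degenerates to a plain average across frames.
rng = np.random.default_rng(0)
frames = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
avg_weights = np.tile(np.eye(4), 3) / 3.0             # (4, 12)
fused = fuse_feature_maps(frames, avg_weights)
print(fused.shape)  # (4, 8, 8)
```

In the actual architecture the mixing weights are learned end-to-end rather than fixed, which is what lets the network discover which channels from which frames are worth keeping.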
Related papers
- STF: Spatio-Temporal Fusion Module for Improving Video Object Detection [7.213855322671065]
Consecutive frames in a video contain redundancy, but they may also contain complementary information for the detection task.
We propose a spatio-temporal fusion framework (STF) to leverage this complementary information.
The proposed spatio-temporal fusion module leads to improved detection performance compared to baseline object detectors.
arXiv Detail & Related papers (2024-02-16T15:19:39Z) - Camouflaged Object Detection with Feature Grafting and Distractor Aware [9.791590363932519]
We propose a novel Feature Grafting and Distractor Aware network (FDNet) to handle the Camouflaged Object Detection task.
Specifically, we use CNN and Transformer to encode multi-scale images in parallel.
A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task to refine the coarse camouflage map.
arXiv Detail & Related papers (2023-07-08T09:37:08Z) - Memory Maps for Video Object Detection and Tracking on UAVs [14.573513188682183]
This paper introduces a novel approach to video object detection and tracking on Unmanned Aerial Vehicles (UAVs)
By incorporating metadata, the proposed approach creates a memory map of object locations in actual world coordinates.
We use this representation to boost confidences, resulting in improved performance for several temporal computer vision tasks.
arXiv Detail & Related papers (2023-03-06T21:29:45Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - A Unified Transformer Framework for Group-based Segmentation:
Co-Segmentation, Co-Saliency Detection and Video Salient Object Detection [59.21990697929617]
Humans tend to mine objects by learning from a group of images or several frames of video since we live in a dynamic world.
Previous approaches design different networks on similar tasks separately, and they are difficult to apply to each other.
We introduce a unified framework to tackle these issues, termed UFO (Unified Framework for Co-Object segmentation).
arXiv Detail & Related papers (2022-03-09T13:35:19Z) - Recent Trends in 2D Object Detection and Applications in Video Event
Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS)
Our FSNet performs the cross-modal feature-passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage.
We show that our FSNet outperforms other state-of-the-art methods for both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z) - Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z) - Single Shot Video Object Detector [215.06904478667337]
Single Shot Video Object Detector (SSVD) is a new architecture that novelly integrates feature aggregation into a one-stage detector for object detection in videos.
For $448 \times 448$ input, SSVD achieves 79.2% mAP on the ImageNet VID dataset.
arXiv Detail & Related papers (2020-07-07T15:36:26Z) - RN-VID: A Feature Fusion Architecture for Video Object Detection [10.667492516216889]
We propose RN-VID (standing for RetinaNet-VIDeo), a novel approach to video object detection.
First, we propose a new architecture that allows the usage of information from nearby frames to enhance feature maps.
Second, we propose a novel module to merge feature maps of the same dimensions using re-ordering of channels and 1x1 convolutions.
arXiv Detail & Related papers (2020-03-24T14:54:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.