Feature Flow: In-network Feature Flow Estimation for Video Object
Detection
- URL: http://arxiv.org/abs/2009.09660v2
- Date: Wed, 10 Nov 2021 06:58:57 GMT
- Title: Feature Flow: In-network Feature Flow Estimation for Video Object
Detection
- Authors: Ruibing Jin, Guosheng Lin, Changyun Wen, Jianliang Wang and Fayao Liu
- Abstract summary: Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
- Score: 56.80974623192569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical flow, which expresses pixel displacement, is widely used in many
computer vision tasks to provide pixel-level motion information. However, with
the remarkable progress of convolutional neural networks, recent
state-of-the-art approaches solve problems directly at the feature level.
Since the displacement of a feature vector is not consistent with the pixel
displacement, a common approach is to forward optical flow to a neural network
and fine-tune this network on the task dataset. With this method, they expect
the fine-tuned network to produce tensors encoding feature-level motion
information. In this paper, we rethink this de facto
paradigm and analyze its drawbacks in the video object detection task. To
mitigate these issues, we propose a novel network (IFF-Net) with an
\textbf{I}n-network \textbf{F}eature \textbf{F}low estimation module (IFF
module) for video object detection. Without resorting to pre-training on any
additional dataset, our IFF module is able to directly produce \textbf{feature
flow} which indicates the feature displacement. Our IFF module consists of a
shallow module that shares its features with the detection branches. This
compact design enables our IFF-Net to accurately detect objects, while
maintaining a fast inference speed. Furthermore, we propose a transformation
residual loss (TRL) based on \textit{self-supervision}, which further improves
the performance of our IFF-Net. Our IFF-Net outperforms existing methods and
sets a state-of-the-art performance on ImageNet VID.
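Both the de facto optical-flow pipeline and the proposed feature flow ultimately warp reference-frame features to the current frame along a per-position displacement field. The following is a minimal, framework-free sketch of that warping step using bilinear sampling; the function names and the toy 2x2 channel are illustrative assumptions, not taken from the paper's implementation:

```python
import math

def bilinear_sample(feat, y, x):
    """Bilinearly sample one feature channel at fractional coords (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp so that flow pointing out of the frame falls back to border values.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = math.floor(y), math.floor(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = feat[y0][x0] * (1 - wx) + feat[y0][x1] * wx
    bot = feat[y1][x0] * (1 - wx) + feat[y1][x1] * wx
    return top * (1 - wy) + bot * wy

def warp(feat, flow):
    """Warp a reference-frame feature channel toward the current frame.

    feat: H x W channel from the reference frame.
    flow: H x W list of (dy, dx) displacements, in feature-grid units.
    """
    h, w = len(feat), len(feat[0])
    return [[bilinear_sample(feat, i + flow[i][j][0], j + flow[i][j][1])
             for j in range(w)] for i in range(h)]

# Zero flow leaves the channel unchanged; fractional flow blends neighbors.
ref = [[0.0, 1.0], [2.0, 3.0]]
zero = [[(0.0, 0.0)] * 2 for _ in range(2)]
print(warp(ref, zero))  # → [[0.0, 1.0], [2.0, 3.0]]
```

The key point the paper exploits is that the displacement field fed to `warp` need not be pixel-level optical flow: IFF-Net estimates it directly in the feature space, where displacements generally differ from pixel displacements.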
Related papers
- DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection [38.882935730384965]
DREB-Net is an innovative object detection algorithm specifically designed for blurry images.
It addresses the particularities of blurry image object detection problem by incorporating a Blurry image Restoration Auxiliary Branch.
Experimental results indicate that DREB-Net can still effectively perform object detection tasks under motion blur in captured images.
arXiv Detail & Related papers (2024-10-23T12:32:20Z)
- Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks [13.815116154370834]
We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
arXiv Detail & Related papers (2024-03-13T00:48:41Z)
- Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation [99.70336991366403]
We propose a concise, practical, and efficient architecture for appearance and motion feature alignment.
The proposed HFAN reaches a new state-of-the-art performance on DAVIS-16, achieving 88.7 $\mathcal{J}\&\mathcal{F}$ Mean, i.e., a relative improvement of 3.5% over the best published result.
arXiv Detail & Related papers (2022-07-18T10:10:14Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need for Transformers to incorporate contextual information when extracting features dynamically has been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation [25.41427065435164]
Unsupervised video object segmentation (UVOS) aims at automatically separating the primary foreground object(s) from the background in a video sequence.
Existing UVOS methods either lack robustness when there are visually similar surroundings (appearance-based) or suffer from deterioration in the quality of their predictions because of dynamic background and inaccurate flow (flow-based).
We propose an implicit motion-compensated network (IMCNet) combining complementary cues (i.e., appearance and motion) with aligned motion information from the adjacent frames to the current frame at the feature level.
arXiv Detail & Related papers (2022-04-06T13:03:59Z)
- CE-FPN: Enhancing Channel Information for Object Detection [12.954675966833372]
Feature pyramid network (FPN) has been an effective framework to extract multi-scale features in object detection.
We present a novel channel enhancement network (CE-FPN) with three simple yet effective modules to alleviate these problems.
Our experiments show that CE-FPN achieves competitive performance compared to state-of-the-art FPN-based detectors on MS COCO benchmark.
arXiv Detail & Related papers (2021-03-19T05:51:53Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image can be of practical interest for fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
- iffDetector: Inference-aware Feature Filtering for Object Detection [70.8678270164057]
We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors.
IFF performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features.
IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead.
arXiv Detail & Related papers (2020-06-23T02:57:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.