Global Spectral Filter Memory Network for Video Object Segmentation
- URL: http://arxiv.org/abs/2210.05567v2
- Date: Wed, 12 Oct 2022 04:50:00 GMT
- Title: Global Spectral Filter Memory Network for Video Object Segmentation
- Authors: Yong Liu, Ran Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong
Tang, Yujiu Yang
- Abstract summary: This paper studies semi-supervised video object segmentation through boosting intra-frame interaction.
We propose Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction through learning long-term spatial dependencies in the spectral domain.
- Score: 33.42697528492191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies semi-supervised video object segmentation through boosting
intra-frame interaction. Recent memory network-based methods focus on
exploiting inter-frame temporal reference while paying little attention to
intra-frame spatial dependency. Specifically, these segmentation models tend to
be susceptible to interference from unrelated non-target objects in a given
frame. To this end, we propose the Global Spectral Filter Memory network (GSFM),
which improves intra-frame interaction by learning long-term spatial
dependencies in the spectral domain. The key component of GSFM is the 2D
(inverse) discrete Fourier transform for spatial information mixing. Besides, we
empirically find that low-frequency features should be enhanced in the encoder
(backbone) while high-frequency features should be enhanced in the decoder
(segmentation head). We attribute this to the encoder's role of extracting
semantic information and the decoder's role of highlighting fine-grained
details. Accordingly, a Low (High) Frequency Module is proposed to
fit each setting. Extensive experiments on the popular DAVIS and
fit this circumstance. Extensive experiments on the popular DAVIS and
YouTube-VOS benchmarks demonstrate that GSFM noticeably outperforms the
baseline method and achieves state-of-the-art performance. Moreover, extensive
analysis shows that the proposed modules are well-motivated and generalize
well. Our source code is available at
https://github.com/workforai/GSFM.
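The core idea, mixing spatial information through a 2D (inverse) DFT and enhancing low frequencies for the encoder versus high frequencies for the decoder, can be sketched with a fixed radial mask. Note this is an illustrative assumption, not the paper's method: GSFM learns its spectral filters, while the function name, cutoff, and residual form below are invented for demonstration.

```python
import numpy as np

def spectral_filter(feat, mode="low", cutoff=0.25):
    """Illustrative spectral filtering of a 2D feature map.

    mode="low" keeps frequencies below the cutoff (encoder-style
    semantic enhancement); mode="high" keeps those above it
    (decoder-style detail highlighting). cutoff is a fraction of
    the maximum frequency radius. A fixed-mask sketch only; the
    paper's modules learn their filters from data.
    """
    h, w = feat.shape
    # 2D DFT, shifted so the zero frequency sits at the centre.
    spec = np.fft.fftshift(np.fft.fft2(feat))
    # Radial frequency mask around the spectrum centre.
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    max_radius = np.sqrt((h / 2) ** 2 + (w / 2) ** 2)
    mask = radius <= cutoff * max_radius
    if mode == "high":
        mask = ~mask
    # Inverse DFT back to the spatial domain.
    filtered = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    # Residual enhancement: add the filtered band back to the input.
    return feat + filtered
```

Because the low and high masks are complements, the two filtered bands sum back to the original map; the residual form means each call simply doubles the weight of the selected band.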
Related papers
- Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution [6.673262517388075]
Pubic symphysis-fetal head segmentation in transperineal ultrasound images plays a critical role for the assessment of fetal head descent and progression.
We introduce a dynamic, query-aware sparse attention mechanism for ultrasound image segmentation.
We propose a novel method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task.
arXiv Detail & Related papers (2024-10-14T10:14:04Z)
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
- A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos [107.96514633713034]
We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains a state-of-the-art detection performance.
arXiv Detail & Related papers (2023-09-09T07:00:10Z)
- Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation [9.198120596225968]
We propose an efficient lightweight encoder-decoder network that reduces the computational parameters and guarantees the robustness of the algorithm.
Experimental results on NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameters than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-11T09:02:03Z)
- Spectrum-guided Multi-granularity Referring Video Object Segmentation [56.95836951559529]
Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features.
This causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation.
We propose a Spectrum-guided Multi-granularity approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks.
arXiv Detail & Related papers (2023-07-25T14:35:25Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the high-level-temporal feature to form a strong global feature encoded via a Swin Transformer.
arXiv Detail & Related papers (2023-02-22T12:09:39Z)
- S$^2$-FPN: Scale-aware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation [6.744210626403423]
This paper presents a new model to achieve a trade-off between accuracy/speed for real-time road scene semantic segmentation.
Specifically, we propose a lightweight model named Scale-aware Strip Attention Guided Feature Pyramid Network (S$^2$-FPN).
Our network consists of three main modules: Attention Pyramid Fusion (APF) module, Scale-aware Strip Attention Module (SSAM), and Global Feature Upsample (GFU) module.
arXiv Detail & Related papers (2022-06-15T05:02:49Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose Adaptively learn Frequency information in the two-branch Detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- iffDetector: Inference-aware Feature Filtering for Object Detection [70.8678270164057]
We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors.
IFF performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features.
IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead.
arXiv Detail & Related papers (2020-06-23T02:57:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.