Learning Discriminative Feature with CRF for Unsupervised Video Object
Segmentation
- URL: http://arxiv.org/abs/2008.01270v1
- Date: Tue, 4 Aug 2020 01:53:56 GMT
- Title: Learning Discriminative Feature with CRF for Unsupervised Video Object
Segmentation
- Authors: Mingmin Zhen, Shiwei Li, Lei Zhou, Jiaxiang Shang, Haoan Feng, Tian
Fang, Long Quan
- Abstract summary: We introduce a discriminative feature network (DFNet) to address the unsupervised video object segmentation task.
DFNet outperforms state-of-the-art methods by a large margin with a mean IoU score of 83.4%.
DFNet is also applied to the image object co-segmentation task.
- Score: 34.1031534327244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel network, called discriminative feature
network (DFNet), to address the unsupervised video object segmentation task. To
capture the inherent correlation among video frames, we learn discriminative
features (D-features) from the input images that reveal feature distribution
from a global perspective. The D-features are then used to establish
correspondence with all features of the test image under a conditional random
field (CRF) formulation, which is leveraged to enforce consistency between pixels.
The experiments verify that DFNet outperforms state-of-the-art methods by a
large margin with a mean IoU score of 83.4% and ranks first on the DAVIS-2016
leaderboard while using far fewer parameters and running much more efficiently
in the inference phase. We further evaluate DFNet on the FBMS
dataset and the video saliency dataset ViSal, reaching a new state-of-the-art.
To further demonstrate the generalizability of our framework, DFNet is also
applied to the image object co-segmentation task. We perform experiments on a
challenging dataset PASCAL-VOC and observe the superiority of DFNet. The
thorough experiments verify that DFNet is able to capture and mine the
underlying relations of images and discover the common foreground objects.
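The mean IoU score quoted in the abstract is built on the intersection-over-union (Jaccard index) between a predicted mask and a ground-truth mask. The sketch below is only an illustration of that metric, not the authors' evaluation code: the function names `binary_iou` and `mean_iou` are hypothetical, masks are assumed to be nested lists of 0/1 values, two empty masks are assumed to score 1.0, and the plain per-frame average is an assumption (the exact averaging protocol of the DAVIS-2016 benchmark is not described in this section).

```python
def binary_iou(pred, gt):
    """Intersection-over-union between two binary masks (nested lists of 0/1)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            inter += int(p and g)  # pixel is foreground in both masks
            union += int(p or g)   # pixel is foreground in at least one mask
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return inter / union


def mean_iou(pred_masks, gt_masks):
    """Average the per-frame IoU over paired lists of masks (assumed protocol)."""
    if len(pred_masks) != len(gt_masks):
        raise ValueError("need one ground-truth mask per prediction")
    if not pred_masks:
        raise ValueError("need at least one frame")
    scores = [binary_iou(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Frame 1: one shared foreground pixel out of two in the union.
    frame1_pred = [[1, 1], [0, 0]]
    frame1_gt = [[1, 0], [0, 0]]
    # Frame 2: prediction matches the ground truth exactly.
    frame2_pred = [[0, 1], [0, 1]]
    frame2_gt = [[0, 1], [0, 1]]
    print(mean_iou([frame1_pred, frame2_pred], [frame1_gt, frame2_gt]))
```

In this toy case the first frame scores 1/2 and the second scores 1, so the average is 0.75; a reported figure such as 83.4% corresponds to a value of 0.834 on the same scale.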
Related papers
- DDU-Net: A Domain Decomposition-based CNN for High-Resolution Image Segmentation on Multiple GPUs [46.873264197900916]
A domain decomposition-based U-Net architecture is introduced, which partitions input images into non-overlapping patches.
A communication network is added to facilitate inter-patch information exchange to enhance the understanding of spatial context.
Results show that the approach achieves a 2-3% higher intersection over union (IoU) score compared to the same network without inter-patch communication.
arXiv Detail & Related papers (2024-07-31T01:07:21Z) - NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex
Real-World Scenes [80.59831861186227]
This paper carries out the exploration of self-supervised learning for object segmentation using NeRF for complex real-world scenes.
Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), encourages NeRF models to distill compact geometry-aware segmentation clusters.
It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts.
arXiv Detail & Related papers (2022-09-19T06:03:17Z) - ViGAT: Bottom-up event recognition and explanation in video using
factorized graph attention network [8.395400675921515]
ViGAT is a pure-attention bottom-up approach to derive object and frame features.
A head network is proposed to process these features for the task of event recognition and explanation in video.
A comprehensive evaluation study is performed, demonstrating that the proposed approach provides state-of-the-art results on three large, publicly available video datasets.
arXiv Detail & Related papers (2022-07-20T14:12:05Z) - Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in
VIS and NIR Scenario [87.72258480670627]
Existing face forgery detection methods based on frequency domain find that the GAN forged images have obvious grid-like visual artifacts in the frequency spectrum compared to the real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z) - Box Supervised Video Segmentation Proposal Network [3.384080569028146]
We propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties.
The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9%.
We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
arXiv Detail & Related papers (2022-02-14T20:38:28Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in
Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - Improving Point Cloud Semantic Segmentation by Learning 3D Object
Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform great for well represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z) - Feature Flow: In-network Feature Flow Estimation for Video Object
Detection [56.80974623192569]
Optical flow is widely used in computer vision tasks to provide pixel-level motion information.
A common approach is to forward optical flow to a neural network and fine-tune this network on the task dataset.
We propose a novel network (IFF-Net) with an In-network Feature Flow estimation module for video object detection.
arXiv Detail & Related papers (2020-09-21T07:55:50Z) - Unsupervised Learning of Video Representations via Dense Trajectory
Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.