Coherent Loss: A Generic Framework for Stable Video Segmentation
- URL: http://arxiv.org/abs/2010.13085v1
- Date: Sun, 25 Oct 2020 10:48:28 GMT
- Title: Coherent Loss: A Generic Framework for Stable Video Segmentation
- Authors: Mingyang Qian, Yi Fu, Xiao Tan, Yingying Li, Jinqing Qi, Huchuan Lu,
Shilei Wen, Errui Ding
- Abstract summary: We investigate how jittering artifacts degrade the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework that enhances a neural network's robustness to jittering artifacts.
- Score: 103.78087255807482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video segmentation approaches are of great importance for numerous
vision tasks, especially video manipulation for entertainment. Because
high-quality per-frame segmentation annotations and large video datasets
covering diverse environments are difficult to acquire at scale, learning
approaches show high overall accuracy on test datasets but lack the strict
temporal constraints needed to self-correct jittering artifacts in most
practical applications. We investigate how this jittering artifact degrades
the visual quality of video segmentation results and propose a temporal
stability metric to evaluate it numerically. In particular, we propose a
Coherent Loss with a generic framework that strengthens a neural network
against jittering artifacts, combining high accuracy with high consistency.
Equipped with our method, existing video object/semantic segmentation
approaches achieve significantly better visual quality on a video human
dataset, which we provide for further research in this field, as well as on
DAVIS and Cityscapes.
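The abstract describes its two technical contributions only at a high level: a temporal stability metric and the Coherent Loss itself. The sketch below is a minimal PyTorch reading of both, assuming optical-flow warping between consecutive frames; the helper names, the IoU-based stability score, and the occlusion masking are illustrative assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(prev: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (N, C, H, W) tensor from frame t-1 into frame t's coordinates
    using backward optical flow given as (N, 2, H, W) pixel displacements."""
    _, _, h, w = prev.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=prev.device, dtype=prev.dtype),
        torch.arange(w, device=prev.device, dtype=prev.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # where each output pixel samples x
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # where each output pixel samples y
    # grid_sample expects sampling locations normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(prev, grid, align_corners=True)


def temporal_stability(masks: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
    """One plausible stability score: mean IoU between each frame's binary
    mask and the previous frame's mask warped into it. masks: (T, 1, H, W)
    floats in {0, 1}; flows[t-1]: backward flow from frame t to frame t-1.
    Higher means more temporally stable (less jitter)."""
    ious = []
    for t in range(1, masks.shape[0]):
        warped = warp_with_flow(masks[t - 1 : t], flows[t - 1 : t]) > 0.5
        cur = masks[t : t + 1] > 0.5
        inter = (warped & cur).float().sum()
        union = (warped | cur).float().sum().clamp(min=1.0)
        ious.append(inter / union)
    return torch.stack(ious).mean()


def coherent_loss(logits_t, logits_prev, flow, valid):
    """Penalize disagreement between frame t's class probabilities and the
    flow-warped probabilities of frame t-1, restricted to `valid`
    (non-occluded) pixels. logits: (N, C, H, W); valid: (N, 1, H, W)."""
    probs_t = logits_t.softmax(dim=1)
    probs_warp = warp_with_flow(logits_prev.softmax(dim=1), flow)
    per_pixel = (probs_t - probs_warp).abs().sum(dim=1, keepdim=True)
    return (per_pixel * valid).sum() / valid.sum().clamp(min=1.0)
```

Warping the previous frame's prediction before comparing is what lets such a penalty separate genuine object motion from jitter; without flow alignment the same term would simply blur moving boundaries. In training, the coherent term would typically be added to the usual per-frame segmentation loss with a small weight.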
Related papers
- SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis [52.050036778325094]
We introduce SALOVA: Segment-Augmented Video Assistant, a novel video-LLM framework designed to enhance the comprehension of lengthy video content.
We present a high-quality collection of 87.8K long videos, each densely captioned at the segment level to enable models to capture scene continuity and maintain rich context.
Our framework mitigates the limitations of current video-LMMs by allowing for precise identification and retrieval of relevant video segments in response to queries.
arXiv Detail & Related papers (2024-11-25T08:04:47Z)
- Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding [61.89781979702939]
This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets.
Recent efforts seek to refine large-scale, diverse ASR datasets compromised by low quality through synthetic annotations.
We introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods.
arXiv Detail & Related papers (2024-09-29T03:33:35Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Adaptive graph convolutional networks for weakly supervised anomaly detection in videos [42.3118758940767]
We propose a weakly supervised adaptive graph convolutional network (WAGCN) to model the contextual relationships among video segments.
We fully consider the influence of other video segments on the current segment when generating the anomaly probability score for each segment.
arXiv Detail & Related papers (2022-02-14T06:31:34Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Temporally stable video segmentation without video annotations [6.184270985214255]
We introduce a method to adapt still image segmentation models to video in an unsupervised manner.
We verify that the consistency measure is well correlated with human judgement via a user study.
We observe improvements in the generated segmented videos with minimal loss of accuracy.
arXiv Detail & Related papers (2021-10-17T18:59:11Z)
- The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting [43.90848669491335]
We propose the Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark, which consists of two contributions.
Our challenging benchmark enables more insightful analysis into video inpainting methods and serves as an invaluable diagnostic tool for the field.
arXiv Detail & Related papers (2021-05-11T20:13:53Z)
- High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations and Context Aware Skip Connections [0.0]
We provide a high fidelity deep learning algorithm (HyperSeg) for interactive video segmentation tasks.
Our model crucially processes and renders all image features in high resolution, without utilizing downsampling or pooling procedures.
Our work can be used across a broad range of application domains, including VFX pipelines and medical imaging disciplines.
arXiv Detail & Related papers (2020-11-23T18:21:42Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)