High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations and Context-Aware Skip Connections
- URL: http://arxiv.org/abs/2011.11602v1
- Date: Mon, 23 Nov 2020 18:21:42 GMT
- Title: High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations and Context-Aware Skip Connections
- Authors: Anthony D. Rhodes, Manan Goel
- Abstract summary: We provide a high fidelity deep learning algorithm (HyperSeg) for interactive video segmentation tasks.
Our model crucially processes and renders all image features in high resolution, without utilizing downsampling or pooling procedures.
Our work can be used across a broad range of application domains, including VFX pipelines and medical imaging disciplines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide a high fidelity deep learning algorithm (HyperSeg) for interactive
video segmentation tasks using a convolutional network with context-aware skip
connections, and compressed, hypercolumn image features combined with a
convolutional tessellation procedure. In order to maintain high output
fidelity, our model crucially processes and renders all image features in high
resolution, without utilizing downsampling or pooling procedures. We maintain
this consistent, high-grade fidelity efficiently in our model chiefly through
two means: (1) We use a statistically-principled tensor decomposition procedure
to modulate the number of hypercolumn features and (2) We render these features
in their native resolution using a convolutional tessellation technique. For
improved pixel level segmentation results, we introduce a boundary loss
function; for improved temporal coherence in video data, we include temporal
image information in our model. Through experiments, we demonstrate the
improved accuracy of our model against baseline models for interactive
segmentation tasks using high resolution video data. We also introduce a
benchmark video segmentation dataset, the VFX Segmentation Dataset, which
contains over 27,046 high resolution video frames, including greenscreen and
various composited scenes with corresponding, hand crafted, pixel level
segmentations. Our work extends and improves upon state-of-the-art
segmentation fidelity for high resolution data and can be used across a broad
range of application domains, including VFX pipelines and medical imaging
disciplines.
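The abstract names a statistically-principled tensor decomposition for modulating the number of hypercolumn features, but does not specify it here. As a hedged illustration only, the sketch below compresses a per-pixel hypercolumn stack with truncated PCA via torch.pca_lowrank; the function name and the choice of PCA are assumptions, not the authors' published procedure.

```python
import torch

def compress_hypercolumns(features: torch.Tensor, k: int) -> torch.Tensor:
    """Reduce a (C, H, W) hypercolumn stack to (k, H, W) channels.

    Each pixel's C-dimensional hypercolumn is treated as one sample and
    projected onto the top-k principal directions. An illustrative
    stand-in for the paper's tensor decomposition step.
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w).T           # (H*W, C) pixel samples
    # Truncated PCA; pca_lowrank centers the data internally.
    _, _, v = torch.pca_lowrank(x, q=k)        # v: (C, k) principal axes
    reduced = (x - x.mean(dim=0)) @ v          # (H*W, k) projections
    return reduced.T.reshape(k, h, w)
```

A reduction of this kind keeps every pixel's features at native resolution while shrinking the channel count, which is what makes a downsampling-free pipeline tractable in memory.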
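The "convolutional tessellation" is likewise only named in the abstract. A common way to run full-resolution convolutions under a memory budget is to tile the frame into overlapping patches whose halos cover the kernel's receptive field; the sketch below is one assumed reading of the term, not the paper's definition.

```python
import torch
import torch.nn as nn

def tiled_conv(x: torch.Tensor, conv: nn.Conv2d, tile: int = 512) -> torch.Tensor:
    """Apply `conv` to an (N, C, H, W) tensor tile by tile, full resolution.

    Assumes an odd square kernel with zero padding of kernel_size // 2
    ('same' convolution). Overlapping halos make tile seams exact, so the
    output matches running `conv` on the whole frame at once.
    """
    pad = conv.kernel_size[0] // 2             # halo width
    n, _, h, w = x.shape
    out = torch.empty(n, conv.out_channels, h, w, dtype=x.dtype, device=x.device)
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            ys, xs = max(y0 - pad, 0), max(x0 - pad, 0)   # crop with halo,
            ye, xe = min(y1 + pad, h), min(x1 + pad, w)   # clamped to frame
            patch = conv(x[:, :, ys:ye, xs:xe])
            out[:, :, y0:y1, x0:x1] = patch[:, :, y0 - ys:y1 - ys, x0 - xs:x1 - xs]
    return out
```

Because the seams reproduce the whole-image result exactly, arbitrarily large frames (e.g. 4K VFX plates) can be processed without pooling or downsampling.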
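The boundary loss is also described only by name. One plausible variant, given purely as an assumed sketch, up-weights the per-pixel binary cross-entropy in a thin band around the ground-truth mask edge, with the band found by a morphological gradient.

```python
import torch
import torch.nn.functional as F

def boundary_weighted_bce(logits: torch.Tensor, target: torch.Tensor,
                          kernel: int = 5, boundary_weight: float = 5.0) -> torch.Tensor:
    """BCE that up-weights pixels near ground-truth mask boundaries.

    `target` is an (N, 1, H, W) float mask in {0, 1}. The edge band is a
    morphological gradient (dilation minus erosion of the mask). This is
    a hypothetical stand-in for the paper's unspecified boundary loss.
    """
    pad = kernel // 2
    dilated = F.max_pool2d(target, kernel, stride=1, padding=pad)
    eroded = -F.max_pool2d(-target, kernel, stride=1, padding=pad)
    band = (dilated - eroded).clamp(0, 1)      # 1 inside the edge band
    weights = 1.0 + boundary_weight * band
    per_pixel = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (weights * per_pixel).mean()
```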
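For temporal coherence the abstract says only that temporal image information is included in the model. A minimal conditioning scheme, assumed here for illustration, concatenates the previous frame and its predicted mask onto the current frame as extra input channels.

```python
import torch

def temporal_input(frame: torch.Tensor, prev_frame: torch.Tensor,
                   prev_mask: torch.Tensor) -> torch.Tensor:
    """Stack two (N, 3, H, W) RGB frames and an (N, 1, H, W) previous-frame
    mask into an (N, 7, H, W) network input. A hypothetical scheme; the
    paper states only that temporal information is used, not how.
    """
    return torch.cat([frame, prev_frame, prev_mask], dim=1)
```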
Related papers
- Transforming Static Images Using Generative Models for Video Salient Object Detection [15.701293552584863]
We show that image-to-video diffusion models can generate realistic transformations of static images while understanding the contextual relationships between image components.
This ability allows the model to generate plausible optical flows, preserving semantic integrity while reflecting the independent motion of scene elements.
Our approach achieves state-of-the-art performance across all public benchmark datasets, outperforming existing approaches.
arXiv Detail & Related papers (2024-11-21T09:41:33Z)
- I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models [54.99771394322512]
Video synthesis has recently made remarkable strides, benefiting from the rapid development of diffusion models.
It still encounters challenges in terms of semantic accuracy, clarity, and spatio-temporal continuity.
We propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors.
I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos.
arXiv Detail & Related papers (2023-11-07T17:16:06Z)
- Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion [6.096411752534632]
Video captioning models aim to translate the content of videos into accurate natural language.
Existing methods often fail to generate sufficiently rich feature representations of video content.
We propose a video captioning model based on dual graphs and gated fusion.
arXiv Detail & Related papers (2023-08-13T05:18:08Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images.
Our core idea is to extract compact basis sets from all the reference images as higher-level representations.
We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
- Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-10T02:31:50Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)