High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations and Context Aware Skip Connections
- URL: http://arxiv.org/abs/2011.11602v1
- Date: Mon, 23 Nov 2020 18:21:42 GMT
- Title: High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations and Context Aware Skip Connections
- Authors: Anthony D. Rhodes, Manan Goel
- Abstract summary: We provide a high fidelity deep learning algorithm (HyperSeg) for interactive video segmentation tasks.
Our model crucially processes and renders all image features in high resolution, without utilizing downsampling or pooling procedures.
Our work can be used across a broad range of application domains, including VFX pipelines and medical imaging disciplines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide a high fidelity deep learning algorithm (HyperSeg) for interactive
video segmentation tasks using a convolutional network with context-aware skip
connections, and compressed, hypercolumn image features combined with a
convolutional tessellation procedure. In order to maintain high output
fidelity, our model crucially processes and renders all image features in high
resolution, without utilizing downsampling or pooling procedures. We maintain
this consistent, high grade fidelity efficiently in our model chiefly through
two means: (1) We use a statistically-principled tensor decomposition procedure
to modulate the number of hypercolumn features and (2) We render these features
in their native resolution using a convolutional tessellation technique. For
improved pixel level segmentation results, we introduce a boundary loss
function; for improved temporal coherence in video data, we include temporal
image information in our model. Through experiments, we demonstrate the
improved accuracy of our model against baseline models for interactive
segmentation tasks using high resolution video data. We also introduce a
benchmark video segmentation dataset, the VFX Segmentation Dataset, which
contains over 27,046 high resolution video frames, including greenscreen and
various composited scenes with corresponding, hand crafted, pixel level
segmentations. Our work presents an extension and improvement to state-of-the-art
segmentation fidelity with high resolution data and can be used across a
broad range of application domains, including VFX pipelines and medical imaging
disciplines.
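
To make the first of the two mechanisms concrete, the sketch below builds hypercolumn features by upsampling per-layer CNN activations to a common resolution and concatenating them, then reduces the channel dimension with a PCA-style low-rank projection. This is an illustration under that assumption only: the paper's exact statistically-principled tensor decomposition is not specified here, and the helper names (`build_hypercolumns`, `pca_compress`) are hypothetical.

```python
# Minimal sketch (PyTorch): hypercolumn construction plus PCA-style channel
# compression. This stands in for the paper's statistically-principled tensor
# decomposition, which is not reproduced here.
import torch
import torch.nn.functional as F

def build_hypercolumns(feature_maps, out_size):
    """Upsample per-layer CNN activations to a common resolution and
    concatenate them along the channel axis (one hypercolumn per pixel)."""
    upsampled = [F.interpolate(f, size=out_size, mode="bilinear",
                               align_corners=False) for f in feature_maps]
    return torch.cat(upsampled, dim=1)               # (N, C_total, H, W)

def pca_compress(hypercols, k):
    """Project C_total-dimensional hypercolumns onto their top-k principal
    components, cutting memory while keeping full spatial resolution."""
    n, c, h, w = hypercols.shape
    x = hypercols.permute(0, 2, 3, 1).reshape(-1, c)  # one row per pixel
    x = x - x.mean(dim=0, keepdim=True)
    _, _, v = torch.pca_lowrank(x, q=k)               # low-rank PCA basis
    compressed = x @ v[:, :k]                          # (N*H*W, k)
    return compressed.reshape(n, h, w, k).permute(0, 3, 1, 2)

# Usage: two toy feature maps at different resolutions, compressed to 16 channels
feats = [torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32)]
hc = build_hypercolumns(feats, out_size=(128, 128))    # (1, 320, 128, 128)
print(pca_compress(hc, k=16).shape)                    # torch.Size([1, 16, 128, 128])
```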
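
One plausible reading of the convolutional tessellation step is that full-resolution feature grids are convolved tile by tile, with overlapping borders, so that no global downsampling or pooling is ever needed. The following sketch implements that reading; the interpretation and the function name `tessellated_conv` are assumptions, not the authors' implementation.

```python
# Hedged sketch (PyTorch): apply a convolution over a full-resolution tensor in
# overlapping tiles and stitch the interiors back together.
import torch
import torch.nn as nn

def tessellated_conv(x, conv, tile=256, overlap=16):
    """Apply `conv` tile-by-tile over a (N, C, H, W) tensor at full resolution."""
    n, _, h, w = x.shape
    out = torch.zeros(n, conv.out_channels, h, w, dtype=x.dtype, device=x.device)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            t0, l0 = max(top - overlap, 0), max(left - overlap, 0)
            t1, l1 = min(top + tile + overlap, h), min(left + tile + overlap, w)
            patch = conv(x[:, :, t0:t1, l0:l1])
            # keep only the interior of the padded tile
            out[:, :, top:min(top + tile, h), left:min(left + tile, w)] = \
                patch[:, :, top - t0:top - t0 + min(tile, h - top),
                      left - l0:left - l0 + min(tile, w - left)]
    return out

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 1024, 1024)
print(tessellated_conv(x, conv).shape)   # torch.Size([1, 8, 1024, 1024])
```

With the overlap at least as large as the convolution's receptive-field radius, the stitched interiors match a single convolution applied to the whole image (up to the true image borders), while peak memory is bounded by the tile size rather than the frame size.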
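
The abstract does not give a formula for the boundary loss, so the sketch below shows one common way to bias a segmentation loss toward object edges: extract a thin band around the ground-truth mask boundary with a morphological gradient and up-weight the per-pixel cross-entropy there. The weighting scheme and the names `boundary_map` and `boundary_bce_loss` are illustrative assumptions, not the paper's exact loss.

```python
# Hedged sketch of a boundary-weighted segmentation loss (PyTorch).
import torch
import torch.nn.functional as F

def boundary_map(mask, width=3):
    """1 where a pixel lies within roughly `width` pixels of the mask boundary."""
    pad = width // 2
    dilated = F.max_pool2d(mask, width, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, width, stride=1, padding=pad)
    return (dilated - eroded).clamp(0, 1)

def boundary_bce_loss(logits, mask, boundary_weight=5.0, width=3):
    """Per-pixel BCE with extra weight on boundary pixels."""
    weights = 1.0 + boundary_weight * boundary_map(mask, width)
    return F.binary_cross_entropy_with_logits(logits, mask, weight=weights)

# Usage with a toy prediction/target pair
logits = torch.randn(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(boundary_bce_loss(logits, mask).item())
```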
Related papers
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models [89.79067761383855]
Vchitect-2.0 is a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation.
By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames.
To overcome memory and computational bottlenecks, we propose a Memory-efficient Training framework.
arXiv Detail & Related papers (2025-01-14T21:53:11Z)
- Elevating Flow-Guided Video Inpainting with Reference Generation [50.03502211226332]
Video inpainting (VI) is a challenging task that requires effective propagation of observable content across frames while simultaneously generating new content not present in the original video.
We propose a robust and practical VI framework that leverages a large generative model for reference generation in combination with an advanced pixel propagation algorithm.
Our method not only significantly enhances frame-level quality for object removal but also synthesizes new content in the missing areas based on user-provided text prompts.
arXiv Detail & Related papers (2024-12-12T06:13:00Z)
- Transforming Static Images Using Generative Models for Video Salient Object Detection [15.701293552584863]
We show that image-to-video diffusion models can generate realistic transformations of static images while understanding the contextual relationships between image components.
This ability allows the model to generate plausible optical flows, preserving semantic integrity while reflecting the independent motion of scene elements.
Our approach achieves state-of-the-art performance across all public benchmark datasets, outperforming existing approaches.
arXiv Detail & Related papers (2024-11-21T09:41:33Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Adaptive Compact Attention For Few-shot Video-to-video Translation [13.535988102579918]
We introduce a novel adaptive compact attention mechanism to efficiently extract contextual features jointly from multiple reference images.
Our core idea is to extract compact basis sets from all the reference images as higher-level representations.
We extensively evaluate our method on a large-scale talking-head video dataset and a human dancing dataset.
arXiv Detail & Related papers (2020-11-30T11:19:12Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.