Internal Video Inpainting by Implicit Long-range Propagation
- URL: http://arxiv.org/abs/2108.01912v1
- Date: Wed, 4 Aug 2021 08:56:28 GMT
- Title: Internal Video Inpainting by Implicit Long-range Propagation
- Authors: Hao Ouyang, Tengfei Wang, Qifeng Chen
- Abstract summary: We propose a novel framework for video inpainting by adopting an internal learning strategy.
We show that this can be achieved implicitly by fitting a convolutional neural network to the known region.
We extend the proposed method to another challenging task: learning to remove an object from a video given a single object mask in only one frame in a 4K video.
- Score: 39.89676105875726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for video inpainting by adopting an internal
learning strategy. Unlike previous methods that use optical flow for
cross-frame context propagation to inpaint unknown regions, we show that this
can be achieved implicitly by fitting a convolutional neural network to the
known region. Moreover, to handle challenging sequences with ambiguous
backgrounds or long-term occlusion, we design two regularization terms to
preserve high-frequency details and long-term temporal consistency. Extensive
experiments on the DAVIS dataset demonstrate that the proposed method achieves
state-of-the-art inpainting quality quantitatively and qualitatively. We
further extend the proposed method to another challenging task: learning to
remove an object from a video given a single object mask in only one frame in
a 4K video.
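As a rough illustration of the internal-learning idea in the abstract, the sketch below overfits a small convolutional network to the known pixels of a single video and reads out its predictions inside the mask. The architecture, loss, and hyperparameters are placeholders, not the authors' implementation, which additionally uses high-frequency and long-term temporal-consistency regularizers.
```python
# Minimal sketch of internal-learning video inpainting (illustrative only).
import torch
import torch.nn as nn

class TinyInpaintNet(nn.Module):
    """Hypothetical stand-in for the paper's generator."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

def fit_internal(frames, masks, steps=2000, lr=1e-4):
    # frames: (T, 3, H, W) in [0, 1]; masks: (T, 1, H, W), 1 = known pixel.
    net = TinyInpaintNet()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    T = frames.shape[0]
    for _ in range(steps):
        t = torch.randint(0, T, (1,)).item()      # sample a random training frame
        x, m = frames[t:t + 1], masks[t:t + 1]
        pred = net(x * m)                         # the net only sees known pixels
        loss = ((pred - x).abs() * m).mean()      # reconstruct the known region
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # keep known pixels, fill the hole
        return torch.stack([
            frames[t] * masks[t]
            + net(frames[t:t + 1] * masks[t:t + 1])[0] * (1 - masks[t])
            for t in range(T)
        ])
```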
Related papers
- Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI).
We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code.
Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
arXiv Detail & Related papers (2024-08-21T08:01:00Z)
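A hedged sketch of the first-frame filling step described above; the mean over future latents stands in for the learned propagation, and all tensor names are assumptions:
```python
# Illustrative first-frame noise-latent filling (not the FFF-VDI code).
import torch

def fill_first_frame_latent(latents, mask0):
    # latents: (T, C, h, w) per-frame noise latents; mask0: (1, h, w), 1 = missing in frame 0.
    future = latents[1:].mean(dim=0)                 # crude stand-in for learned propagation
    return latents[0] * (1 - mask0) + future * mask0 # fill only the masked latent area
```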
- Semantically Consistent Video Inpainting with Conditional Diffusion Models [16.42354856518832]
We present a framework for solving video inpainting problems with conditional video diffusion models.
We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context.
We devise a novel method for conditioning on the known pixels in incomplete frames.
arXiv Detail & Related papers (2024-04-30T23:49:26Z)
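The entry above conditions sampling on known pixels. One common way to realize this, shown here as an assumption rather than the paper's exact scheme, is replacement-based conditioning: each reverse step overwrites the known region with a re-noised copy of the observation.
```python
# Replacement-based conditioning for diffusion inpainting (illustrative).
# `denoise_step` and the noise schedule are assumed to be provided.
import torch

def conditional_sample(x_T, known, mask, alphas_cumprod, denoise_step):
    # known: observed frames; mask: 1 = known pixel; x_T: initial noise.
    x = x_T
    for t in reversed(range(len(alphas_cumprod))):
        a = alphas_cumprod[t]
        noised_known = a.sqrt() * known + (1 - a).sqrt() * torch.randn_like(known)
        x = x * (1 - mask) + noised_known * mask   # keep the known region on-manifold
        x = denoise_step(x, t)                     # one reverse diffusion step
    return x
```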
- Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task: joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z)
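The exposure-aware representation above is built with a variant of supervised contrastive learning. A generic supervised-contrastive loss looks roughly like this; the labels and temperature are illustrative:
```python
# Generic supervised-contrastive loss over exposure labels (illustrative).
import torch
import torch.nn.functional as F

def supcon_loss(z, exposure_labels, tau=0.1):
    # z: (N, D) embeddings of blurred frames; exposure_labels: (N,) ints.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))               # exclude self-pairs
    pos = exposure_labels.unsqueeze(0) == exposure_labels.unsqueeze(1)
    pos.fill_diagonal_(False)                       # positives share an exposure time
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -(log_prob[pos]).mean()                  # average over positive pairs
```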
- Content-aware Warping for View Synthesis [110.54435867693203]
We propose content-aware warping, which adaptively learns the weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.
Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from two source views.
Experimental results on structured light field datasets with wide baselines and unstructured multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-01-22T11:35:05Z)
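A minimal sketch of the learnable warping module described above, assuming a k x k neighborhood and a one-layer weight predictor; the paper's network and inputs differ:
```python
# Content-aware neighborhood warping (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentAwareWarp(nn.Module):
    def __init__(self, k=5, feat_ch=16):
        super().__init__()
        self.k = k
        self.weight_net = nn.Conv2d(feat_ch, k * k, 3, padding=1)  # lightweight predictor
    def forward(self, img, feat):
        # img: (B, 3, H, W) source view; feat: (B, feat_ch, H, W) context features.
        B, _, H, W = img.shape
        w = F.softmax(self.weight_net(feat), dim=1)           # per-pixel neighbor weights
        patches = F.unfold(img, self.k, padding=self.k // 2)  # (B, 3*k*k, H*W)
        patches = patches.view(B, 3, self.k * self.k, H, W)
        return (patches * w.unsqueeze(1)).sum(dim=2)          # weighted neighborhood sum
```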
- Coarse-Fine Networks for Temporal Activity Detection in Videos [45.03545172714305]
We introduce 'Coarse-Fine Networks', a two-stream architecture that benefits from different abstractions of temporal resolution to learn better video representations for long-term motion.
We show that our method can outperform the state of the art for action detection on public datasets with a significantly reduced compute and memory footprint.
arXiv Detail & Related papers (2021-03-01T20:48:01Z)
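One way to picture the two-stream temporal-resolution design above; this is a loose sketch, not the published architecture:
```python
# Coarse (low frame rate) + fine (full frame rate) streams with simple fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseFineBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.fine = nn.Conv3d(ch, ch, 3, padding=1)
        self.coarse = nn.Conv3d(ch, ch, 3, padding=1)
    def forward(self, x):  # x: (B, C, T, H, W)
        fine = self.fine(x)
        xs = F.avg_pool3d(x, (4, 1, 1))               # temporal downsample: coarse view
        coarse = self.coarse(xs)
        coarse = F.interpolate(coarse, size=fine.shape[2:], mode='trilinear',
                               align_corners=False)   # back to the full frame rate
        return fine + coarse                          # simple additive fusion
```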
- Deep Video Inpainting Detection [95.36819088529622]
Video inpainting detection localizes an inpainted region in a video both spatially and temporally.
VIDNet, the Video Inpainting Detection Network, uses a two-stream encoder-decoder architecture with an attention module.
arXiv Detail & Related papers (2021-01-26T20:53:49Z)
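A bare-bones reading of a two-stream encoder-decoder with attention for inpainting detection; the auxiliary stream input and the fusion scheme here are assumptions, not VIDNet's design:
```python
# Two-stream detector with an attention-gated mask decoder (illustrative).
import torch
import torch.nn as nn

class TwoStreamDetector(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.aux_enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.attn = nn.Sequential(nn.Conv2d(2 * ch, 1, 1), nn.Sigmoid())
        self.dec = nn.Conv2d(2 * ch, 1, 3, padding=1)  # per-pixel inpainting logit
    def forward(self, rgb, aux):
        f = torch.cat([self.rgb_enc(rgb), self.aux_enc(aux)], dim=1)
        return self.dec(f * self.attn(f))              # attention-gated decoding
```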
- Short-Term and Long-Term Context Aggregation Network for Video Inpainting [126.06302824297948]
Video inpainting aims to restore missing regions of a video and has many applications such as video editing and object removal.
We present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting.
Experiments show that it outperforms state-of-the-art methods with better inpainting results and faster inpainting speed.
arXiv Detail & Related papers (2020-09-12T03:50:56Z)
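As a toy version of combining short-term and long-term frame information; the feature shapes and the attention form are assumptions:
```python
# Aggregate context from nearby frames and from sparsely sampled distant frames.
import torch
import torch.nn.functional as F

def aggregate_context(target, short_feats, long_feats, tau=0.07):
    # target: (N, D) query features; short_feats/long_feats: (M, D) frame features.
    def attend(q, kv):
        attn = F.softmax(q @ kv.t() / tau, dim=1)   # similarity-based weights
        return attn @ kv
    return 0.5 * attend(target, short_feats) + 0.5 * attend(target, long_feats)
```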
- Learning Joint Spatial-Temporal Transformations for Video Inpainting [58.939131620135235]
We propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
We simultaneously fill missing regions in all input frames by self-attention, and propose to optimize STTN by a spatial-temporal adversarial loss.
arXiv Detail & Related papers (2020-07-20T16:35:48Z)
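A minimal spatial-temporal self-attention layer in the spirit of STTN, where patch tokens from all frames attend to each other so holes can borrow content from any frame and any location; the shapes are illustrative:
```python
# Joint spatial-temporal self-attention over patch tokens (illustrative).
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, tokens):
        # tokens: (B, T*P, dim) patch embeddings from ALL frames of the clip, so
        # attention is jointly spatial (across patches) and temporal (across frames).
        out, _ = self.attn(tokens, tokens, tokens)
        return out
```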
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.