Learning Temporally and Semantically Consistent Unpaired Video-to-video
Translation Through Pseudo-Supervision From Synthetic Optical Flow
- URL: http://arxiv.org/abs/2201.05723v1
- Date: Sat, 15 Jan 2022 01:10:34 GMT
- Title: Learning Temporally and Semantically Consistent Unpaired Video-to-video
Translation Through Pseudo-Supervision From Synthetic Optical Flow
- Authors: Kaihong Wang, Kumar Akash, Teruhisa Misu
- Abstract summary: Unpaired video-to-video translation aims to translate videos between a source and a target domain without the need for paired training data, making it more feasible for real applications.
We propose a paradigm that regularizes video consistency by synthesizing novel motions in input videos with the generated optical flow instead of estimating them.
- Score: 5.184108122340348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unpaired video-to-video translation aims to translate videos between a source
and a target domain without the need of paired training data, making it more
feasible for real applications. Unfortunately, the translated videos generally
suffer from temporal and semantic inconsistency. To address this, many existing
works adopt spatiotemporal consistency constraints incorporating temporal
information based on motion estimation. However, the inaccuracies in the
estimation of motion deteriorate the quality of the guidance towards
spatiotemporal consistency, which leads to unstable translation. In this work,
we propose a novel paradigm that regularizes the spatiotemporal consistency by
synthesizing motions in input videos with the generated optical flow instead of
estimating them. Therefore, the synthetic motion can be applied in the
regularization paradigm to keep motions consistent across domains without the
risk of errors in motion estimation. Thereafter, we utilize our unsupervised
recycle and unsupervised spatial loss, guided by the pseudo-supervision
provided by the synthetic optical flow, to accurately enforce spatiotemporal
consistency in both domains. Experiments show that our method is versatile in
various scenarios and achieves state-of-the-art performance in generating
temporally and semantically consistent videos. Code is available at:
https://github.com/wangkaihong/Unsup_Recycle_GAN/.
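To make the regularization concrete, here is a minimal sketch, assuming PyTorch, of pseudo-supervision from synthetic optical flow: a random smooth flow field warps an input frame into a pseudo next frame, and the translator is constrained so that translating-then-warping matches warping-then-translating under that same flow, with no flow estimator involved. Only one simplified consistency term is shown, and the helper names (sample_synthetic_flow, warp, G_XY) are illustrative rather than taken from the released code.

    # Minimal sketch of pseudo-supervision from synthetic optical flow (PyTorch).
    # Names are illustrative; see the official repository for the actual implementation.
    import torch
    import torch.nn.functional as F

    def sample_synthetic_flow(b, h, w, max_disp=10.0, device="cpu"):
        # Random smooth flow: coarse noise upsampled to full resolution.
        coarse = (torch.rand(b, 2, h // 8, w // 8, device=device) - 0.5) * 2 * max_disp
        return F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=True)

    def warp(img, flow):
        # Backward-warp img with a dense flow field via grid_sample.
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                                torch.arange(w, device=img.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow   # (B, 2, H, W)
        gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
        gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
        return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

    def unsup_spatial_loss(G_XY, x):
        # Translate-then-warp should match warp-then-translate under the same synthetic flow.
        b, _, h, w = x.shape
        flow = sample_synthetic_flow(b, h, w, device=x.device)
        y_from_warped = G_XY(warp(x, flow))   # translate the pseudo "next frame"
        y_warped = warp(G_XY(x), flow)        # move the translated frame with the same flow
        return F.l1_loss(y_from_warped, y_warped)

In training, a term of this kind would be combined with the adversarial and unsupervised recycle objectives described in the abstract.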
Related papers
- Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss [35.69606926024434]
We propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss.
We then design a motion consistency loss to maintain similar feature correlation patterns in the generated video (a brief sketch follows this entry).
This approach improves temporal consistency across various motion control tasks while preserving the benefits of a training-free setup.
arXiv Detail & Related papers (2025-01-13T18:53:08Z)
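The motion consistency loss in the entry above can be pictured as matching feature correlation patterns between a reference video and the generated video. The following is a minimal sketch of that reading, assuming PyTorch and some per-frame feature extractor; it is an illustration, not the authors' implementation.

    # Illustrative motion consistency loss: match frame-to-frame feature
    # correlation patterns of a generated video to those of a reference video.
    import torch.nn.functional as F

    def correlation_pattern(feats):
        # feats: (T, C, H, W) per-frame features from an assumed backbone.
        t = feats.shape[0]
        flat = F.normalize(feats.reshape(t, -1), dim=1)  # (T, D), unit norm per frame
        return flat @ flat.t()                           # (T, T) cosine-similarity matrix

    def motion_consistency_loss(ref_feats, gen_feats):
        # Penalize deviation between the two temporal correlation patterns.
        return F.mse_loss(correlation_pattern(gen_feats), correlation_pattern(ref_feats))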
- Temporally Consistent Object-Centric Learning by Contrasting Slots [23.203973564679508]
We introduce a novel object-level temporal contrastive loss for video object-centric models (a brief sketch follows this entry).
Our method significantly improves the temporal consistency of the learned object-centric representations.
arXiv Detail & Related papers (2024-12-18T19:46:04Z)
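One common reading of an object-level temporal contrastive loss is an InfoNCE objective over slot embeddings: each slot at time t is pulled toward its counterpart at time t+1 and pushed away from every other slot. The sketch below, in PyTorch, follows that assumption; the slot alignment and temperature are illustrative, not taken from the paper.

    # Illustrative slot-level temporal contrastive (InfoNCE-style) loss.
    # slots_t, slots_t1: (B, K, D) slot embeddings at consecutive frames,
    # assumed aligned so slot k at time t corresponds to slot k at time t+1.
    import torch
    import torch.nn.functional as F

    def slot_temporal_contrastive_loss(slots_t, slots_t1, temperature=0.1):
        b, k, d = slots_t.shape
        anchors = F.normalize(slots_t.reshape(b * k, d), dim=1)
        positives = F.normalize(slots_t1.reshape(b * k, d), dim=1)
        logits = anchors @ positives.t() / temperature        # (B*K, B*K) similarities
        targets = torch.arange(b * k, device=slots_t.device)  # diagonal entries are positives
        return F.cross_entropy(logits, targets)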
- FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation [85.29772293776395]
We introduce FRESCO, which uses intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint.
This enhancement ensures a more consistent transformation of semantically similar content across frames.
Our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video.
arXiv Detail & Related papers (2024-03-19T17:59:18Z)
- Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression [59.632286735304156]
We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video codec can obtain more accurate inter prediction contexts.
arXiv Detail & Related papers (2024-01-29T03:30:21Z)
- Segmenting the motion components of a video: A long-term unsupervised model [5.801044612920816]
We want to provide a coherent and stable motion segmentation over the video sequence.
We propose a novel long-term spatio-temporal model operating in a totally unsupervised way.
We report experiments on four VOS benchmarks, demonstrating competitive quantitative results.
arXiv Detail & Related papers (2023-10-02T09:33:54Z)
- STint: Self-supervised Temporal Interpolation for Geospatial Data [0.0]
Supervised and unsupervised techniques have demonstrated the potential for temporal interpolation of video data.
Most prevailing temporal interpolation techniques hinge on optical flow, which encodes the motion of pixels between video frames.
In this work, we propose an unsupervised temporal technique, which does not rely on ground truth data or require any motion information like optical flow.
arXiv Detail & Related papers (2023-08-31T18:04:50Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework (a brief sketch follows this entry).
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
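As an example of the kind of kinematic loss referred to above, a bone-length consistency term can be computed on denoised joint positions and added to the diffusion reconstruction objective. The choice of term and the tensor layout below are assumptions for illustration; the paper's actual kinematic losses may differ.

    # Illustrative kinematic auxiliary loss for motion synthesis.
    # joints: (B, T, J, 3) denoised joint positions; bones: list of (parent, child) joint indices.
    import torch

    def bone_length_consistency_loss(joints, bones):
        parents = torch.tensor([p for p, _ in bones], device=joints.device)
        children = torch.tensor([c for _, c in bones], device=joints.device)
        lengths = (joints[:, :, children] - joints[:, :, parents]).norm(dim=-1)  # (B, T, num_bones)
        # Bones should stay rigid, so penalize each bone's length variance over time.
        return lengths.var(dim=1).mean()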
- CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
- Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation (a brief sketch follows this entry).
We apply our intrinsic temporal regulation to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
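A rough way to picture the confidence-map idea above is a temporal consistency loss whose per-pixel contribution is weighted by the generator's confidence, so unreliable motion estimates contribute less. The sketch below, in PyTorch, is a generic confidence-weighted warping loss under that reading, not the paper's exact scheme; the warped previous frame is assumed to be produced beforehand by any motion model.

    # Illustrative confidence-weighted temporal consistency loss.
    # frame_curr, frame_prev_warped: (B, C, H, W) torch tensors;
    # confidence: (B, 1, H, W) in [0, 1], assumed predicted by the frame generator.
    def confidence_weighted_temporal_loss(frame_curr, frame_prev_warped, confidence, eps=1e-6):
        per_pixel = (frame_curr - frame_prev_warped).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
        # Down-weight pixels where the motion estimate is judged unreliable.
        return (confidence * per_pixel).sum() / (confidence.sum() + eps)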