Learning Temporally and Semantically Consistent Unpaired Video-to-video
Translation Through Pseudo-Supervision From Synthetic Optical Flow
- URL: http://arxiv.org/abs/2201.05723v1
- Date: Sat, 15 Jan 2022 01:10:34 GMT
- Title: Learning Temporally and Semantically Consistent Unpaired Video-to-video
Translation Through Pseudo-Supervision From Synthetic Optical Flow
- Authors: Kaihong Wang, Kumar Akash, Teruhisa Misu
- Abstract summary: Unpaired video-to-video translation aims to translate videos between a source and a target domain without the need for paired training data, making it more feasible for real applications.
We propose a paradigm that regularizes video consistency by synthesizing novel motions in input videos with the generated optical flow instead of estimating them.
- Score: 5.184108122340348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unpaired video-to-video translation aims to translate videos between a source
and a target domain without the need of paired training data, making it more
feasible for real applications. Unfortunately, the translated videos generally
suffer from temporal and semantic inconsistency. To address this, many existing
works adopt spatiotemporal consistency constraints incorporating temporal
information based on motion estimation. However, the inaccuracies in the
estimation of motion deteriorate the quality of the guidance towards
spatiotemporal consistency, which leads to unstable translation. In this work,
we propose a novel paradigm that regularizes the spatiotemporal consistency by
synthesizing motions in input videos with the generated optical flow instead of
estimating them. Therefore, the synthetic motion can be applied in the
regularization paradigm to keep motions consistent across domains without the
risk of errors in motion estimation. Thereafter, we utilize our unsupervised
recycle and unsupervised spatial loss, guided by the pseudo-supervision
provided by the synthetic optical flow, to accurately enforce spatiotemporal
consistency in both domains. Experiments show that our method is versatile in
various scenarios and achieves state-of-the-art performance in generating
temporally and semantically consistent videos. Code is available at:
https://github.com/wangkaihong/Unsup_Recycle_GAN/.
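To make the regularization concrete, here is a minimal sketch, assuming PyTorch, of pseudo-supervision from synthetic optical flow: a random smooth flow field warps an input frame into a pseudo next frame, and the translator is constrained so that translating-then-warping matches warping-then-translating under that same flow, with no flow estimator involved. Only one simplified consistency term is shown, and the helper names (sample_synthetic_flow, warp, G_XY) are illustrative rather than taken from the released code.

    # Minimal sketch of pseudo-supervision from synthetic optical flow (PyTorch).
    # Names are illustrative; see the official repository for the actual implementation.
    import torch
    import torch.nn.functional as F

    def sample_synthetic_flow(b, h, w, max_disp=10.0, device="cpu"):
        # Random smooth flow: coarse noise upsampled to full resolution.
        coarse = (torch.rand(b, 2, h // 8, w // 8, device=device) - 0.5) * 2 * max_disp
        return F.interpolate(coarse, size=(h, w), mode="bilinear", align_corners=True)

    def warp(img, flow):
        # Backward-warp img with a dense flow field via grid_sample.
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                                torch.arange(w, device=img.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow   # (B, 2, H, W)
        gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
        gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
        return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

    def unsup_spatial_loss(G_XY, x):
        # Translate-then-warp should match warp-then-translate under the same synthetic flow.
        b, _, h, w = x.shape
        flow = sample_synthetic_flow(b, h, w, device=x.device)
        y_from_warped = G_XY(warp(x, flow))   # translate the pseudo "next frame"
        y_warped = warp(G_XY(x), flow)        # move the translated frame with the same flow
        return F.l1_loss(y_from_warped, y_warped)

In training, a term of this kind would be combined with the adversarial and unsupervised recycle objectives described in the abstract.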
Related papers
- Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss [35.69606926024434]
We propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss.
We then design a motion consistency loss to maintain similar feature correlation patterns in the generated video (a brief sketch follows this entry).
This approach improves temporal consistency across various motion control tasks while preserving the benefits of a training-free setup.
arXiv Detail & Related papers (2025-01-13T18:53:08Z)
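The motion consistency loss in the entry above can be pictured as matching feature correlation patterns between a reference video and the generated video. The following is a minimal sketch of that reading, assuming PyTorch and some per-frame feature extractor; it is an illustration, not the authors' implementation.

    # Illustrative motion consistency loss: match frame-to-frame feature
    # correlation patterns of a generated video to those of a reference video.
    import torch.nn.functional as F

    def correlation_pattern(feats):
        # feats: (T, C, H, W) per-frame features from an assumed backbone.
        t = feats.shape[0]
        flat = F.normalize(feats.reshape(t, -1), dim=1)  # (T, D), unit norm per frame
        return flat @ flat.t()                           # (T, T) cosine-similarity matrix

    def motion_consistency_loss(ref_feats, gen_feats):
        # Penalize deviation between the two temporal correlation patterns.
        return F.mse_loss(correlation_pattern(gen_feats), correlation_pattern(ref_feats))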
- Temporally Consistent Object-Centric Learning by Contrasting Slots [23.203973564679508]
We introduce a novel object-level temporal contrastive loss for video object-centric models (a brief sketch follows this entry).
Our method significantly improves the temporal consistency of the learned object-centric representations.
arXiv Detail & Related papers (2024-12-18T19:46:04Z)
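One common reading of an object-level temporal contrastive loss is an InfoNCE objective over slot embeddings: each slot at time t is pulled toward its counterpart at time t+1 and pushed away from every other slot. The sketch below, in PyTorch, follows that assumption; the slot alignment and temperature are illustrative, not taken from the paper.

    # Illustrative slot-level temporal contrastive (InfoNCE-style) loss.
    # slots_t, slots_t1: (B, K, D) slot embeddings at consecutive frames,
    # assumed aligned so slot k at time t corresponds to slot k at time t+1.
    import torch
    import torch.nn.functional as F

    def slot_temporal_contrastive_loss(slots_t, slots_t1, temperature=0.1):
        b, k, d = slots_t.shape
        anchors = F.normalize(slots_t.reshape(b * k, d), dim=1)
        positives = F.normalize(slots_t1.reshape(b * k, d), dim=1)
        logits = anchors @ positives.t() / temperature        # (B*K, B*K) similarities
        targets = torch.arange(b * k, device=slots_t.device)  # diagonal entries are positives
        return F.cross_entropy(logits, targets)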
- FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation [85.29772293776395]
We introduce FRESCO, which uses intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint.
This enhancement ensures a more consistent transformation of semantically similar content across frames.
Our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video.
arXiv Detail & Related papers (2024-03-19T17:59:18Z)
- Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression [59.632286735304156]
We propose a spatial decomposition and temporal fusion based inter prediction for learned video compression.
With the SDD-based motion model and long short-term temporal fusion, our proposed learned video codec can obtain more accurate inter prediction contexts.
arXiv Detail & Related papers (2024-01-29T03:30:21Z)
- Segmenting the motion components of a video: A long-term unsupervised model [5.801044612920816]
We want to provide a coherent and stable motion segmentation over the video sequence.
We propose a novel long-term spatio-temporal model operating in a totally unsupervised way.
We report experiments on four VOS benchmarks, demonstrating competitive quantitative results.
arXiv Detail & Related papers (2023-10-02T09:33:54Z)
- STint: Self-supervised Temporal Interpolation for Geospatial Data [0.0]
Supervised and unsupervised techniques have demonstrated the potential for temporal interpolation of video data.
Most prevailing temporal interpolation techniques hinge on optical flow, which encodes the motion of pixels between video frames.
In this work, we propose an unsupervised temporal technique, which does not rely on ground truth data or require any motion information like optical flow.
arXiv Detail & Related papers (2023-08-31T18:04:50Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework (a brief sketch follows this entry).
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
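As an example of the kind of kinematic loss referred to above, a bone-length consistency term can be computed on denoised joint positions and added to the diffusion reconstruction objective. The choice of term and the tensor layout below are assumptions for illustration; the paper's actual kinematic losses may differ.

    # Illustrative kinematic auxiliary loss for motion synthesis.
    # joints: (B, T, J, 3) denoised joint positions; bones: list of (parent, child) joint indices.
    import torch

    def bone_length_consistency_loss(joints, bones):
        parents = torch.tensor([p for p, _ in bones], device=joints.device)
        children = torch.tensor([c for _, c in bones], device=joints.device)
        lengths = (joints[:, :, children] - joints[:, :, parents]).norm(dim=-1)  # (B, T, num_bones)
        # Bones should stay rigid, so penalize each bone's length variance over time.
        return lengths.var(dim=1).mean()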
- CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
- Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation (a brief sketch follows this entry).
We apply our intrinsic temporal regulation to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
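A rough way to picture the confidence-map idea above is a temporal consistency loss whose per-pixel contribution is weighted by the generator's confidence, so unreliable motion estimates contribute less. The sketch below, in PyTorch, is a generic confidence-weighted warping loss under that reading, not the paper's exact scheme; the warped previous frame is assumed to be produced beforehand by any motion model.

    # Illustrative confidence-weighted temporal consistency loss.
    # frame_curr, frame_prev_warped: (B, C, H, W) torch tensors;
    # confidence: (B, 1, H, W) in [0, 1], assumed predicted by the frame generator.
    def confidence_weighted_temporal_loss(frame_curr, frame_prev_warped, confidence, eps=1e-6):
        per_pixel = (frame_curr - frame_prev_warped).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
        # Down-weight pixels where the motion estimate is judged unreliable.
        return (confidence * per_pixel).sum() / (confidence.sum() + eps)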