TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
- URL: http://arxiv.org/abs/2506.18904v2
- Date: Wed, 02 Jul 2025 12:51:03 GMT
- Title: TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer
- Authors: Yang Liu, Chuanchen Luo, Zimo Tang, Yingyan Li, Yuran Yang, Yuanyong Ning, Lue Fan, Zhaoxiang Zhang, Junran Peng
- Abstract summary: Illumination and texture editing are critical dimensions for world-to-world transfer. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. We propose TC-Light, a novel generative renderer to overcome these problems.
- Score: 47.22201704648345
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Illumination and texture editing are critical dimensions for world-to-world transfer, which is valuable for applications including sim2real and real2real visual data scaling up for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly limited to the domain of training data (e.g., portrait) or fall into the bottleneck of temporal consistency and computation efficiency, especially when the input video involves complex dynamics and long durations. In this paper, we propose TC-Light, a novel generative renderer to overcome these problems. Starting from the video preliminarily relighted by an inflated video relighting model, it optimizes appearance embedding in the first stage to align global illumination. Then it optimizes the proposed canonical video representation, i.e., Unique Video Tensor (UVT), to align fine-grained texture and lighting in the second stage. To comprehensively evaluate performance, we also establish a long and highly dynamic video benchmark. Extensive experiments show that our method enables physically plausible re-rendering results with superior temporal coherence and low computation cost. The code and video demos are available at https://dekuliutesla.github.io/tclight/.
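The abstract outlines a two-stage optimization: appearance embeddings are tuned first to align global illumination, then the Unique Video Tensor (UVT) is optimized for fine-grained texture and lighting. Below is a minimal PyTorch sketch of that structure; the per-frame gain/offset embedding, the losses, and the UVT layout are illustrative assumptions, not the authors' implementation.

```python
# Illustrative two-stage optimization in the spirit of TC-Light's abstract.
# Embedding parameterization, losses, and UVT layout are assumptions.
import torch

def stage1_align_global_illumination(relit, n_steps=300, lr=1e-2):
    """Stage 1: optimize a per-frame appearance embedding (here a simple
    gain/offset per frame) so global illumination stays consistent."""
    t = relit.shape[0]
    gain = torch.ones(t, 3, 1, 1, requires_grad=True)
    offset = torch.zeros(t, 3, 1, 1, requires_grad=True)
    opt = torch.optim.Adam([gain, offset], lr=lr)
    for _ in range(n_steps):
        adjusted = relit * gain + offset
        # Penalize illumination drift between consecutive frames (assumed loss).
        means = adjusted.mean(dim=(1, 2, 3))
        loss = (means[1:] - means[:-1]).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return (relit * gain + offset).detach()

def stage2_optimize_uvt(aligned, n_steps=500, lr=5e-3):
    """Stage 2: optimize a canonical video tensor (a stand-in for the paper's
    UVT) toward the aligned frames, trading fidelity against smoothness."""
    uvt = aligned.clone().requires_grad_(True)
    opt = torch.optim.Adam([uvt], lr=lr)
    for _ in range(n_steps):
        fidelity = (uvt - aligned).pow(2).mean()
        smoothness = (uvt[1:] - uvt[:-1]).abs().mean()  # temporal coherence term
        loss = fidelity + 0.1 * smoothness
        opt.zero_grad(); loss.backward(); opt.step()
    return uvt.detach()

video = torch.rand(16, 3, 64, 64)  # placeholder for a preliminarily relit video
final = stage2_optimize_uvt(stage1_align_global_illumination(video))
```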
Related papers
- Light-A-Video: Training-free Video Relighting via Progressive Light Fusion [52.420894727186216]
Light-A-Video is a training-free approach to achieve temporally smooth video relighting. Adapted from image relighting models, Light-A-Video introduces two key techniques to enhance lighting consistency.
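As a rough illustration of the progressive-fusion idea named in the title, the sketch below eases the appearance target from the source frame toward its relit counterpart with a monotonically growing weight; the linear schedule and blend rule are assumptions, since the summary does not specify them.

```python
# Toy sketch of progressive light fusion; schedule and blend are assumptions.
import torch

def progressive_fusion_targets(source, relit, num_steps=50):
    """Yield one blend target per (hypothetical) denoising step, easing into
    the new lighting so no single step causes an abrupt appearance jump."""
    for t in range(num_steps):
        w = (t + 1) / num_steps          # fusion weight grows monotonically
        yield (1.0 - w) * source + w * relit

source = torch.rand(3, 256, 256)          # original frame
relit = torch.rand(3, 256, 256)           # frame relit by an image model
for step, target in enumerate(progressive_fusion_targets(source, relit)):
    pass                                  # a denoiser would be guided by `target` here
```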
arXiv Detail & Related papers (2025-02-12T17:24:19Z)
- Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT [98.56372305225271]
Lumina-Next achieves exceptional performance in the generation of images with Next-DiT. Lumina-Video incorporates a Multi-scale Next-DiT architecture, which jointly learns multiple patchifications. We propose Lumina-V2A, a video-to-audio model based on Next-DiT, to create synchronized sounds for generated videos.
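A minimal sketch of what jointly learning multiple patchifications could look like: one patch-embedding head per scale, each producing a token sequence of different granularity. Patch sizes and embedding width are illustrative assumptions, not the paper's configuration.

```python
# Sketch of multi-scale patchification; sizes are assumptions.
import torch
import torch.nn as nn

class MultiScalePatchify(nn.Module):
    def __init__(self, in_ch=4, dim=512, patch_sizes=(1, 2, 4)):
        super().__init__()
        # One patch-embedding head per scale; coarser patches yield fewer tokens.
        self.heads = nn.ModuleList(
            nn.Conv3d(in_ch, dim, kernel_size=(1, p, p), stride=(1, p, p))
            for p in patch_sizes
        )

    def forward(self, x):  # x: (B, C, T, H, W) video latent
        # Each scale produces a token sequence of a different length.
        return [h(x).flatten(2).transpose(1, 2) for h in self.heads]

tokens = MultiScalePatchify()(torch.rand(1, 4, 8, 32, 32))
print([t.shape for t in tokens])  # three sequences at different granularities
```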
arXiv Detail & Related papers (2025-02-10T18:58:11Z)
- DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models [83.28670336340608]
We introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering. Our model enables practical applications from a single video input, including relighting, material editing, and realistic object insertion.
arXiv Detail & Related papers (2025-01-30T18:59:11Z)
- EnvGS: Modeling View-Dependent Appearance with Environment Gaussian [78.74634059559891]
EnvGS is a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of environments. To efficiently render these environment Gaussian primitives, we developed a ray-tracing-based renderer that leverages the GPU's RT core for fast rendering. Results from multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections.
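The reflection step can be pictured as mirroring the view ray about the surface normal and shading from an environment representation. In the toy sketch below, the environment Gaussians and the RT-core renderer are replaced by an arbitrary callable; only the reflection geometry is faithful.

```python
# Toy reflection shading; the environment query is a placeholder, not the
# paper's Gaussian primitives or RT-core renderer.
import torch

def reflect(view_dir, normal):
    """Mirror the incoming view direction about the surface normal."""
    view_dir = view_dir / view_dir.norm(dim=-1, keepdim=True)
    normal = normal / normal.norm(dim=-1, keepdim=True)
    return view_dir - 2.0 * (view_dir * normal).sum(-1, keepdim=True) * normal

def shade_reflection(points, view_dirs, normals, env_query):
    """Trace reflected rays into an environment representation (here any
    callable standing in for the environment Gaussians)."""
    refl = reflect(view_dirs, normals)
    return env_query(points, refl)  # color gathered along each reflected ray

# Hypothetical environment: constant sky color, brighter toward +Y.
env = lambda o, d: torch.clamp(d[..., 1:2], 0, 1) * torch.tensor([0.6, 0.7, 1.0])
color = shade_reflection(torch.zeros(8, 3), torch.randn(8, 3), torch.randn(8, 3), env)
```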
arXiv Detail & Related papers (2024-12-19T18:59:57Z)
- GenLit: Reformulating Single-Image Relighting as Video Generation [39.06560955055697]
We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes.
arXiv Detail & Related papers (2024-12-15T15:40:40Z)
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing. First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder. Second, we present MotionAura, a text-to-video generation framework. Third, we propose a spectral transformer-based denoising network. Fourth, we introduce the downstream task of Sketch Guided Video Inpainting.
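The core operation inside any vector-quantization variational autoencoder, presumably including the one introduced here, is nearest-codebook lookup with a straight-through gradient. A minimal sketch, with arbitrary codebook size and latent dimension:

```python
# Minimal nearest-codebook vector quantization; sizes are illustrative.
import torch

def vector_quantize(z, codebook):
    """Snap each latent vector to its nearest codebook entry.
    z: (N, D) encoder outputs; codebook: (K, D) learned embeddings."""
    dists = torch.cdist(z, codebook)   # (N, K) pairwise distances
    idx = dists.argmin(dim=1)          # nearest code per latent
    zq = codebook[idx]
    # Straight-through estimator so gradients still reach the encoder.
    return z + (zq - z).detach(), idx

z = torch.randn(16, 64)
codebook = torch.randn(512, 64)
zq, codes = vector_quantize(z, codebook)
```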
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and a comprehensive evaluation shows that models trained with our dataset outperform those trained with existing datasets.
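The summary does not name the image-based alignment method, so the sketch below uses OpenCV's ECC registration as a plausible stand-in for pixel-wise alignment of a low-light frame to its normal-light reference.

```python
# ECC-based frame registration as a stand-in for the dataset's (unspecified)
# image-based alignment step.
import cv2
import numpy as np

def align_to_reference(ref_gray, moving_gray, iterations=200, eps=1e-6):
    """Estimate an affine warp registering a low-light frame to its
    normal-light reference, then resample the moving frame.
    Both inputs are single-channel uint8 or float32 arrays."""
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, iterations, eps)
    _, warp = cv2.findTransformECC(ref_gray, moving_gray, warp,
                                   cv2.MOTION_AFFINE, criteria, None, 5)
    h, w = ref_gray.shape
    return cv2.warpAffine(moving_gray, warp, (w, h),
                          flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```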
arXiv Detail & Related papers (2024-07-03T22:41:49Z)
- Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
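A minimal sketch of the space-time down/up-sampling idea: a single block that compresses a clip in both space and time and then restores it, as a U-Net built from such blocks would. Channel counts and kernel sizes are arbitrary assumptions.

```python
# Sketch of joint spatial + temporal down/up-sampling; sizes are arbitrary.
import torch
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # Downsample space (H, W) and time (T) together, then invert both.
        self.down = nn.Conv3d(3, ch, kernel_size=3, stride=(2, 2, 2), padding=1)
        self.up = nn.ConvTranspose3d(ch, 3, kernel_size=4, stride=(2, 2, 2), padding=1)

    def forward(self, x):  # x: (B, 3, T, H, W)
        return self.up(torch.relu(self.down(x)))

video = torch.rand(1, 3, 16, 64, 64)
out = SpaceTimeBlock()(video)
print(out.shape)  # restored to (1, 3, 16, 64, 64)
```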
arXiv Detail & Related papers (2024-01-23T18:05:25Z)
- Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by simply capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of performing high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z)
- VideoLightFormer: Lightweight Action Recognition using Transformers [8.871042314510788]
We propose a novel, lightweight action recognition architecture, VideoLightFormer.
In a factorized fashion, we carefully extend the 2D convolutional Temporal Segment Network with transformers.
We evaluate VideoLightFormer in a high-efficiency setting on the temporally-demanding EPIC-KITCHENS-100 and Something-Something-V2 datasets.
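The factorized design can be sketched as a 2D CNN that embeds each frame independently, followed by a small transformer over the frame axis; the layer sizes below are assumptions, not the paper's configuration.

```python
# Sketch of a factorized per-frame CNN + temporal transformer; sizes assumed.
import torch
import torch.nn as nn

class FactorizedVideoNet(nn.Module):
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        self.frame_cnn = nn.Sequential(            # per-frame spatial features
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                          # x: (B, T, 3, H, W)
        b, t = x.shape[:2]
        feats = self.frame_cnn(x.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(feats).mean(dim=1))

logits = FactorizedVideoNet()(torch.rand(2, 8, 3, 112, 112))
```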
arXiv Detail & Related papers (2021-07-01T13:55:52Z)