GenLit: Reformulating Single-Image Relighting as Video Generation
- URL: http://arxiv.org/abs/2412.11224v3
- Date: Fri, 20 Jun 2025 10:10:55 GMT
- Title: GenLit: Reformulating Single-Image Relighting as Video Generation
- Authors: Shrisha Bharadwaj, Haiwen Feng, Giorgio Becherini, Victoria Fernandez Abrevaya, Michael J. Black,
- Abstract summary: We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model.<n>We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes.
- Score: 39.06560955055697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manipulating the illumination of a 3D scene within a single image represents a fundamental challenge in computer vision and graphics. This problem has traditionally been addressed using inverse rendering techniques, which involve explicit 3D asset reconstruction and costly ray-tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be possible -- one that replaces explicit physical models with networks that are trained on large amounts of image and video data. In this paper, we exploit the physical world understanding of a video diffusion model, particularly Stable Video Diffusion, to relight a single image. We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image, and generate results directly as a video sequence. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes, enabling single-image relighting with plausible and convincing shadows. Our results highlight the ability of video foundation models to capture rich information about lighting, material, and, shape and our findings indicate that such models, with minimal training, can be used to perform relighting without explicit asset reconstruction or complex ray tracing. Project page: https://genlit.is.tue.mpg.de/.
Related papers
- DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models [83.28670336340608]
We introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering.
Our model enables practical applications from a single video input--including relighting, material editing, and realistic object insertion.
arXiv Detail & Related papers (2025-01-30T18:59:11Z) - 3D Object Manipulation in a Single Image using Generative Models [30.241857090353864]
We introduce textbfOMG3D, a novel framework that integrates the precise geometric control with the generative power of diffusion models.
Our framework first converts 2D objects into 3D, enabling user-directed modifications and lifelike motions at the geometric level.
Remarkably, all these steps can be done using one NVIDIA 3090.
arXiv Detail & Related papers (2025-01-22T15:06:30Z) - 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video using the video generation model.
We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - Relightable Neural Actor with Intrinsic Decomposition and Pose Control [80.06094206522668]
We propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relighted.
For training, our method solely requires a multi-view recording of the human under a known, but static lighting condition.
To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different light conditions, indoors and outdoors.
arXiv Detail & Related papers (2023-12-18T14:30:13Z) - FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z) - Texture Generation Using Graph Generative Adversarial Network And
Differentiable Rendering [0.6439285904756329]
Novel texture synthesis for existing 3D mesh models is an important step towards photo realistic asset generation for simulators.
Existing methods inherently work in the 2D image space which is the projection of the 3D space from a given camera perspective.
We present a new system called a graph generative adversarial network (GGAN) that can generate textures which can be directly integrated into a given 3D mesh models with tools like Blender and Unreal Engine.
arXiv Detail & Related papers (2022-06-17T04:56:03Z) - DIB-R++: Learning to Predict Lighting and Material with a Hybrid
Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiables.
In this work, we propose DIBR++, a hybrid differentiable which supports these effects by combining specularization and ray-tracing.
Compared to more advanced physics-based differentiables, DIBR++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z) - Neural Reflectance Fields for Appearance Acquisition [61.542001266380375]
We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene.
We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light.
arXiv Detail & Related papers (2020-08-09T22:04:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.