GenLit: Reformulating Single-Image Relighting as Video Generation
- URL: http://arxiv.org/abs/2412.11224v1
- Date: Sun, 15 Dec 2024 15:40:40 GMT
- Title: GenLit: Reformulating Single-Image Relighting as Video Generation
- Authors: Shrisha Bharadwaj, Haiwen Feng, Victoria Abrevaya, Michael J. Black,
- Abstract summary: We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video generation model.
We find that a model fine-tuned on only a small synthetic dataset is able to generalize to real images.
- Score: 44.409962561291216
- License:
- Abstract: Manipulating the illumination within a single image represents a fundamental challenge in computer vision and graphics. This problem has been traditionally addressed using inverse rendering techniques, which require explicit 3D asset reconstruction and costly ray tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be practical and possible -- one that replaces explicit physical models with networks that are trained on massive amounts of image and video data. In this paper, we explore the potential of exploiting video diffusion models, and in particular Stable Video Diffusion (SVD), in understanding the physical world to perform relighting tasks given a single image. Specifically, we introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image and generate the results directly as a video sequence. We find that a model fine-tuned on only a small synthetic dataset (270 objects) is able to generalize to real images, enabling single-image relighting with realistic ray tracing effects and cast shadows. These results reveal the ability of video foundation models to capture rich information about lighting, material, and shape. Our findings suggest that such models, with minimal training, can be used for physically-based rendering without explicit physically asset reconstruction and complex ray tracing. This further suggests the potential of such models for controllable and physically accurate image synthesis tasks.
Related papers
- DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models [83.28670336340608]
We introduce DiffusionRenderer, a neural approach that addresses the dual problem of inverse and forward rendering.
Our model enables practical applications from a single video input--including relighting, material editing, and realistic object insertion.
arXiv Detail & Related papers (2025-01-30T18:59:11Z) - 3D Object Manipulation in a Single Image using Generative Models [30.241857090353864]
We introduce textbfOMG3D, a novel framework that integrates the precise geometric control with the generative power of diffusion models.
Our framework first converts 2D objects into 3D, enabling user-directed modifications and lifelike motions at the geometric level.
Remarkably, all these steps can be done using one NVIDIA 3090.
arXiv Detail & Related papers (2025-01-22T15:06:30Z) - Relightable Neural Actor with Intrinsic Decomposition and Pose Control [80.06094206522668]
We propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relighted.
For training, our method solely requires a multi-view recording of the human under a known, but static lighting condition.
To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different light conditions, indoors and outdoors.
arXiv Detail & Related papers (2023-12-18T14:30:13Z) - FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z) - Texture Generation Using Graph Generative Adversarial Network And
Differentiable Rendering [0.6439285904756329]
Novel texture synthesis for existing 3D mesh models is an important step towards photo realistic asset generation for simulators.
Existing methods inherently work in the 2D image space which is the projection of the 3D space from a given camera perspective.
We present a new system called a graph generative adversarial network (GGAN) that can generate textures which can be directly integrated into a given 3D mesh models with tools like Blender and Unreal Engine.
arXiv Detail & Related papers (2022-06-17T04:56:03Z) - DIB-R++: Learning to Predict Lighting and Material with a Hybrid
Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiables.
In this work, we propose DIBR++, a hybrid differentiable which supports these effects by combining specularization and ray-tracing.
Compared to more advanced physics-based differentiables, DIBR++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z) - Neural Reflectance Fields for Appearance Acquisition [61.542001266380375]
We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene.
We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light.
arXiv Detail & Related papers (2020-08-09T22:04:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.