MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation
- URL: http://arxiv.org/abs/2505.21483v1
- Date: Tue, 27 May 2025 17:53:02 GMT
- Title: MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation
- Authors: Kerui Ren, Jiayang Bai, Linning Xu, Lihan Jiang, Jiangmiao Pang, Mulin Yu, Bo Dai
- Abstract summary: MV-CoLight is a framework for illumination-consistent object compositing in 2D and 3D scenes. We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly. Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset.
- Score: 19.46962637673285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object compositing offers significant promise for augmented reality (AR) and embodied intelligence applications. Existing approaches predominantly focus on single-image scenarios or intrinsic decomposition techniques, facing challenges with multi-view consistency, complex scenes, and diverse lighting conditions. Recent inverse rendering advancements, such as 3D Gaussian and diffusion-based methods, have enhanced consistency but are limited by poor scalability, heavy data requirements, or prolonged per-scene reconstruction time. To broaden applicability, we introduce MV-CoLight, a two-stage framework for illumination-consistent object compositing in both 2D images and 3D scenes. Our novel feed-forward architecture models lighting and shadows directly, avoiding the iterative biases of diffusion-based methods. We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly. To facilitate training and evaluation, we further introduce a large-scale 3D compositing dataset. Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset, and results on casually captured real-world scenes demonstrate the framework's robustness and wide generalization.
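The abstract's Hilbert curve-based mapping between 2D image inputs and 3D Gaussian scene representations can be made concrete with a short sketch. The snippet below is an illustrative, minimal implementation of the standard Hilbert-curve index, not the authors' released code; the grid size, the `xy2d`/`hilbert_order` helper names, and the toy feature map are assumptions introduced only to show how such a locality-preserving 2D-to-1D ordering can be computed.

```python
import numpy as np


def xy2d(n, x, y):
    """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of two).

    The index runs from 0 to n*n - 1 and preserves 2D locality: cells with
    nearby indices are nearby in the grid.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the correct orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n):
    """Flat (row-major) pixel indices of an n x n grid, sorted along the Hilbert curve."""
    coords = [(x, y) for y in range(n) for x in range(n)]
    coords.sort(key=lambda p: xy2d(n, p[0], p[1]))
    return np.array([y * n + x for x, y in coords])


# Toy usage (hypothetical sizes): serialize an 8x8 per-pixel feature map into a
# 1D sequence in which spatially adjacent pixels tend to stay adjacent, a
# property that is convenient when pairing 2D image tokens with a serialized
# 3D Gaussian scene representation.
if __name__ == "__main__":
    n = 8
    features = np.random.rand(n, n, 3)           # hypothetical per-pixel features
    order = hilbert_order(n)
    serialized = features.reshape(-1, 3)[order]  # shape (64, 3), Hilbert order
    print(serialized.shape)
```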
Related papers
- Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis [49.67420486373202]
GRGS is a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. We introduce a Lighting-aware Geometry Refinement (LGR) module trained on synthetically relit data to predict accurate depth and surface normals.
arXiv Detail & Related papers (2025-05-27T17:59:47Z)
- SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations [42.69229582451846]
This work takes on the challenge of reconstructing 3D scenes from sparse or single-view inputs. We introduce SpatialCrafter, a framework that leverages the rich knowledge in video diffusion models to generate plausible additional observations. Through a trainable camera encoder and an epipolar attention mechanism for explicit geometric constraints, we achieve precise camera control and 3D consistency.
arXiv Detail & Related papers (2025-05-17T13:05:13Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs. We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting [47.67153284714988]
We propose a novel hybrid representation, termed HybridGS, using 2D Gaussians for transient objects per image. We also propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis. Experiments on benchmark datasets show state-of-the-art novel view synthesis performance in both indoor and outdoor scenes.
arXiv Detail & Related papers (2024-12-05T03:20:35Z)
- BIFRÖST: 3D-Aware Image compositing with Language Instructions [27.484947109237964]
Bifr"ost is a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition.
Bifr"ost addresses issues by training MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process.
arXiv Detail & Related papers (2024-10-24T18:35:12Z) - MetaGS: A Meta-Learned Gaussian-Phong Model for Out-of-Distribution 3D Scene Relighting [63.5925701087252]
Out-of-distribution (OOD) 3D relighting requires novel view synthesis under unseen lighting conditions. We introduce MetaGS to tackle this challenge from two perspectives.
arXiv Detail & Related papers (2024-05-31T13:48:54Z)
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z) - Light Field Diffusion for Single-View Novel View Synthesis [32.59286750410843]
Single-view novel view synthesis (NVS) is important but challenging in computer vision.
Recent advancements in NVS have leveraged Denoising Diffusion Probabilistic Models (DDPMs) for their exceptional ability to produce high-fidelity images.
We present Light Field Diffusion (LFD), a novel conditional diffusion-based approach that transcends the conventional reliance on camera pose matrices.
arXiv Detail & Related papers (2023-09-20T03:27:06Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Spatiotemporally Consistent HDR Indoor Lighting Estimation [66.26786775252592]
We propose a physically-motivated deep learning framework to solve the indoor lighting estimation problem.
Given a single LDR image with a depth map, our method predicts spatially consistent lighting at any given image position.
Our framework achieves photorealistic lighting prediction with higher quality compared to state-of-the-art single-image or video-based methods.
arXiv Detail & Related papers (2023-05-07T20:36:29Z)