MLI-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields
- URL: http://arxiv.org/abs/2411.17235v1
- Date: Tue, 26 Nov 2024 08:57:38 GMT
- Title: MLI-NeRF: Multi-Light Intrinsic-Aware Neural Radiance Fields
- Authors: Yixiong Yang, Shilin Hu, Haoyu Wu, Ramon Baldrich, Dimitris Samaras, Maria Vanrell
- Abstract summary: Current methods for extracting intrinsic image components, such as reflectance and shading, rely on statistical priors.
We propose MLI-NeRF, which integrates Multiple Light information in Intrinsic-aware Neural Radiance Fields.
- Score: 21.057216351934688
- License:
- Abstract: Current methods for extracting intrinsic image components, such as reflectance and shading, primarily rely on statistical priors. These methods focus mainly on simple synthetic scenes and isolated objects and struggle to perform well on challenging real-world data. To address this issue, we propose MLI-NeRF, which integrates \textbf{M}ultiple \textbf{L}ight information in \textbf{I}ntrinsic-aware \textbf{Ne}ural \textbf{R}adiance \textbf{F}ields. By leveraging scene information provided by different light source positions complementing the multi-view information, we generate pseudo-label images for reflectance and shading to guide intrinsic image decomposition without the need for ground truth data. Our method introduces straightforward supervision for intrinsic component separation and ensures robustness across diverse scene types. We validate our approach on both synthetic and real-world datasets, outperforming existing state-of-the-art methods. Additionally, we demonstrate its applicability to various image editing tasks. The code and data are publicly available.
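The abstract builds on the standard intrinsic image model, where an image factors into reflectance and shading (I = R * S): reflectance is light-invariant, so images of the same view under different light positions can yield rough pseudo-labels without ground truth. The sketch below is a minimal illustration of that idea, not the paper's actual pipeline; the function name and the choice of a per-pixel maximum as the light-invariant statistic are illustrative assumptions.

```python
import numpy as np

def pseudo_labels(images):
    """Derive rough pseudo-labels for reflectance and shading from
    aligned images (N, H, W, 3) of one view under N light positions,
    using the intrinsic model I = R * S.

    Assumption: every surface point is well-lit in at least one image,
    so the per-pixel maximum over lighting conditions approximates the
    reflectance up to scale; shading then follows as S = I / R.
    """
    images = np.asarray(images, dtype=np.float64)
    # Light-invariant component: max over the lighting axis.
    reflectance = images.max(axis=0)
    # Per-image shading, clipped to avoid division blow-ups in dark pixels.
    shading = images / np.clip(reflectance, 1e-6, None)
    return reflectance, shading
```

In MLI-NeRF these pseudo-labels come from a learned radiance field rather than a simple per-pixel statistic, and they serve as supervision rather than as the final decomposition.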
Related papers
- Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields [21.057216351934688]
Our research introduces a method that combines relighting with intrinsic decomposition.
By leveraging light variations in scenes to generate pseudo labels, our method provides guidance for intrinsic decomposition.
We validate our method on both synthetic and real-world datasets, achieving convincing results.
arXiv Detail & Related papers (2024-06-16T21:10:37Z)
- Boosting General Trimap-free Matting in the Real-World Image [0.0]
We propose a network called Multi-Feature fusion-based Coarse-to-fine Network (MFC-Net).
Our method is significantly effective on both synthetic and real-world images, and the performance in the real-world dataset is far better than existing matting-free methods.
arXiv Detail & Related papers (2024-05-28T07:37:44Z)
- TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding [91.30065932213758]
Large Multimodal Models (LMMs) have sparked a surge in research aimed at harnessing their remarkable reasoning abilities.
We propose TextCoT, a novel Chain-of-Thought framework for text-rich image understanding.
Our method is free of extra training, offering immediate plug-and-play functionality.
arXiv Detail & Related papers (2024-04-15T13:54:35Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Light Field Diffusion for Single-View Novel View Synthesis [32.59286750410843]
Single-view novel view synthesis (NVS) is important but challenging in computer vision.
Recent advancements in NVS have leveraged Denoising Diffusion Probabilistic Models (DDPMs) for their exceptional ability to produce high-fidelity images.
We present Light Field Diffusion (LFD), a novel conditional diffusion-based approach that transcends the conventional reliance on camera pose matrices.
arXiv Detail & Related papers (2023-09-20T03:27:06Z)
- Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing [27.96634370355241]
This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling.
The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials.
arXiv Detail & Related papers (2022-11-06T03:34:26Z)
- ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition [38.08486689940946]
Multi-modal Named Entity Recognition (MNER) has attracted a lot of attention.
Interactions between the two modalities are difficult to model because image and text representations are trained separately on data of their respective modality.
In this paper, we propose Image-text Alignments (ITA) to align image features into the textual space.
arXiv Detail & Related papers (2021-12-13T08:29:43Z)
- Real-MFF: A Large Realistic Multi-focus Image Dataset with Ground Truth [58.226535803985804]
We introduce a large and realistic multi-focus dataset called Real-MFF.
The dataset contains 710 pairs of source images with corresponding ground truth images.
We evaluate 10 typical multi-focus algorithms on this dataset for the purpose of illustration.
arXiv Detail & Related papers (2020-03-28T12:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.