Related papers: CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction

CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction

URL: http://arxiv.org/abs/2507.15748v3
Date: Tue, 30 Sep 2025 15:11:14 GMT
Title: CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
Authors: Jisu Shin, Richard Shaw, Seunghyun Shin, Zhensong Zhang, Hae-Gon Jeon, Eduardo Perez-Pellitero,
Abstract summary: Camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction.<n>Appearance variations violate multi-view consistency and degrade novel view synthesis.<n>We propose a generalizable, feed-forward approach that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner.
Score: 30.088316989385106
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Modern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduce photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade novel view synthesis. Joint optimization of scene-specific representations and per-image appearance embeddings has been proposed to address this issue, but with increased computational complexity and slower training. In this work, we propose a generalizable, feed-forward approach that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner. Our model processes hundreds of frames in a single step, enabling efficient large-scale harmonization, and seamlessly integrates into downstream 3D reconstruction models, providing cross-scene generalization without requiring scene-specific retraining. To overcome the lack of paired data, we employ a hybrid self-supervised rendering loss leveraging 3D foundation models, improving generalization to real-world variations. Extensive experiments show that our approach outperforms or matches the reconstruction quality of existing scene-specific optimization methods with appearance modeling, without significantly affecting the training time of baseline 3D models.

Related papers

Unifying Color and Lightness Correction with View-Adaptive Curve Adjustment for Robust 3D Novel View Synthesis [73.27997579020233]
We propose Luminance-GS++, a 3DGS-based framework for robust NVS under diverse illumination conditions.<n>Our method combines a globally view-adaptive lightness adjustment with a local pixel-wise residual refinement for precise color correction.
arXiv Detail & Related papers (2026-02-20T16:20:50Z)
Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers [55.15722080205737]
Edit2Perceive is a unified diffusion framework that adapts editing models for depth, normal, and matting.<n>Our single-step deterministic inference yields up to faster runtime while training on relatively small datasets.
arXiv Detail & Related papers (2025-11-24T01:13:51Z)
RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions [67.48495052903534]
We propose a general and efficient multi-view feature enhancement module, RobustGS.<n>It substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions.<n>The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner.
arXiv Detail & Related papers (2025-08-05T04:50:29Z)
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors [13.883695200241524]
RI3D is a novel approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images.<n>Our key contribution is separating the view synthesis process into two tasks of reconstructing visible regions and hallucinating missing regions.<n>We produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes.
arXiv Detail & Related papers (2025-03-13T20:16:58Z)
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization [27.509109317973817]
3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed.<n>Previous methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks.<n>We propose CarGS, a unified model leveraging-adaptive regularization to achieve simultaneous, high-quality surface reconstruction.
arXiv Detail & Related papers (2025-03-02T12:51:38Z)
ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [5.55656676725821]
We present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them.<n>Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape.<n>In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction.
arXiv Detail & Related papers (2025-02-13T12:49:25Z)
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement [23.707586182294932]
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. We introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image.
arXiv Detail & Related papers (2024-08-26T12:10:52Z)
WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections [8.261637198675151]
Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. We propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high converge and rendering speed.
arXiv Detail & Related papers (2024-06-04T15:17:37Z)
Bilateral Guided Radiance Field Processing [4.816861458037213]
Neural Radiance Fields (NeRF) achieves unprecedented performance in synthesizing novel view synthesis. Image signal processing (ISP) in modern cameras will independently enhance them, leading to "floaters" in the reconstructed radiance fields. We propose to disentangle the enhancement by ISP at the NeRF training stage and re-apply user-desired enhancements to the reconstructed radiance fields. We demonstrate our approach can boost the visual quality of novel view synthesis by effectively removing floaters and performing enhancements from user retouching.
arXiv Detail & Related papers (2024-06-01T14:10:45Z)
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo [54.00987996368157]
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) MVSGaussian achieves real-time rendering with better synthesis quality for each scene.
arXiv Detail & Related papers (2024-05-20T17:59:30Z)
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z)
Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images [33.70056950818641]
We propose a novel transformer-based framework that reconstructs two high fidelity hands from multi-view RGB images. We show that our framework is able to produce realistic two-hand reconstructions and demonstrate the generalisation of synthetic-trained models to real data.
arXiv Detail & Related papers (2023-08-21T20:07:02Z)
Progressive Multi-view Human Mesh Recovery with Self-Supervision [68.60019434498703]
Existing solutions typically suffer from poor generalization performance to new settings. We propose a novel simulation-based training pipeline for multi-view human mesh recovery.
arXiv Detail & Related papers (2022-12-10T06:28:29Z)
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow [22.161967080759993]
Self-supervised pre-training methods have not yet delivered on dense geometric vision tasks such as stereo matching or optical flow. We build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene. We show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques.
arXiv Detail & Related papers (2022-11-18T18:18:53Z)
DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation on the optimal number of tokens one position should focus on. Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
Differentiable Rendering with Perturbed Optimizers [85.66675707599782]
Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision. Our work highlights the link between some well-known differentiable formulations and randomly smoothed renderings. We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction.
arXiv Detail & Related papers (2021-10-18T08:56:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.