Does FLUX Already Know How to Perform Physically Plausible Image Composition?
- URL: http://arxiv.org/abs/2509.21278v3
- Date: Sun, 02 Nov 2025 12:29:52 GMT
- Title: Does FLUX Already Know How to Perform Physically Plausible Image Composition?
- Authors: Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, Adams Wai-Kin Kong,
- Abstract summary: SHINE is a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors.<n>We introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces.
- Score: 26.848563827256914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image composition aims to seamlessly insert a user-specified object into a new scene, but existing models struggle with complex lighting (e.g., accurate shadows, water reflections) and diverse, high-resolution inputs. Modern text-to-image diffusion models (e.g., SD3.5, FLUX) already encode essential physical and resolution priors, yet lack a framework to unleash them without resorting to latent inversion, which often locks object poses into contextually inappropriate orientations, or brittle attention surgery. We propose SHINE, a training-free framework for Seamless, High-fidelity Insertion with Neutralized Errors. SHINE introduces manifold-steered anchor loss, leveraging pretrained customization adapters (e.g., IP-Adapter) to guide latents for faithful subject representation while preserving background integrity. Degradation-suppression guidance and adaptive background blending are proposed to further eliminate low-quality outputs and visible seams. To address the lack of rigorous benchmarks, we introduce ComplexCompo, featuring diverse resolutions and challenging conditions such as low lighting, strong illumination, intricate shadows, and reflective surfaces. Experiments on ComplexCompo and DreamEditBench show state-of-the-art performance on standard metrics (e.g., DINOv2) and human-aligned scores (e.g., DreamSim, ImageReward, VisionReward). Code and benchmark will be publicly available upon publication.
Related papers
- Rectifying Latent Space for Generative Single-Image Reflection Removal [16.341477336909765]
Single-image removal is a highly ill-posed problem, where existing methods struggle to reason about the composition of corrupted regions.<n>This work reframes an editing-purpose latent diffusion model to effectively perceive and process highly ambiguous, layered image inputs.
arXiv Detail & Related papers (2025-12-06T09:16:14Z) - MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference [83.38607296779423]
We show that multi-view consistent material inference with more physically-based environment modeling is key to learning accurate reflections with Gaussian Splatting.<n>Our method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel views synthesis.
arXiv Detail & Related papers (2025-10-13T13:29:20Z) - SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement [58.79901582809091]
Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination.<n>Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination.<n>We present a Spatially-Adaptive Illumination-Guided Transformer framework that enables accurate illumination restoration.
arXiv Detail & Related papers (2025-07-21T11:38:56Z) - MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation [19.46962637673285]
MV-CoLight is a framework for illumination-consistent object compositing in 2D and 3D scenes.<n>We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly.<n> Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset.
arXiv Detail & Related papers (2025-05-27T17:53:02Z) - CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement [97.95330185793358]
Low-light image enhancement (LLIE) aims to improve low-illumination images.<n>Existing methods face two challenges: uncertainty in restoration from diverse brightness degradations and loss of texture and color information.<n>We propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refinement.
arXiv Detail & Related papers (2024-04-08T07:34:39Z) - Towards Image Ambient Lighting Normalization [47.42834070783831]
Ambient Lighting Normalization (ALN) enables the study of interactions between shadows, unifying image restoration and shadow removal in a broader context.
For benchmarking, we select various mainstream methods and rigorously evaluate them on Ambient6K.
Experiments show that IFBlend achieves SOTA scores on Ambient6K and exhibits competitive performance on conventional shadow removal benchmarks.
arXiv Detail & Related papers (2024-03-27T16:20:55Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast.
These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios.
We first propose a method for estimating the noise level in low light images in a quick and accurate way.
We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z) - SCRNet: a Retinex Structure-based Low-light Enhancement Model Guided by
Spatial Consistency [22.54951703413469]
We present a novel low-light image enhancement model, termed Spatial Consistency Retinex Network (SCRNet)
Our proposed model incorporates three levels of consistency: channel level, semantic level, and texture level, inspired by the principle of spatial consistency.
Extensive evaluations on various low-light image datasets demonstrate that our proposed SCRNet outshines existing state-of-the-art methods.
arXiv Detail & Related papers (2023-05-14T03:32:19Z) - DiFaReli++: Diffusion Face Relighting with Consistent Cast Shadows [11.566896201650056]
We introduce a novel approach to single-view face relighting in the wild, addressing challenges such as global illumination and cast shadows.<n>We propose a single-shot relighting framework that requires just one network pass, given pre-processed data, and even outperforms the teacher model across all metrics.
arXiv Detail & Related papers (2023-04-19T08:03:20Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted
Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehzing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.