SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model
- URL: http://arxiv.org/abs/2601.07209v1
- Date: Mon, 12 Jan 2026 05:03:12 GMT
- Title: SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model
- Authors: Yu Guo, Zhiqiang Lao, Xiyun Song, Yubin Zhou, Heather Yu
- Abstract summary: Existing datasets suffer from limited physical realism in synthetic data or insufficient scale in real captures. We introduce a synthetic dataset generation framework that path-traces 3D glass models over real background imagery to create physically accurate reflection scenarios.
- Score: 9.069411665770266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical realism in synthetic data or insufficient scale in real captures. We introduce a synthetic dataset generation framework that path-traces 3D glass models over real background imagery to create physically accurate reflection scenarios with varied glass properties, camera settings, and post-processing effects. To leverage the capabilities of a Large Multimodal Model (LMM), we concatenate the image layers into a single composite input, apply joint captioning, and fine-tune the model using task-specific LoRA rather than full-parameter training. This enables our approach to achieve improved reflection removal and separation performance compared to state-of-the-art methods.
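The two mechanisms named in the abstract (a single composite input built from image layers, and LoRA instead of full-parameter training) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not state the concatenation layout (side by side is assumed), and `make_composite`, `LoRALinear`, `rank`, and `alpha` are hypothetical names. The LoRA part follows the generic formulation, a frozen weight plus a scaled low-rank update.

```python
import numpy as np


def make_composite(layers):
    """Concatenate equally sized H x W x C layers side by side (assumed layout)."""
    shapes = {layer.shape for layer in layers}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same shape")
    return np.concatenate(layers, axis=1)  # stack along the width axis


class LoRALinear:
    """Frozen linear map plus a trainable low-rank update (generic LoRA form)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen base weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base output plus the scaled low-rank correction x A^T B^T.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


# Hypothetical layers: mixed input, transmission, reflection (4 x 6 RGB each).
layers = [np.full((4, 6, 3), v) for v in (0.0, 0.5, 1.0)]
composite = make_composite(layers)
print(composite.shape)

layer = LoRALinear(np.eye(8), rank=2)
print(layer.trainable_params(), "trainable vs", layer.weight.size, "frozen")
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the two small matrices would be updated during fine-tuning.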
Related papers
- Reflection Removal through Efficient Adaptation of Diffusion Transformers [30.68558779968187]
We introduce a diffusion-transformer (DiT) framework for single-image reflection removal. We analyze existing reflection removal data sources for diversity, scalability, and photorealism. We construct a physically based rendering pipeline in Blender to synthesize realistic glass materials and reflection effects.
arXiv Detail & Related papers (2025-12-04T17:12:39Z)
- Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections [55.248092751290834]
Mirror reflections are common in everyday environments and can provide stereo information within a single capture. We exploit this property by treating the reflection as an auxiliary view and designing a transformation that constructs a physically valid virtual camera. This enables a multi-view stereo setup from a single image, simplifying the imaging process.
arXiv Detail & Related papers (2025-09-24T23:00:22Z)
- Utilizing Multi-step Loss for Single Image Reflection Removal [0.9208007322096532]
Distorted images can negatively impact tasks like object detection and image segmentation. We present a novel approach for image reflection removal using a single image.
arXiv Detail & Related papers (2024-12-11T17:57:25Z)
- Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding [67.57487747508179]
Multiple-in-one image restoration (IR) has made significant progress, aiming to handle all types of single degraded image restoration with a single model.
In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.
arXiv Detail & Related papers (2024-11-25T09:26:34Z)
- Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections [26.02117310176884]
We tackle the problem of generating highly realistic and plausible mirror reflections using diffusion-based generative models. We propose a novel depth-conditioned inpainting method called MirrorFusion, which generates high-quality, realistic, shape and appearance-aware reflections of real-world objects. MirrorFusion outperforms state-of-the-art methods on SynMirror, as demonstrated by extensive quantitative and qualitative analysis.
arXiv Detail & Related papers (2024-09-23T02:59:07Z)
- Relightify: Relightable 3D Faces from a Single Image via Diffusion Models [86.3927548091627]
We present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image.
In contrast to existing methods, we directly acquire the observed texture from the input image, thus, resulting in more faithful and consistent estimation.
arXiv Detail & Related papers (2023-05-10T11:57:49Z)
- DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers.
In this work, we propose DIB-R++, a hybrid differentiable renderer which supports these effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIB-R++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z)
- A Categorized Reflection Removal Dataset with Diverse Real-world Scenes [54.662456878340215]
We construct a new reflection removal dataset that is categorized, diverse, and real-world (CDR).
The dataset is constructed using diverse glass types under various environments to ensure diversity.
We show that state-of-the-art reflection removal methods generally perform well on blurry reflection but fail in obtaining satisfying performance on other types of real-world reflection.
arXiv Detail & Related papers (2021-08-07T06:56:57Z)
- ReflectNet -- A Generative Adversarial Method for Single Image Reflection Suppression [0.6980076213134382]
We propose a single image reflection removal method based on context understanding modules and adversarial training.
Our proposed reflection removal method outperforms state-of-the-art methods in terms of PSNR and SSIM on the SIR benchmark dataset.
arXiv Detail & Related papers (2021-05-11T17:33:40Z)
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on the insight that the rectified results of distorted images of the same scene taken with different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.