When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
- URL: http://arxiv.org/abs/2503.06903v2
- Date: Fri, 21 Mar 2025 08:37:44 GMT
- Title: When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack
- Authors: Hanqing Liu, Shouwei Ruan, Yao Huang, Shiji Zhao, Xingxing Wei,
- Abstract summary: Vision-Language Models (VLMs) have achieved remarkable success in various tasks, yet their robustness to real-world illumination variations remains largely unexplored.<n>We propose textbfIllumination textbfTransformation textbfAttack (textbfITA), the first framework to systematically assess VLMs' robustness against illumination changes.
- Score: 13.197468488144038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language Models (VLMs) have achieved remarkable success in various tasks, yet their robustness to real-world illumination variations remains largely unexplored. To bridge this gap, we propose \textbf{I}llumination \textbf{T}ransformation \textbf{A}ttack (\textbf{ITA}), the first framework to systematically assess VLMs' robustness against illumination changes. However, there still exist two key challenges: (1) how to model global illumination with fine-grained control to achieve diverse lighting conditions and (2) how to ensure adversarial effectiveness while maintaining naturalness. To address the first challenge, we innovatively decompose global illumination into multiple parameterized point light sources based on the illumination rendering equation. This design enables us to model more diverse lighting variations that previous methods could not capture. Then, by integrating these parameterized lighting variations with physics-based lighting reconstruction techniques, we could precisely render such light interactions in the original scenes, finally meeting the goal of fine-grained lighting control. For the second challenge, by controlling illumination through the lighting reconstrution model's latent space rather than direct pixel manipulation, we inherently preserve physical lighting priors. Furthermore, to prevent potential reconstruction artifacts, we design additional perceptual constraints for maintaining visual consistency with original images and diversity constraints for avoiding light source convergence. Extensive experiments demonstrate that our ITA could significantly reduce the performance of advanced VLMs, e.g., LLaVA-1.6, while possessing competitive naturalness, exposing VLMS' critical illuminiation vulnerabilities.
Related papers
- DifFRelight: Diffusion-Based Facial Performance Relighting [12.909429637057343]
We present a novel framework for free-viewpoint facial performance relighting using diffusion-based image-to-image translation.
We train a diffusion model for precise lighting control, enabling high-fidelity relit facial images from flat-lit inputs.
The model accurately reproduces complex lighting effects like eye reflections, subsurface scattering, self-shadowing, and translucency.
arXiv Detail & Related papers (2024-10-10T17:56:44Z) - Attentive Illumination Decomposition Model for Multi-Illuminant White
Balancing [27.950125640986805]
White balance (WB) algorithms in many commercial cameras assume single and uniform illumination.
We present a deep white balancing model that leverages the slot attention, where each slot is in charge of representing individual illuminants.
This design enables the model to generate chromaticities and weight maps for individual illuminants, which are then fused to compose the final illumination map.
arXiv Detail & Related papers (2024-02-28T12:15:29Z) - URHand: Universal Relightable Hands [64.25893653236912]
We present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities.
Our model allows few-shot personalization using images captured with a mobile phone, and is ready to be photorealistically rendered under novel illuminations.
arXiv Detail & Related papers (2024-01-10T18:59:51Z) - Relightable Neural Actor with Intrinsic Decomposition and Pose Control [80.06094206522668]
We propose Relightable Neural Actor, a new video-based method for learning a pose-driven neural human model that can be relighted.
For training, our method solely requires a multi-view recording of the human under a known, but static lighting condition.
To evaluate our approach in real-world scenarios, we collect a new dataset with four identities recorded under different light conditions, indoors and outdoors.
arXiv Detail & Related papers (2023-12-18T14:30:13Z) - Diving into Darkness: A Dual-Modulated Framework for High-Fidelity
Super-Resolution in Ultra-Dark Environments [51.58771256128329]
This paper proposes a specialized dual-modulated learning framework that attempts to deeply dissect the nature of the low-light super-resolution task.
We develop Illuminance-Semantic Dual Modulation (ISDM) components to enhance feature-level preservation of illumination and color details.
Comprehensive experiments showcases the applicability and generalizability of our approach to diverse and challenging ultra-low-light conditions.
arXiv Detail & Related papers (2023-09-11T06:55:32Z) - Improving Lens Flare Removal with General Purpose Pipeline and Multiple
Light Sources Recovery [69.71080926778413]
flare artifacts can affect image visual quality and downstream computer vision tasks.
Current methods do not consider automatic exposure and tone mapping in image signal processing pipeline.
We propose a solution to improve the performance of lens flare removal by revisiting the ISP and design a more reliable light sources recovery strategy.
arXiv Detail & Related papers (2023-08-31T04:58:17Z) - Physically Inspired Dense Fusion Networks for Relighting [45.66699760138863]
We propose a model which enriches neural networks with physical insight.
Our method generates the relighted image with new illumination settings via two different strategies.
We show that our proposal can outperform many state-of-the-art methods in terms of well-known fidelity metrics and perceptual loss.
arXiv Detail & Related papers (2021-05-05T17:33:45Z) - Learning Flow-based Feature Warping for Face Frontalization with
Illumination Inconsistent Supervision [73.18554605744842]
Flow-based Feature Warping Model (FFWM) learns to synthesize photo-realistic and illumination preserving frontal images.
An Illumination Preserving Module (IPM) is proposed to learn illumination preserving image synthesis.
A Warp Attention Module (WAM) is introduced to reduce the pose discrepancy in the feature level.
arXiv Detail & Related papers (2020-08-16T06:07:00Z) - Unsupervised Low-light Image Enhancement with Decoupled Networks [103.74355338972123]
We learn a two-stage GAN-based framework to enhance the real-world low-light images in a fully unsupervised fashion.
Our proposed method outperforms the state-of-the-art unsupervised image enhancement methods in terms of both illumination enhancement and noise reduction.
arXiv Detail & Related papers (2020-05-06T13:37:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.