Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
- URL: http://arxiv.org/abs/2505.24227v1
- Date: Fri, 30 May 2025 05:30:02 GMT
- Title: Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
- Authors: Ying Yang, Jie Zhang, Xiao Lv, Di Lin, Tao Xiang, Qing Guo
- Abstract summary: We propose LightD, a novel framework that generates natural adversarial samples for vision-and-language pretraining (VLP) models. LightD expands the optimization space while ensuring perturbations align with scene semantics.
- Score: 56.84206059390887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While adversarial attacks on vision-and-language pretraining (VLP) models have been explored, generating natural adversarial samples crafted through realistic and semantically meaningful perturbations remains an open challenge. Existing methods, primarily designed for classification tasks, struggle when adapted to VLP models due to their restricted optimization spaces, leading to ineffective attacks or unnatural artifacts. To address this, we propose LightD, a novel framework that generates natural adversarial samples for VLP models via semantically guided relighting. Specifically, LightD leverages ChatGPT to propose context-aware initial lighting parameters and integrates a pretrained relighting model (IC-Light) to enable diverse lighting adjustments. LightD expands the optimization space while ensuring perturbations align with scene semantics. Additionally, gradient-based optimization is applied to the reference lighting image to further enhance attack effectiveness while maintaining visual naturalness. The effectiveness and superiority of the proposed LightD have been demonstrated across various VLP models in tasks such as image captioning and visual question answering.
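The gradient-based step described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical illustration of optimizing a reference lighting image against a VLP model, assuming differentiable wrappers `relight(image, light_ref)` around a pretrained relighting model such as IC-Light and `vlp_loss(image, text)` around the target model's captioning loss; these function names, the optimizer choice, and the perturbation bound are illustrative assumptions, not the paper's actual implementation. The ChatGPT-proposed lighting parameters would supply the initial `light_ref`.

```python
import torch

def relighting_attack(image, light_ref, caption, relight, vlp_loss,
                      steps=50, lr=1e-2, budget=0.1):
    """Optimize a reference lighting image so the relit photo misleads the VLP model.

    Hypothetical sketch: `relight` and `vlp_loss` are assumed differentiable callables,
    and `light_ref` is a lighting image in [0, 1] (e.g. proposed via ChatGPT prompts).
    """
    delta = torch.zeros_like(light_ref, requires_grad=True)  # perturbation on the lighting image
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        adv_light = (light_ref + delta).clamp(0.0, 1.0)  # keep a valid lighting reference
        adv_image = relight(image, adv_light)            # semantically plausible relighting
        loss = -vlp_loss(adv_image, caption)             # ascend the captioning/VQA loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)                # bounded change helps keep the result natural

    with torch.no_grad():
        return relight(image, (light_ref + delta).clamp(0.0, 1.0))
```

Because the perturbation lives in the lighting reference rather than in raw pixels, the search space stays within plausible illumination changes, which is the intuition behind the naturalness claim.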
Related papers
- Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement [41.66776033752888]
Most low-light image enhancement methods rely on pre-trained model priors, low-light inputs, or both.
We propose VLM-IMI, a novel framework that leverages large vision-language models with iterative and manual instructions.
VLM-IMI incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration.
arXiv Detail & Related papers (2025-07-24T03:35:20Z)
- SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement [58.79901582809091]
Recent Transformer-based low-light enhancement methods have made promising progress in recovering global illumination.
We present a Spatially-Adaptive Illumination-Guided Transformer framework that enables accurate illumination restoration.
arXiv Detail & Related papers (2025-07-21T11:38:56Z)
- From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning [65.94580484237737]
Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors.
We build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU).
To address the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization performance.
arXiv Detail & Related papers (2025-07-11T07:51:26Z)
- DreamLight: Towards Harmonious and Consistent Image Relighting [41.90032795389507]
We introduce a model named DreamLight for universal image relighting.
It can seamlessly composite subjects into a new background while maintaining aesthetic uniformity in terms of lighting and color tone.
arXiv Detail & Related papers (2025-06-17T14:05:24Z)
- TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement [30.498816319802412]
We propose a new light enhancement task and a new framework that provides customized lighting control through prompt-driven, semantic-level, and quantitative brightness adjustments.
Experimental results on benchmark datasets demonstrate our framework's superior performance at increasing visibility, maintaining natural color balance, and amplifying fine details without creating artifacts.
arXiv Detail & Related papers (2025-03-11T08:30:50Z)
- When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack [13.197468488144038]
Vision-Language Models (VLMs) have achieved remarkable success in various tasks, yet their robustness to real-world illumination variations remains largely unexplored.
We propose Illumination Transformation Attack (ITA), the first framework to systematically assess VLMs' robustness against illumination changes.
arXiv Detail & Related papers (2025-03-10T04:12:56Z)
- D3DR: Lighting-Aware Object Insertion in Gaussian Splatting [48.80431740983095]
We propose a method, dubbed D3DR, for inserting a 3DGS-parametrized object into 3DGS scenes.
We leverage advances in diffusion models, which, trained on real-world data, implicitly understand correct scene lighting.
We demonstrate the method's effectiveness by comparing it to existing approaches.
arXiv Detail & Related papers (2025-03-09T19:48:00Z)
- Low-Light Image Enhancement via Generative Perceptual Priors [75.01646333310073]
We introduce a novel LLIE framework with the guidance of vision-language models (VLMs).
We first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors.
To incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (LPP-Attn) guided by global and local perceptual priors.
arXiv Detail & Related papers (2024-12-30T12:51:52Z)
- DifFRelight: Diffusion-Based Facial Performance Relighting [12.909429637057343]
We present a novel framework for free-viewpoint facial performance relighting using diffusion-based image-to-image translation.
We train a diffusion model for precise lighting control, enabling high-fidelity relit facial images from flat-lit inputs.
The model accurately reproduces complex lighting effects like eye reflections, subsurface scattering, self-shadowing, and translucency.
arXiv Detail & Related papers (2024-10-10T17:56:44Z)
- Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors [38.96909959677438]
Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments.
Recent advanced LIE techniques use deep neural networks, which require large numbers of paired low/normal-light images, many network parameters, and substantial computational resources.
We devise a novel unsupervised LIE framework based on diffusion priors and lookup tables to achieve efficient low-light image recovery.
arXiv Detail & Related papers (2024-09-27T16:37:27Z)
- Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast.
These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios.
We first propose a method for quickly and accurately estimating the noise level in low-light images.
We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z)
- Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision [73.18554605744842]
The Flow-based Feature Warping Model (FFWM) learns to synthesize photo-realistic and illumination-preserving frontal images.
An Illumination Preserving Module (IPM) is proposed to learn illumination-preserving image synthesis.
A Warp Attention Module (WAM) is introduced to reduce the pose discrepancy in the feature level.
arXiv Detail & Related papers (2020-08-16T06:07:00Z)
- Unsupervised Low-light Image Enhancement with Decoupled Networks [103.74355338972123]
We learn a two-stage GAN-based framework to enhance real-world low-light images in a fully unsupervised fashion.
Our proposed method outperforms the state-of-the-art unsupervised image enhancement methods in terms of both illumination enhancement and noise reduction.
arXiv Detail & Related papers (2020-05-06T13:37:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.