AUV-Fusion: Cross-Modal Adversarial Fusion of User Interactions and Visual Perturbations Against VARS
- URL: http://arxiv.org/abs/2507.22880v1
- Date: Wed, 30 Jul 2025 17:55:09 GMT
- Title: AUV-Fusion: Cross-Modal Adversarial Fusion of User Interactions and Visual Perturbations Against VARS
- Authors: Hai Ling, Tianchi Wang, Xiaohao Liu, Zhulin Tao, Lifang Yang, Xianglin Huang
- Abstract summary: AUV-Fusion is a cross-modal adversarial attack framework that adopts high-order user preference modeling. We show that AUV-Fusion significantly enhances the exposure of target (cold-start) items compared to conventional baseline methods.
- Score: 3.68186360493378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern Visual-Aware Recommender Systems (VARS) exploit the integration of user interaction data and visual features to deliver personalized recommendations with high precision. However, their robustness against adversarial attacks remains largely underexplored, posing significant risks to system reliability and security. Existing attack strategies suffer from notable limitations: shilling attacks are costly and detectable, and visual-only perturbations often fail to align with user preferences. To address these challenges, we propose AUV-Fusion, a cross-modal adversarial attack framework that adopts high-order user preference modeling and cross-modal adversary generation. Specifically, we obtain robust user embeddings through multi-hop user-item interactions and transform them via an MLP into semantically aligned perturbations. These perturbations are injected into the latent space of a pre-trained VAE within the diffusion model. By synergistically integrating genuine user interaction data with visually plausible perturbations, AUV-Fusion eliminates the need for injecting fake user profiles and effectively mitigates the challenge of insufficient user preference extraction inherent in traditional visual-only attacks. Comprehensive evaluations on diverse VARS architectures and real-world datasets demonstrate that AUV-Fusion significantly enhances the exposure of target (cold-start) items compared to conventional baseline methods. Moreover, AUV-Fusion maintains exceptional stealth under rigorous scrutiny.
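The abstract describes a two-stage pipeline: high-order user preference modeling over the interaction graph, followed by an MLP that maps the resulting user embedding into a perturbation injected into the VAE latent of the target item image. The snippet below is a minimal, hypothetical PyTorch sketch of that flow, not the authors' implementation; the LightGCN-style propagation, the MLP dimensions, the latent shape, and the perturbation budget `eps` are assumptions made for illustration, and the diffusion model itself is omitted.

```python
# Minimal sketch (assumed details, not the paper's code) of an AUV-Fusion-style flow:
# multi-hop user embeddings -> MLP perturbation -> injection into a frozen VAE latent.
import math
import torch
import torch.nn as nn


def multi_hop_user_embeddings(adj_norm: torch.Tensor,
                              init_emb: torch.Tensor,
                              num_hops: int = 3) -> torch.Tensor:
    """High-order user preference modeling: average the initial embeddings with
    num_hops rounds of propagation over the normalized (sparse) interaction graph,
    a LightGCN-style scheme assumed here for illustration."""
    layers = [init_emb]
    x = init_emb
    for _ in range(num_hops):
        x = torch.sparse.mm(adj_norm, x)  # one hop of neighborhood aggregation
        layers.append(x)
    return torch.stack(layers, dim=0).mean(dim=0)


class PreferenceToLatentPerturbation(nn.Module):
    """MLP mapping a user embedding to a perturbation shaped like the VAE latent
    of the target item image (latent shape and budget are assumed values)."""

    def __init__(self, emb_dim: int, latent_shape=(4, 64, 64), eps: float = 0.1):
        super().__init__()
        self.latent_shape = latent_shape
        self.eps = eps  # perturbation budget, a hypothetical hyper-parameter
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, 512),
            nn.ReLU(),
            nn.Linear(512, math.prod(latent_shape)),
            nn.Tanh(),  # bound the raw perturbation to [-1, 1]
        )

    def forward(self, user_emb: torch.Tensor) -> torch.Tensor:
        delta = self.mlp(user_emb).view(-1, *self.latent_shape)
        return self.eps * delta


def perturb_item_image(vae, image: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Inject the preference-aligned perturbation in VAE latent space and decode
    back to pixels; vae.encode/decode stand in for a frozen pre-trained VAE, and
    the surrounding diffusion model is omitted from this sketch."""
    with torch.no_grad():
        z_adv = vae.encode(image) + delta   # cross-modal injection in latent space
        return vae.decode(z_adv)            # visually plausible adversarial image
```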
Related papers
- MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems [31.53306157650065]
Multimodal retrieval-augmented generation (RAG) systems enhance large vision-language models by integrating cross-modal knowledge.
These knowledge databases may contain sensitive information that requires privacy protection.
MrM is the first black-box MIA framework targeted at multimodal RAG systems.
arXiv Detail & Related papers (2025-06-09T03:48:50Z) - TRAP: Targeted Redirecting of Agentic Preferences [3.6293956720749425]
We introduce TRAP, a generative adversarial framework that manipulates the agent's decision-making using diffusion-based semantic injections.
Our method combines negative prompt-based degradation with positive semantic optimization, guided by a Siamese semantic network and layout-aware spatial masking.
TRAP achieves a 100% attack success rate on leading models, including LLaVA-34B, Gemma3, and Mistral-3.1.
arXiv Detail & Related papers (2025-05-29T14:57:16Z) - Defending the Edge: Representative-Attention for Mitigating Backdoor Attacks in Federated Learning [7.808916974942399]
Heterogeneous edge devices produce diverse data that are not independent and identically distributed (non-IID).
We propose a novel representative-attention-based defense mechanism, named FeRA, to distinguish benign from malicious clients.
Our evaluation demonstrates FeRA's robustness across various FL scenarios, including challenging non-IID data distributions typical of edge devices.
arXiv Detail & Related papers (2025-05-15T13:44:32Z) - R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning [97.49610356913874]
We propose a robust test-time prompt tuning (R-TPT) method for vision-language models (VLMs).
R-TPT mitigates the impact of adversarial attacks during the inference stage.
We introduce a plug-and-play reliability-based weighted ensembling strategy to strengthen the defense.
arXiv Detail & Related papers (2025-04-15T13:49:31Z) - Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the segment anything model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Disentangled Contrastive Collaborative Filtering [36.400303346450514]
Graph contrastive learning (GCL) has exhibited powerful performance in addressing the supervision label shortage issue.
We propose a Disentangled Contrastive Collaborative Filtering framework (DCCF) to realize intent disentanglement with self-supervised augmentation.
Our DCCF is able to not only distill finer-grained latent factors from the entangled self-supervision signals but also alleviate the augmentation-induced noise.
arXiv Detail & Related papers (2023-05-04T11:53:38Z) - Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Contextual Fusion For Adversarial Robustness [0.0]
Deep neural networks are usually designed to process one particular information stream and are susceptible to various types of adversarial perturbations.
We developed a fusion model using a combination of background and foreground features extracted in parallel from Places-CNN and Imagenet-CNN.
For gradient based attacks, our results show that fusion allows for significant improvements in classification without decreasing performance on unperturbed data.
arXiv Detail & Related papers (2020-11-18T20:13:23Z)