Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play
- URL: http://arxiv.org/abs/2501.19143v1
- Date: Fri, 31 Jan 2025 13:57:34 GMT
- Title: Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play
- Authors: Ching-Chun Chang, Fan-Yun Chen, Shih-Hong Gu, Kai Gao, Hanrui Wang, Isao Echizen
- Abstract summary: We propose a disillusion paradigm based on the concept of an imitation game.
At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning.
- Score: 14.195175901422308
- License:
- Abstract: As the cornerstone of artificial intelligence, machine perception confronts a fundamental threat posed by adversarial illusions. These adversarial attacks manifest in two primary forms: deductive illusion, where specific stimuli are crafted based on the victim model's general decision logic, and inductive illusion, where the victim model's general decision logic is shaped by specific stimuli. The former exploits the model's decision boundaries to create a stimulus that, when applied, interferes with its decision-making process. The latter reinforces a conditioned reflex in the model, embedding a backdoor during its learning phase that, when triggered by a stimulus, causes aberrant behaviours. The multifaceted nature of adversarial illusions calls for a unified defence framework, addressing vulnerabilities across various forms of attack. In this study, we propose a disillusion paradigm based on the concept of an imitation game. At the heart of the imitation game lies a multimodal generative agent, steered by chain-of-thought reasoning, which observes, internalises and reconstructs the semantic essence of a sample, liberated from the classic pursuit of reversing the sample to its original state. As a proof of concept, we conduct experimental simulations using a multimodal generative dialogue agent and evaluate the methodology under a variety of attack scenarios.
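To make the abstract's pipeline concrete, the sketch below gives one plausible reading of the disillusion loop: the agent observes a sample, internalises it as a chain-of-thought description, reconstructs a fresh sample from that description, and only then classifies. The `describe`, `regenerate`, and `classify` components are hypothetical placeholders for a vision-language model, a text-to-image model, and the victim classifier; none of these names come from the paper.

```python
from typing import Any, Callable

Image = Any  # stands in for whatever image representation the components share


def disillusion(
    sample: Image,
    describe: Callable[[Image], str],
    regenerate: Callable[[str], Image],
    classify: Callable[[Image], int],
) -> int:
    """Classify a reconstruction of the sample rather than the sample itself.

    The round trip through language is what dispels the illusion: pixel-level
    perturbations (deductive illusions) and backdoor triggers (inductive
    illusions) are unlikely to survive in the semantic description, so the
    downstream model never sees them.
    """
    # 1. Observe: the multimodal agent produces a chain-of-thought description.
    reasoning = describe(sample)

    # 2. Internalise and reconstruct: a generative model re-renders the scene
    #    from the description, not from the (possibly poisoned) pixels.
    reconstruction = regenerate(reasoning)

    # 3. Decide: the victim classifier only ever sees the reconstruction.
    return classify(reconstruction)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    label = disillusion(
        sample="perturbed photo of a cat",
        describe=lambda img: f"a photo of {img.split(' of ')[-1]}",
        regenerate=lambda text: f"regenerated {text}",
        classify=lambda img: int("cat" in img),
    )
    print(label)  # -> 1
```

The point of the sketch is the interface, not the components: any agent that can verbalise a sample and any generator that can re-render that description could, in principle, fill the three roles.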
Related papers
- Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions [51.51850981481236]
We introduce POATE, a novel jailbreak technique that harnesses contrastive reasoning to provoke unethical responses.
POATE crafts semantically opposing intents and integrates them with adversarial templates, steering models toward harmful outputs with remarkable subtlety.
To counter this, we propose Intent-Aware CoT and Reverse Thinking CoT, which decompose queries to detect malicious intent and reason in reverse to evaluate and reject harmful responses.
arXiv Detail & Related papers (2025-01-03T15:40:03Z)
- Steganography in Game Actions [8.095373104009868]
This study seeks to extend the boundaries of what is considered a viable steganographic medium.
We explore a steganographic paradigm, where hidden information is communicated through the episodes of multiple agents interacting with an environment.
As a proof of concept, we exemplify action steganography through the game of labyrinth, a navigation task where subliminal communication is concealed within the act of steering toward a destination.
arXiv Detail & Related papers (2024-12-11T12:02:36Z)
- BadCM: Invisible Backdoor Attack Against Cross-Modal Learning [110.37205323355695]
We introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in cross-modal backdoor attacks.
BadCM is the first invisible backdoor method deliberately designed for diverse cross-modal attacks within one unified framework.
arXiv Detail & Related papers (2024-10-03T03:51:53Z)
- Rethinking harmless refusals when fine-tuning foundation models [0.8571111167616167]
We investigate the degree to which fine-tuning in Large Language Models (LLMs) effectively mitigates versus merely conceals undesirable behavior.
We identify a pervasive phenomenon we term reason-based deception, where models either stop producing reasoning traces or produce seemingly ethical reasoning traces that belie the unethical nature of their final outputs.
arXiv Detail & Related papers (2024-06-27T22:08:22Z)
- What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-modal Models [50.97705264224828]
We propose Counterfactual Inception, a novel method that implants counterfactual thinking into Large Multi-modal Models.
We aim for the models to engage with and generate responses grounded in a wider contextual understanding of the scene.
Comprehensive analyses across various LMMs, including both open-source and proprietary models, corroborate that counterfactual thinking significantly reduces hallucination.
arXiv Detail & Related papers (2024-03-20T11:27:20Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of adversarial example transferability.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
- Play with Emotion: Affect-Driven Reinforcement Learning [3.611888922173257]
This paper introduces a paradigm shift by viewing the task of affect modeling as a reinforcement learning process.
We test our hypotheses in a racing game by training Go-Blend agents to model human demonstrations of arousal and behavior.
arXiv Detail & Related papers (2022-08-26T12:28:24Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
- Failures of Contingent Thinking [2.055949720959582]
We show that a wide range of behavior observed in experimental settings manifests as failures to perceive implications.
We show that an agent's account of implication identifies a subjective state-space that underlies her behavior.
arXiv Detail & Related papers (2020-07-15T14:21:16Z)
- Agent-Based Simulation of Collective Cooperation: From Experiment to Model [0.0]
We present an experiment to observe what happens when humans pass through a dense static crowd.
We derive a model that incorporates agents' perception and cognitive processing of a situation that needs cooperation.
Agents' ability to successfully get through a dense crowd emerges as an effect of the psychological model.
arXiv Detail & Related papers (2020-05-26T13:29:08Z)