Related papers: Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks

Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks

URL: http://arxiv.org/abs/2402.10527v2
Date: Mon, 16 Sep 2024 17:44:17 GMT
Title: Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks
Authors: R. Patrick Xian, Alex J. Lee, Satvik Lolla, Vincent Wang, Qiming Cui, Russell Ro, Reza Abbasi-Asl,
Abstract summary: An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs. We developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
Score: 0.6282171844772422
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. Understanding model vulnerabilities in high-stakes and knowledge-intensive tasks is essential for quantifying the trustworthiness of model predictions and regulating their use. The recent discovery of named entities as adversarial examples (i.e. adversarial entities) in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs in high-stakes and specialized domains. We examined the use of type-consistent entity substitution as a template for collecting adversarial entities for billion-parameter LLMs with biomedical knowledge. To this end, we developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge with a low query budget and controllable coverage. Our method has favorable query efficiency and scaling over alternative approaches based on random sampling and blackbox gradient-guided search, which we demonstrated for adversarial distractor generation in biomedical question answering. Subsequent failure mode analysis uncovered two regimes of adversarial entities on the attack surface with distinct characteristics and we showed that entity substitution attacks can manipulate token-wise Shapley value explanations, which become deceptive in this setting. Our approach complements standard evaluations for high-capacity models and the results highlight the brittleness of domain knowledge in LLMs.

Related papers

Towards Robust LLMs: an Adversarial Robustness Measurement Framework [0.0]
Large Language Models (LLMs) remain vulnerable to adversarial perturbations, undermining their reliability in high-stakes applications. We adapt the Robustness Measurement and Assessment framework to quantify LLM resilience against adversarial inputs without requiring access to model parameters. Our work provides a systematic methodology to assess LLM robustness, advancing the development of more reliable language models for real-world deployment.
arXiv Detail & Related papers (2025-04-24T16:36:19Z)
A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation [93.28532038721816]
Malicious applications of visual manipulation have raised serious threats to the security and reputation of users in many fields. We propose a knowledge-guided adversarial defense (KGAD) to actively force malicious manipulation models to output semantically confusing samples.
arXiv Detail & Related papers (2025-04-11T10:18:13Z)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents. Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses. This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z)
LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation [30.334268991701727]
Agent-based simulation is crucial for modeling complex human behavior. Traditional approaches require extensive domain knowledge and large datasets. Large language models (LLMs) offer a promising alternative by leveraging broad world knowledge.
arXiv Detail & Related papers (2025-03-25T20:24:47Z)
Unmasking Digital Falsehoods: A Comparative Analysis of LLM-Based Misinformation Detection Strategies [0.0]
This paper conducts a comparison of approaches to detecting misinformation between text-based, multimodal, and agentic approaches. We evaluate the effectiveness of fine-tuned models, zero-shot learning, and systematic fact-checking mechanisms in detecting misinformation across different topic domains.
arXiv Detail & Related papers (2025-03-02T04:31:42Z)
Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey [48.52320309766703]
Knowledge-enhanced language models (KELMs) have emerged as promising tools to bridge the gap between large-scale language models and domain-specific knowledge. KELMs can achieve higher factual accuracy and hallucinations by leveraging knowledge graphs (KGs)
arXiv Detail & Related papers (2024-11-25T14:10:24Z)
HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment [1.8843687952462742]
This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities. Our contributions include the creation of a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels. We provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model.
arXiv Detail & Related papers (2024-11-11T10:02:49Z)
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures [5.062846614331549]
This study systematically analyzes the vulnerability of 36 large language models (LLMs) to various prompt injection attacks. Across 144 prompt injection tests, we observed a strong correlation between model parameters and vulnerability.
arXiv Detail & Related papers (2024-10-28T18:55:21Z)
Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM) To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks [10.909463767558023]
We propose an innovative approach for the real-time detection of jailbreak attacks by utilizing neural activation features. Our method holds promise for future systems integrating LLMs, offering robust real-time detection capabilities.
arXiv Detail & Related papers (2024-08-27T17:14:21Z)
Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks. LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model. We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Unmasking Dementia Detection by Masking Input Gradients: A JSM Approach to Model Interpretability and Precision [1.5501208213584152]
We introduce an interpretable, multimodal model for Alzheimer's disease (AD) classification over its multi-stage progression, incorporating Jacobian Saliency Map (JSM) as a modality-agnostic tool. Our evaluation including ablation study manifests the efficacy of using JSM for model debug and interpretation, while significantly enhancing model accuracy as well.
arXiv Detail & Related papers (2024-02-25T06:53:35Z)
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models [53.84677081899392]
KIEval is a Knowledge-grounded Interactive Evaluation framework for large language models. It incorporates an LLM-powered "interactor" role for the first time to accomplish a dynamic contamination-resilient evaluation. Extensive experiments on seven leading LLMs across five datasets validate KIEval's effectiveness and generalization.
arXiv Detail & Related papers (2024-02-23T01:30:39Z)
Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions. We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks. We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves. We focus on unsupervised deep learning techniques applied to multispectral imaging data. We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.