Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
- URL: http://arxiv.org/abs/2512.17146v1
- Date: Fri, 19 Dec 2025 00:51:11 GMT
- Title: Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
- Authors: Huixin Zhan,
- Abstract summary: We introduce the Secure Agentic Genomic Evaluator (SAGE), an agentic framework for auditing the adversarial vulnerabilities of GFMs.<n>Using SAGE, we find that even state-of-the-art GFMs like ESM2 are sensitive to targeted soft prompt attacks.
- Score: 4.781986758380065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated remarkable success in variant effect prediction. However, their security and robustness under adversarial manipulation remain largely unexplored. To address this gap, we introduce the Secure Agentic Genomic Evaluator (SAGE), an agentic framework for auditing the adversarial vulnerabilities of GFMs. SAGE functions through an interpretable and automated risk auditing loop. It injects soft prompt perturbations, monitors model behavior across training checkpoints, computes risk metrics such as AUROC and AUPR, and generates structured reports with large language model-based narrative explanations. This agentic process enables continuous evaluation of embedding-space robustness without modifying the underlying model. Using SAGE, we find that even state-of-the-art GFMs like ESM2 are sensitive to targeted soft prompt attacks, resulting in measurable performance degradation. These findings reveal critical and previously hidden vulnerabilities in genomic foundation models, showing the importance of agentic risk auditing in securing biomedical applications such as clinical variant interpretation.
Related papers
- NAAMSE: Framework for Evolutionary Security Evaluation of Agents [1.0131895986034316]
We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem.<n>Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring.<n>Experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods.
arXiv Detail & Related papers (2026-02-07T06:13:02Z) - From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.<n>We demonstrate how uncertainty is leveraged as an active control signal across three frontiers.<n>This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z) - Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs [1.036334370262262]
We introduce AgentSeer, an observability-based evaluation framework that decomposes agentic executions into granular action and component graphs.<n>We demonstrate fundamental differences between model-level and agentic-level vulnerability profiles.<n>Agentic-level assessment exposes agent-specific risks invisible to traditional evaluation.
arXiv Detail & Related papers (2025-09-05T04:36:17Z) - SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models [8.019763193322298]
We propose SafeGenes: a framework for Secure analysis of genomic foundation models.<n>We assess the adversarial vulnerabilities of GFMs using two approaches: the Fast Gradient Sign Method and a soft prompt attack.<n>Targeted soft prompt attacks led to substantial performance degradation, even in large models such as ESM1b and ESM1v.
arXiv Detail & Related papers (2025-06-01T03:54:03Z) - Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM)<n>To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications.<n>Our research identifies two critical latent factors affecting RAG's confidence in its predictions.<n>We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.<n>Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.<n>However, the deployment of these agents in physical environments presents significant safety challenges.<n>This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - Unmasking Dementia Detection by Masking Input Gradients: A JSM Approach
to Model Interpretability and Precision [1.5501208213584152]
We introduce an interpretable, multimodal model for Alzheimer's disease (AD) classification over its multi-stage progression, incorporating Jacobian Saliency Map (JSM) as a modality-agnostic tool.
Our evaluation including ablation study manifests the efficacy of using JSM for model debug and interpretation, while significantly enhancing model accuracy as well.
arXiv Detail & Related papers (2024-02-25T06:53:35Z) - Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications.<n>The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs.<n>We developed an embedding-space attack based on powerscaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
arXiv Detail & Related papers (2024-02-16T09:29:38Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Exploring Robustness of Unsupervised Domain Adaptation in Semantic
Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which pose a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, they fail to provide the critical supervision signals that could learn discriminative representation for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.