Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
- URL: http://arxiv.org/abs/2510.27629v3
- Date: Tue, 04 Nov 2025 03:57:55 GMT
- Title: Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models
- Authors: Boyi Wei, Zora Che, Nathaniel Li, Udari Madhushani Sehwag, Jasper Götting, Samira Nedungadi, Julian Michael, Summer Yue, Dan Hendrycks, Peter Henderson, Zifan Wang, Seth Donoughe, Mantas Mazeika
- Abstract summary: Open-weight bio-foundation models could enable bad actors to develop more deadly bioweapons. Current approaches focus on filtering biohazardous data during pre-training. BioRiskEval is a framework to evaluate the robustness of procedures intended to reduce the dual-use capabilities of bio-foundation models.
- Score: 24.414900360499548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-weight bio-foundation models present a dual-use dilemma. While holding great promise for accelerating scientific research and drug development, they could also enable bad actors to develop more deadly bioweapons. To mitigate the risk posed by these models, current approaches focus on filtering biohazardous data during pre-training. However, the effectiveness of such an approach remains unclear, particularly against determined actors who might fine-tune these models for malicious use. To address this gap, we propose BioRiskEval, a framework to evaluate the robustness of procedures that are intended to reduce the dual-use capabilities of bio-foundation models. BioRiskEval assesses models' virus understanding through three lenses, including sequence modeling, mutational effects prediction, and virulence prediction. Our results show that current filtering practices may not be particularly effective: Excluded knowledge can be rapidly recovered in some cases via fine-tuning, and exhibits broader generalizability in sequence modeling. Furthermore, dual-use signals may already reside in the pretrained representations, and can be elicited via simple linear probing. These findings highlight the challenges of data filtering as a standalone procedure, underscoring the need for further research into robust safety and security strategies for open-weight bio-foundation models.
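The abstract notes that dual-use signals can be elicited from pretrained representations "via simple linear probing." A minimal, self-contained sketch of that technique follows, using synthetic embeddings as stand-ins for a frozen bio-foundation model's hidden states; the embedding generator, labels, and dimensions are all hypothetical illustrations, not the paper's actual setup.

```python
import numpy as np

# Hypothetical sketch of linear probing: train a logistic-regression probe
# on frozen embeddings to recover a binary label. The embeddings below are
# synthetic; in the paper's setting they would be hidden states from a
# frozen bio-foundation model, and the label a dual-use-relevant property.

rng = np.random.default_rng(0)

def make_embeddings(n, dim=32):
    """Synthetic embeddings whose first coordinate weakly encodes the label."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim))
    X[:, 0] += 2.0 * y  # the linearly decodable signal the probe recovers
    return X, y

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Plain logistic regression fit by gradient descent (no external deps)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def probe_accuracy(w, b, X, y):
    """Held-out accuracy of the linear probe; above chance => signal present."""
    return float(np.mean(((X @ w + b) > 0) == y))

X_tr, y_tr = make_embeddings(1000)
X_te, y_te = make_embeddings(500)
w, b = train_linear_probe(X_tr, y_tr)
acc = probe_accuracy(w, b, X_te, y_te)
```

The point of the probe is diagnostic rather than generative: if a frozen model's representations let even a linear classifier separate the classes well above chance, the relevant information already resides in the pretrained weights, regardless of whether the training data were filtered.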
Related papers
- Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset [0.38186458149494623]
This paper discusses the pilot implementation of the Bacterial Biothreat Benchmark (B3) dataset. It is the third in a series of three papers describing an overall Biothreat Benchmark Generation (BBG) framework. Overall, the pilot demonstrated that the B3 dataset offers a viable, nuanced method for rapidly assessing the biosecurity risk posed by an LLM.
arXiv Detail & Related papers (2025-12-09T10:31:02Z) - RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing [2.918639959397167]
We introduce RApid PosTeriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework. RAPTOR-GEN is designed to accelerate intelligent digital twin development from sparse and heterogeneous experimental data. We develop a fast and robust RAPTOR-GEN algorithm with controllable error.
arXiv Detail & Related papers (2025-09-25T05:20:49Z) - Resilient Biosecurity in the Era of AI-Enabled Bioweapons [0.0]
Existing biosafety measures rely on sequence alignment and protein-protein interaction prediction to detect dangerous outputs. We evaluate the performance of three leading PPI prediction tools: AlphaFold 3, AF3Complex, and SpatialPPIv2. None of the tools successfully identify any of the four experimentally validated SARS-CoV-2 mutants with confirmed binding.
arXiv Detail & Related papers (2025-08-30T18:09:04Z) - Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques. By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z) - The Reality of AI and Biorisk [24.945718952309157]
It is necessary to have both a sound theoretical threat model for how AI models or systems could increase biorisk and a robust method for testing that threat model. This paper provides an analysis of existing research surrounding two AI and biorisk threat models.
arXiv Detail & Related papers (2024-12-02T20:14:46Z) - Perturb, Attend, Detect and Localize (PADL): Robust Proactive Image Defense [5.150608040339816]
We introduce PADL, a new solution able to generate image-specific perturbations using a symmetric scheme of encoding and decoding based on cross-attention.
Our method generalizes to a range of unseen models with diverse architectural designs, such as StarGANv2, BlendGAN, DiffAE, StableDiffusion and StableDiffusionXL.
arXiv Detail & Related papers (2024-09-26T15:16:32Z) - Unmasking unlearnable models: a classification challenge for biomedical images without visible cues [0.0]
We demystify the complexity of MGMT status prediction through a comprehensive exploration.
Our finding highlighted that current models are unlearnable and may require new architectures to explore applications in the real world.
arXiv Detail & Related papers (2024-07-29T08:12:42Z) - Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - EPL: Evidential Prototype Learning for Semi-supervised Medical Image Segmentation [0.0]
We propose Evidential Prototype Learning (EPL) to fuse voxel probability predictions from different sources and prototype fusion utilization of labeled and unlabeled data.
The uncertainty not only enables the model to self-correct predictions but also improves the guided learning process with pseudo-labels and is able to feed back into the construction of hidden features.
arXiv Detail & Related papers (2024-04-09T10:04:06Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Model X-ray:Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z) - The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness [13.120373493503772]
We prove a surprising result: even if the ground truth itself is robust to adversarial examples and the benignly overfitted model is benign with respect to the "standard" out-of-sample risk objective, the overfitted model can still be harmful under adversarial manipulation. Our finding provides theoretical insights into the puzzling phenomenon observed in practice, where the true target function (e.g., human) is robust against adversarial attack, while benignly overfitted neural networks lead to models that are not robust.
arXiv Detail & Related papers (2024-01-19T15:40:46Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Embracing assay heterogeneity with neural processes for markedly improved bioactivity predictions [0.276240219662896]
Predicting the bioactivity of a ligand is one of the hardest and most important challenges in computer-aided drug discovery.
Despite years of data collection and curation efforts, bioactivity data remains sparse and heterogeneous.
We present a hierarchical meta-learning framework that exploits the information synergy across disparate assays.
arXiv Detail & Related papers (2023-08-17T16:26:58Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - A General Framework for Survival Analysis and Multi-State Modelling [70.31153478610229]
We use neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
arXiv Detail & Related papers (2020-06-08T19:24:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.