Censoring chemical data to mitigate dual use risk
- URL: http://arxiv.org/abs/2304.10510v1
- Date: Thu, 20 Apr 2023 17:46:30 GMT
- Title: Censoring chemical data to mitigate dual use risk
- Authors: Quintina L. Campbell, Jonathan Herington, Andrew D. White
- Abstract summary: We propose a model-agnostic method of selectively noising datasets while preserving the utility of the data for training deep neural networks.
Our findings show that selectively noised datasets can induce controlled variance and bias in model predictions for sensitive labels.
This work is proposed as a foundation for future research on enabling more secure and collaborative data sharing practices.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dual use of machine learning applications, where models can be used for
both beneficial and malicious purposes, presents a significant challenge. This
has recently become a particular concern in chemistry, where chemical datasets
containing sensitive labels (e.g. toxicological information) could be used to
develop predictive models that identify novel toxins or chemical warfare
agents. To mitigate dual use risks, we propose a model-agnostic method of
selectively noising datasets while preserving the utility of the data for
training deep neural networks in a beneficial region. We evaluate the
effectiveness of the proposed method across least squares, a multilayer
perceptron, and a graph neural network. Our findings show selectively noised
datasets can induce model variance and bias in predictions for sensitive labels
with control, suggesting the safe sharing of datasets containing sensitive
information is feasible. We also find omitting sensitive data often increases
model variance sufficiently to mitigate dual use. This work is proposed as a
foundation for future research on enabling more secure and collaborative data
sharing practices and safer machine learning applications in chemistry.
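The core idea of selectively noising sensitive labels while leaving the beneficial region untouched can be sketched as follows. This is a minimal illustration, not the authors' exact procedure; the sensitivity cutoff and noise scale here are assumed purely for demonstration.

```python
import numpy as np

def selectively_noise_labels(y, sensitive_mask, noise_scale=1.0, seed=0):
    """Add Gaussian noise only to labels flagged as sensitive,
    leaving labels in the beneficial region unchanged."""
    rng = np.random.default_rng(seed)
    y_noised = y.astype(float).copy()
    y_noised[sensitive_mask] += rng.normal(0.0, noise_scale, int(sensitive_mask.sum()))
    return y_noised

# Toy example: labels above a hypothetical cutoff count as "sensitive"
# (e.g. high toxicity); only those receive noise before release.
y = np.array([0.2, 0.5, 3.1, 4.0, 0.1])
sensitive = y > 1.0  # assumed cutoff separating sensitive from beneficial labels
y_safe = selectively_noise_labels(y, sensitive, noise_scale=2.0)
```

A model trained on `y_safe` retains full-fidelity supervision in the beneficial region, while predictions in the sensitive region inherit the injected variance and bias, which is the mitigation mechanism the abstract describes.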
Related papers
- Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.
We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z) - Blend the Separated: Mixture of Synergistic Experts for Data-Scarcity Drug-Target Interaction Prediction [39.410724831865245]
Drug-target interaction (DTI) prediction is essential in various applications, including drug discovery and clinical practice.
There are two perspectives of input data widely used in DTI prediction: Intrinsic data represents how drugs or targets are constructed, and extrinsic data represents how drugs or targets are related to other biological entities.
We propose the first method to tackle DTI prediction under input data and/or label scarcity.
arXiv Detail & Related papers (2025-03-20T02:27:16Z) - Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [72.39098405805318]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models.
This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries.
In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z) - Ensuring Medical AI Safety: Explainable AI-Driven Detection and Mitigation of Spurious Model Behavior and Associated Data [14.991686165405959]
We introduce a semi-automated framework for the identification of spurious behavior from both data and model perspective.
This allows the retrieval of spurious data points and the detection of model circuits that encode the associated prediction rules.
We show the applicability of our framework using four medical datasets, featuring controlled and real-world spurious correlations.
arXiv Detail & Related papers (2025-01-23T16:39:09Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples within a mixed dataset is both valuable and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification [2.2940141855172036]
In molecular biology, there has been an explosion of data generated from multi-omics sequencing.
Traditional statistical methods face challenging tasks when dealing with such high dimensional data.
This study focuses on tackling these challenges with a neural network that incorporates an autoencoder to extract the latent space of the features.
arXiv Detail & Related papers (2024-05-16T01:45:55Z) - Have You Poisoned My Data? Defending Neural Networks against Data Poisoning [0.393259574660092]
We propose a novel approach to detect and filter poisoned datapoints in the transfer learning setting.
We show that effective poisons can be successfully differentiated from clean points in the characteristic vector space.
Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance.
arXiv Detail & Related papers (2024-03-20T11:50:16Z) - Leveraging Internal Representations of Model for Magnetic Image Classification [0.13654846342364302]
This paper introduces a potentially groundbreaking paradigm for machine learning model training, specifically designed for scenarios with only a single magnetic image and its corresponding label image available.
We harness the capabilities of Deep Learning to generate concise yet informative samples, aiming to overcome data scarcity.
arXiv Detail & Related papers (2024-03-11T15:15:50Z) - Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Model X-ray:Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs).
We propose Model X-ray, a novel backdoor detection approach based on the analysis of rendered two-dimensional (2D) decision boundaries.
Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - On the Interplay of Subset Selection and Informed Graph Neural Networks [3.091456764812509]
This work focuses on predicting the atomization energy of molecules in the QM9 dataset.
We show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques.
We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
arXiv Detail & Related papers (2023-06-15T09:09:27Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes [0.0]
We show that variational learning of the inducing points in a molecular descriptor space improves the prediction of energies and atomic forces on two molecular dynamics datasets.
We extend our study to a large molecular crystal system, showing that variational GP models perform well for predicting atomic forces by efficiently learning a sparse representation of the dataset.
arXiv Detail & Related papers (2022-07-16T10:41:41Z) - Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z) - MOOMIN: Deep Molecular Omics Network for Anti-Cancer Drug Combination Therapy [2.446672595462589]
We propose a multimodal graph neural network that can predict the synergistic effect of drug combinations for cancer treatment.
Our model captures the representation based on the context of drugs at multiple scales based on a drug-protein interaction network and metadata.
We demonstrate that the model makes high-quality predictions over a wide range of cancer cell line tissues.
arXiv Detail & Related papers (2021-10-28T13:10:25Z) - Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function [1.5559232742666467]
We show a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function.
We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2.
arXiv Detail & Related papers (2021-08-27T19:37:06Z) - The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - Attack-agnostic Adversarial Detection on Medical Data Using Explainable Machine Learning [0.0]
We propose a model agnostic explainability-based method for the accurate detection of adversarial samples on two datasets.
On the MIMIC-III and Henan-Renmin EHR datasets, we report a detection accuracy of 77% against the Longitudinal Adversarial Attack.
On the MIMIC-CXR dataset, we achieve an accuracy of 88%; significantly improving on the state of the art of adversarial detection in both datasets by over 10% in all settings.
arXiv Detail & Related papers (2021-05-05T10:01:53Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.