ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning
- URL: http://arxiv.org/abs/2302.04917v1
- Date: Thu, 9 Feb 2023 20:19:57 GMT
- Title: ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning
- Authors: Alexander M. Moore, Randy C. Paffenroth, Ken T. Ngo, Joshua R. Uzarski
- Abstract summary: This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
- Score: 60.02503434201552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate chemical sensors are vital in medical, military, and home safety
applications. Training machine learning models to be accurate on real world
chemical sensor data requires performing many diverse, costly experiments in
controlled laboratory settings to create a data set. In practice even
expensive, large data sets may be insufficient for generalization of a trained
model to a real-world testing distribution. Rather than perform greater numbers
of experiments requiring exhaustive mixtures of chemical analytes, this
research proposes learning approximations of complex exposures from training
sets of simple ones by using single-analyte exposure signals as building blocks
of a multiple-analyte space. We demonstrate this approach to synthetic sensor
responses surprisingly improves the detection of out-of-distribution obscured
chemical analytes. Further, we pair these synthetic signals to targets in an
information-dense representation space utilizing a large corpus of chemistry
knowledge. Through utilization of a semantically meaningful analyte
representation spaces along with synthetic targets we achieve rapid analyte
classification in the presence of obscurants without corresponding
obscured-analyte training data. Transfer learning for supervised learning with
molecular representations makes assumptions about the input data. Instead, we
borrow from the natural language and natural image processing literature for a
novel approach to chemical sensor signal classification using molecular
semantics for arbitrary chemical sensor hardware designs.
Related papers
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z) - A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter, that controls the strength of the correlation between elements of the chemical space.
Using molecular fingerprints, a numerical representation of a compound's location within the chemical space, we show that accounting for correlation amongst chemical compounds improves predictive performance.
arXiv Detail & Related papers (2024-05-16T11:18:32Z) - Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv Detail & Related papers (2024-04-05T17:15:48Z) - An Autonomous Large Language Model Agent for Chemical Literature Data
Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z) - Federated Learning of Molecular Properties in a Heterogeneous Setting [79.00211946597845]
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while preserving the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
arXiv Detail & Related papers (2021-09-15T12:49:13Z) - Synthetic Image Rendering Solves Annotation Problem in Deep Learning
Nanoparticle Segmentation [5.927116192179681]
We show that using a rendering software allows to generate realistic, synthetic training data to train a state-of-the art deep neural network.
We derive a segmentation accuracy that is comparable to man-made annotations for toxicologically relevant metal-oxide nanoparticles ensembles.
arXiv Detail & Related papers (2020-11-20T17:05:36Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z) - Optical oxygen sensing with artificial intelligence [0.0]
This work explores an entirely new artificial intelligence approach and demonstrates the feasibility of oxygen sensing through machine learning.
The specifically developed neural network learns very efficiently to relate the input quantities to the oxygen concentration.
The results show a mean deviation of the predicted from the measured concentration of 0.5 percent air, comparable to many commercial and low-cost sensors.
arXiv Detail & Related papers (2020-07-27T20:59:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.