Evaluating machine learning models for predicting pesticides toxicity to honey bees
- URL: http://arxiv.org/abs/2503.24305v3
- Date: Thu, 10 Apr 2025 09:38:53 GMT
- Title: Evaluating machine learning models for predicting pesticides toxicity to honey bees
- Authors: Jakub Adamczyk, Jakub Poziemski, Pawel Siedlecki,
- Abstract summary: ApisTox is the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee.<n>We evaluate ApisTox using a diverse suite of machine learning approaches, including molecular fingerprints, graph kernels, and graph neural networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Small molecules play a critical role in the biomedical, environmental, and agrochemical domains, each with distinct physicochemical requirements and success criteria. Although biomedical research benefits from extensive datasets and established benchmarks, agrochemical data remain scarce, particularly with respect to species-specific toxicity. This work focuses on ApisTox, the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee (Apis mellifera), an ecologically vital pollinator. We evaluate ApisTox using a diverse suite of machine learning approaches, including molecular fingerprints, graph kernels, and graph neural networks, as well as pretrained models. Comparative analysis with medicinal datasets from the MoleculeNet benchmark reveals that ApisTox represents a distinct chemical space. Performance degradation on non-medicinal datasets, such as ApisTox, demonstrates their limited generalizability of current state-of-the-art algorithms trained solely on biomedical data. Our study highlights the need for more diverse datasets and for targeted model development geared toward the agrochemical domain.
Related papers
- Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets.
Key theoretical contribution is the structural sparsity of causal connections between modalities.
Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth.
Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format.
This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful.
We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z) - MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z) - A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter, that controls the strength of the correlation between elements of the chemical space.
We present a genetic algorithm for the facilitation of chemical discovery and identification of important features to the compound's efficacy.
arXiv Detail & Related papers (2024-05-16T11:18:32Z) - ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees [0.0]
ApisTox is a comprehensive dataset focusing on the toxicity of pesticides to honey bees (Apis mellifera)<n>This dataset combines and leverages data from existing sources such as ECOTOX and PPDB.<n>ApisTox offers a unique resource for benchmarking molecular property prediction methods on agrochemical compounds.
arXiv Detail & Related papers (2024-04-24T20:35:17Z) - ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining [56.15126714863963]
ChemMiner is an end-to-end framework for extracting chemical data from literature.<n>ChemMiner incorporates three specialized agents: a text analysis agent for coreference mapping, a multimodal agent for non-textual information extraction, and a synthesis analysis agent for data generation.<n> Experimental results demonstrate reaction identification rates comparable to human chemists while significantly reducing processing time, with high accuracy, recall, and F1 scores.
arXiv Detail & Related papers (2024-02-20T13:21:46Z) - MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z) - Human Limits in Machine Learning: Prediction of Plant Phenotypes Using
Soil Microbiome Data [0.2812395851874055]
We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes.
We show that prediction is improved when incorporating environmental features like soil physicochemical properties and microbial population density into the models.
arXiv Detail & Related papers (2023-06-19T20:52:37Z) - ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z) - A Distant Supervision Corpus for Extracting Biomedical Relationships
Between Chemicals, Diseases and Genes [35.372588846754645]
ChemDisGene is a new dataset for training and evaluating multi-class multi-label document-level biomedical relation extraction models.
Our dataset contains 80k biomedical research abstracts labeled with mentions of chemicals, diseases, and genes.
arXiv Detail & Related papers (2022-04-13T18:02:05Z) - Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z) - Graph Neural Networks for the Prediction of Substrate-Specific Organic
Reaction Conditions [79.45090959869124]
We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions.
We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions.
arXiv Detail & Related papers (2020-07-08T17:21:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.