Predicting Chemical Hazard across Taxa through Machine Learning
- URL: http://arxiv.org/abs/2110.03688v1
- Date: Thu, 7 Oct 2021 15:33:58 GMT
- Title: Predicting Chemical Hazard across Taxa through Machine Learning
- Authors: Jimeng Wu, Simone D'Ambrosi, Lorenz Ammann, Julita Stadnicka-Michalak,
Kristin Schirmer, Marco Baity-Jesi
- Abstract summary: We analyze the relevance of taxonomy and experimental setup, and show that taking them into account can lead to considerable improvements in the classification performance.
We use our approach with standard machine learning models (K-nearest neighbors, random forests and deep neural networks), as well as the recently proposed Read-Across Structure Activity Relationship (RASAR) models.
- Score: 0.3262230127283452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We apply machine learning methods to predict chemical hazards focusing on
fish acute toxicity across taxa. We analyze the relevance of taxonomy and
experimental setup, and show that taking them into account can lead to
considerable improvements in the classification performance. We quantify the
gain obtained by introducing the taxonomic and experimental information,
compared to classifying based on chemical information alone. We use our
approach with standard machine learning models (K-nearest neighbors, random
forests and deep neural networks), as well as the recently proposed Read-Across
Structure Activity Relationship (RASAR) models, which were very successful in
predicting chemical hazards to mammals based on chemical similarity. We are
able to obtain accuracies of over 0.93 on datasets where, due to noise in the
data, the maximum achievable accuracy is expected to be below 0.95, which
results in an effective accuracy of 0.98. The best performances are obtained by
random forests and RASAR models. We analyze metrics to compare our results with
animal test reproducibility, and although most of our models 'outperform animal
test reproducibility' as measured through recently proposed metrics, we show
that the comparison between machine learning performance and animal test
reproducibility should be addressed with particular care. While we focus on
fish mortality, our approach, provided that the right data is available, is
valid for any combination of chemicals, effects and taxa.
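The effective accuracy of 0.98 quoted in the abstract is consistent with dividing the achieved accuracy by the noise-limited ceiling (0.93 / 0.95 ≈ 0.98). The short sketch below only illustrates that arithmetic. The ratio definition and the function name `effective_accuracy` are assumptions made here for illustration; the abstract does not spell out the formula.

```python
def effective_accuracy(achieved: float, ceiling: float) -> float:
    """Return achieved accuracy as a fraction of the maximum achievable accuracy.

    Assumption (not stated in the abstract): effective accuracy is the
    plain ratio achieved / ceiling, where `ceiling` is the best accuracy
    that label noise in the data allows.
    """
    if not 0.0 < ceiling <= 1.0:
        raise ValueError("ceiling must lie in (0, 1]")
    if not 0.0 <= achieved <= 1.0:
        raise ValueError("achieved must lie in [0, 1]")
    return achieved / ceiling


# Figures taken from the abstract: accuracy 0.93, ceiling 0.95.
print(round(effective_accuracy(0.93, 0.95), 2))  # 0.98
```

Under this reading, a model scoring 0.93 recovers about 98% of what is attainable on data whose noise caps accuracy at 0.95.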
Related papers
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6 480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Harmful algal bloom forecasting. A comparison between stream and batch learning [0.7067443325368975]
Harmful Algal Blooms (HABs) pose risks to public health and the shellfish industry.
This study develops a machine learning workflow for predicting the number of cells of a toxic dinoflagellate.
The model DoME emerged as the most effective and interpretable predictor, outperforming the other algorithms.
arXiv Detail & Related papers (2024-02-20T15:01:11Z)
- Hybrid Machine Learning techniques in the management of harmful algal blooms impact [0.7864304771129751]
Mollusc farming can be affected by harmful algal blooms (HABs).
HABs are episodes of high concentrations of algae that are potentially toxic for human consumption.
To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected.
arXiv Detail & Related papers (2024-02-14T15:59:22Z)
- Producing Plankton Classifiers that are Robust to Dataset Shift [1.716364772047407]
We integrate the ZooLake dataset with manually annotated images from 10 independent days of deployment to benchmark Out-Of-Dataset (OOD) performance.
We propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features in OOD images that adversely impact classification.
We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation-based test-time augmentation, constitute the most robust model, which we call the BEsT model.
arXiv Detail & Related papers (2024-01-25T15:47:18Z)
- Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data.
The results of our analysis demonstrate the performance of each model in terms of accuracy.
The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails [58.47364143304643]
In this paper, we focus on the reaction yield prediction problem.
We first put forth MetaRF, an attention-based differentiable random forest model specially designed for the few-shot yield prediction.
To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method.
arXiv Detail & Related papers (2022-08-22T06:40:13Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training [0.8376091455761261]
"Calibration" techniques have been applied to deep learning models to estimate data uncertainty and improve reliability.
In this study, to extract chemical-protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques.
Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches.
arXiv Detail & Related papers (2020-11-04T10:14:31Z)
- Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.