Self-Supervised Learning Strategies for a Platform to Test the Toxicity of New Chemicals and Materials
- URL: http://arxiv.org/abs/2510.07853v1
- Date: Thu, 09 Oct 2025 06:51:12 GMT
- Title: Self-Supervised Learning Strategies for a Platform to Test the Toxicity of New Chemicals and Materials
- Authors: Thomas Lautenschlager, Nils Friederich, Angelo Jovin Yamachui Sitcheu, Katja Nau, Gaëlle Hayot, Thomas Dickmeis, Ralf Mikut
- Abstract summary: We demonstrate how representations learned via self-supervised learning can effectively identify toxicant-induced changes. Our analysis shows that the learned representations using self-supervised learning are suitable for effectively distinguishing between the modes-of-action of different compounds.
- Score: 1.2197883665266451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-throughput toxicity testing offers a fast and cost-effective way to test large numbers of compounds. A key component for such systems is the automated evaluation via machine learning models. In this paper, we address critical challenges in this domain and demonstrate how representations learned via self-supervised learning can effectively identify toxicant-induced changes. We provide a proof-of-concept that utilizes the publicly available EmbryoNet dataset, which contains ten zebrafish embryo phenotypes elicited by various chemical compounds targeting different processes in early embryonic development. Our analysis shows that the learned representations using self-supervised learning are suitable for effectively distinguishing between the modes-of-action of different compounds. Finally, we discuss the integration of machine learning models in a physical toxicity testing device in the context of the TOXBOX project.
Related papers
- Not-in-Perspective: Towards Shielding Google's Perspective API Against Adversarial Negation Attacks [1.675857332621569]
Cyberbullying has escalated the need for effective ways to monitor and moderate online interactions. Existing automated toxicity detection systems are based on machine or deep learning algorithms. We present a set of formal reasoning-based methodologies that wrap around existing machine learning toxicity detection systems.
Date: 2026-02-10T02:27:28Z
- Combining Deep Learning and Explainable AI for Toxicity Prediction of Chemical Compounds [0.764671395172401]
This research introduces a novel image-based pipeline based on DenseNet121, which processes 2D graphical representations of chemical structures. We employ Grad-CAM visualizations, an explainable AI technique, to interpret the model's predictions and highlight molecular regions contributing to toxicity classification.
Date: 2025-10-26T08:05:11Z
- Efficient Toxicity Detection in Gaming Chats: A Comparative Study of Embeddings, Fine-Tuned Transformers and LLMs [0.18907108368038214]
This paper presents a comparative analysis of Natural Language Processing (NLP) methods for automated toxicity detection in online gaming chats. Traditional machine learning models with embeddings, large language models (LLMs) with zero-shot and few-shot prompting, fine-tuned transformer models, and retrieval-augmented generation approaches are evaluated. The findings provide empirical evidence for deploying cost-effective, efficient content moderation systems in dynamic online gaming environments.
Date: 2025-10-20T08:03:28Z
- PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task.
Date: 2025-05-08T18:15:38Z
- Causal integration of chemical structures improves representations of microscopy images for morphological profiling [25.027684911103897]
We introduce a representation learning framework, MICON, that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes. We demonstrate that incorporating chemical compound information into the learning process provides consistent improvements in our evaluation setting. Our findings point to a new direction for representation learning in morphological profiling, suggesting that methods should explicitly account for the multimodal nature of microscopy screening data.
Date: 2025-04-13T12:27:21Z
- Advancing Out-of-Distribution Detection via Local Neuroplasticity [60.53625435889467]
This paper presents a novel OOD detection method that leverages the unique local neuroplasticity property of Kolmogorov-Arnold Networks (KANs). Our method compares the activation patterns of a trained KAN against its untrained counterpart to detect OOD samples. We validate our approach on benchmarks from image and medical domains, demonstrating superior performance and robustness compared to state-of-the-art techniques.
Date: 2025-02-20T11:13:41Z
- Intelligent Chemical Purification Technique Based on Machine Learning [5.023197681500998]
We present an innovative integration of artificial intelligence with column chromatography, aiming to resolve inefficiencies and standardize data collection in the chemical separation and purification domain.
By developing an automated platform for precise data acquisition and employing advanced machine learning algorithms, we constructed predictive models to forecast key separation parameters.
A novel metric, separation probability ($S_p$), quantifies the likelihood of effective compound separation and was validated experimentally.
Date: 2024-04-14T01:44:58Z
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution obscured chemical analytes.
Date: 2023-02-09T20:19:57Z
- On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach [42.81632484264218]
In machine learning models, perturbing a fraction of the training set (poisoning) can seriously undermine model accuracy.
This paper aims to implement a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks.
Date: 2022-09-28T11:41:38Z
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
Date: 2022-07-20T07:32:02Z
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
Date: 2021-11-09T13:30:34Z
- Synthetic Image Rendering Solves Annotation Problem in Deep Learning Nanoparticle Segmentation [5.927116192179681]
We show that rendering software can generate realistic synthetic training data to train a state-of-the-art deep neural network.
We achieve a segmentation accuracy comparable to human annotations for toxicologically relevant metal-oxide nanoparticle ensembles.
Date: 2020-11-20T17:05:36Z
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
Date: 2020-10-20T08:36:51Z