SparseChem: Fast and accurate machine learning model for small molecules
- URL: http://arxiv.org/abs/2203.04676v1
- Date: Wed, 9 Mar 2022 12:40:35 GMT
- Title: SparseChem: Fast and accurate machine learning model for small molecules
- Authors: Adam Arany, Jaak Simm, Martijn Oldenhof and Yves Moreau
- Abstract summary: SparseChem provides fast and accurate machine learning models for biochemical applications.
It is possible to train classification, regression, and censored regression models, or a combination of them, from the command line.
Source code and documentation are freely available under the MIT License on GitHub.
- Score: 6.88204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SparseChem provides fast and accurate machine learning models for biochemical
applications. In particular, the package supports very high-dimensional sparse
inputs, e.g., millions of features and millions of compounds. It is possible to
train classification, regression, and censored regression models, or a combination
of them, from the command line. Additionally, the library can be accessed directly
from Python. Source code and documentation are freely available under the MIT
License on GitHub.
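As a rough illustration of the censored regression setting mentioned in the abstract, the following Python sketch computes a one-sided squared-error loss over sparse binary inputs. It does not use the SparseChem API: the function names, the list-of-active-indices representation, the censoring convention (0, +1, -1), and the loss itself are assumptions chosen for illustration, and the package may use a different formulation.

```python
def predict(active_features, weights):
    """Linear prediction for sparse binary rows.

    Each row is a list of the indices of its active (non-zero) features,
    so only those weights are touched. Hypothetical helper, not SparseChem code.
    """
    return [sum(weights[j] for j in row) for row in active_features]


def censored_squared_loss(y_pred, y_obs, censor):
    """Mean one-sided squared error (an assumed, simplified censored loss).

    censor ==  0: exact measurement, ordinary squared error
    censor == +1: true value is at least y_obs, penalize only predictions below it
    censor == -1: true value is at most y_obs, penalize only predictions above it
    """
    total = 0.0
    for pred, obs, c in zip(y_pred, y_obs, censor):
        diff = pred - obs
        if c > 0:
            diff = min(diff, 0.0)  # keep only the shortfall below the lower bound
        elif c < 0:
            diff = max(diff, 0.0)  # keep only the excess above the upper bound
        total += diff * diff
    return total / len(y_obs)


# Three compounds described by six binary features (illustrative values).
active_features = [[1, 4], [2], [0, 5]]
weights = [0.5, 1.0, 2.0, 0.0, 1.0, 0.5]

y_pred = predict(active_features, weights)
y_obs = [1.0, 3.0, 2.0]
censor = [0, 1, -1]

loss = censored_squared_loss(y_pred, y_obs, censor)
print(loss)
```

In this toy data the second compound is predicted below its lower bound and is penalized, while the third is predicted below its upper bound and contributes nothing to the loss.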
Related papers
- BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery [66.97700597098215]
We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models.
On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days.
The BioNeMo Framework is open-source and free for everyone to use.
arXiv Detail & Related papers (2024-11-15T19:46:16Z)
- Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python [0.0]
skfp is a Python package for computation of molecular fingerprints for applications in chemoinformatics.
skfp offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines.
It is also flexible, highly efficient, and fully open source.
arXiv Detail & Related papers (2024-07-18T08:45:14Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- A Python library for efficient computation of molecular fingerprints [0.0]
We create a Python library that computes molecular fingerprints efficiently and delivers an interface that is comprehensive.
The library enables the user to perform computation on large datasets using parallelism.
We show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions.
arXiv Detail & Related papers (2024-03-27T19:02:09Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Synthetic data enable experiments in atomistic machine learning [0.0]
We demonstrate the use of a large dataset labelled with per-atom energies from an existing ML potential model.
The cheapness of this process, compared to the quantum-mechanical ground truth, allows us to generate millions of datapoints.
We show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets.
arXiv Detail & Related papers (2022-11-29T18:17:24Z)
- MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML).
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z)
- Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$ [118.04625413322827]
$\texttt{t5x}$ and $\texttt{seqio}$ are open source software libraries for building and training language models.
These libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
arXiv Detail & Related papers (2022-03-31T17:12:13Z)
- Deeptime: a Python library for machine learning dynamical models from time series data [3.346668383314945]
Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data.
In this paper we introduce the main features and structure of the deeptime software.
arXiv Detail & Related papers (2021-10-28T10:53:03Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
- ML4Chem: A Machine Learning Package for Chemistry and Materials Science [0.0]
ML4Chem is an open-source machine learning library for chemistry and materials science.
It provides an extendable platform to develop and deploy machine learning models and pipelines.
Here we introduce its atomistic module for the implementation, deployment, and inference.
arXiv Detail & Related papers (2020-03-02T00:28:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.