SparseChem: Fast and accurate machine learning model for small molecules
- URL: http://arxiv.org/abs/2203.04676v1
- Date: Wed, 9 Mar 2022 12:40:35 GMT
- Title: SparseChem: Fast and accurate machine learning model for small molecules
- Authors: Adam Arany, Jaak Simm, Martijn Oldenhof and Yves Moreau
- Abstract summary: SparseChem provides fast and accurate machine learning models for biochemical applications.
Classification, regression, and censored regression models, or a combination of them, can be trained from the command line.
Source code and documentation are freely available under the MIT License on GitHub.
- Score: 6.88204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: SparseChem provides fast and accurate machine learning models for biochemical
applications. In particular, the package supports very high-dimensional sparse
inputs, e.g., millions of features and millions of compounds. Classification,
regression, and censored regression models, or a combination of them, can be
trained from the command line. Additionally, the library can be accessed directly
from Python. Source code and documentation are freely available under the MIT
License on GitHub.
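To make the two ideas in the abstract concrete, here is a minimal Python sketch of (a) a prediction over a sparse feature vector and (b) a one-sided squared error for censored regression targets. It is not SparseChem's API or implementation: the names `sparse_dot` and `censored_squared_error`, the dictionary-based sparse row, and the censoring convention (-1, 0, +1) are all assumptions chosen for illustration.

```python
from typing import Dict

# Hypothetical representation: feature index -> nonzero value.
SparseRow = Dict[int, float]


def sparse_dot(row: SparseRow, weights: Dict[int, float]) -> float:
    """Linear prediction that touches only the nonzero features of a compound."""
    return sum(value * weights.get(index, 0.0) for index, value in row.items())


def censored_squared_error(pred: float, target: float, censor: int) -> float:
    """One-sided squared error for a possibly censored measurement.

    Assumed convention (illustrative only):
      censor ==  0: exact measurement, ordinary squared error.
      censor == -1: true value is at most `target`; penalise only pred > target.
      censor == +1: true value is at least `target`; penalise only pred < target.
    """
    diff = pred - target
    if censor == -1:
        diff = max(diff, 0.0)
    elif censor == 1:
        diff = min(diff, 0.0)
    elif censor != 0:
        raise ValueError("censor must be -1, 0, or +1")
    return diff * diff


# A compound with two active features out of a very large feature space.
compound: SparseRow = {3: 1.0, 1_000_000: 1.0}
weights = {3: 2.0, 7: 5.0, 1_000_000: 0.5}

prediction = sparse_dot(compound, weights)
loss_lower_bound = censored_squared_error(prediction, 3.0, censor=1)
loss_upper_bound = censored_squared_error(prediction, 3.0, censor=-1)
```

With this convention, a prediction that already lies on the permitted side of a censoring threshold contributes no loss, so measurements reported only as bounds can still inform training.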
Related papers
- PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse provides a modular and extendable framework that is easy to use for a broad user base, including non-machine-learning bioresearchers.
We released PyPulse under the MIT License on GitHub and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
- Cuvis.Ai: An Open-Source, Low-Code Software Ecosystem for Hyperspectral Processing and Classification [0.4038539043067986]
cuvis.ai is an open-source and low-code software ecosystem for data acquisition, preprocessing, and model training.
The package is written in Python and provides wrappers around common machine learning libraries.
arXiv Detail & Related papers (2024-11-18T06:33:40Z)
- BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery [66.97700597098215]
We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models.
On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days.
The BioNeMo Framework is open-source and free for everyone to use.
arXiv Detail & Related papers (2024-11-15T19:46:16Z)
- RGFN: Synthesizable Molecular Generation Using GFlowNets [51.33672611338754]
We propose Reaction-GFlowNet, an extension of the GFlowNet framework that operates directly in the space of chemical reactions.
RGFN allows out-of-the-box synthesizability while maintaining comparable quality of generated candidates.
We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.
arXiv Detail & Related papers (2024-06-01T13:11:11Z)
- A Python library for efficient computation of molecular fingerprints [0.0]
We create a Python library that computes molecular fingerprints efficiently and delivers an interface that is comprehensive.
The library enables the user to perform computation on large datasets using parallelism.
We show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions.
arXiv Detail & Related papers (2024-03-27T19:02:09Z)
- Synthetic data enable experiments in atomistic machine learning [0.0]
We demonstrate the use of a large dataset labelled with per-atom energies from an existing ML potential model.
The cheapness of this process, compared to the quantum-mechanical ground truth, allows us to generate millions of datapoints.
We show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets.
arXiv Detail & Related papers (2022-11-29T18:17:24Z)
- MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML).
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z)
- Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$ [118.04625413322827]
$\texttt{t5x}$ and $\texttt{seqio}$ are open source software libraries for building and training language models.
These libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
arXiv Detail & Related papers (2022-03-31T17:12:13Z)
- Deeptime: a Python library for machine learning dynamical models from time series data [3.346668383314945]
Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data.
In this paper we introduce the main features and structure of the deeptime software.
arXiv Detail & Related papers (2021-10-28T10:53:03Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
- ML4Chem: A Machine Learning Package for Chemistry and Materials Science [0.0]
ML4Chem is an open-source machine learning library for chemistry and materials science.
It provides an extendable platform to develop and deploy machine learning models and pipelines.
Here we introduce its atomistic module for the implementation, deployment, and inference.
arXiv Detail & Related papers (2020-03-02T00:28:19Z)