Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- URL: http://arxiv.org/abs/2407.13291v3
- Date: Thu, 24 Oct 2024 17:08:56 GMT
- Title: Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- Authors: Jakub Adamczyk, Piotr Ludynia
- Abstract summary: skfp is a Python package for computation of molecular fingerprints for applications in chemoinformatics.
skfp offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines.
It is also flexible, highly efficient, and fully open source.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
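The interface pattern the abstract describes (a scikit-learn-style transformer that maps molecules to fixed-length vectors, optionally in parallel) can be illustrated with a toy sketch. This is not the library's code or API: the class `ToyHashedFingerprint`, its parameters `fp_size`, `ngram`, and `n_jobs`, and the character n-gram hashing scheme are all assumptions made for illustration. Real fingerprints are computed from molecular substructures, not from raw SMILES text.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class ToyHashedFingerprint:
    """Toy stand-in for a fingerprint transformer (hypothetical, not skfp)."""

    def __init__(self, fp_size=64, ngram=3, n_jobs=1):
        self.fp_size = fp_size  # length of the output bit vector
        self.ngram = ngram      # length of the text fragments that get hashed
        self.n_jobs = n_jobs    # number of worker threads used by transform

    def fit(self, X, y=None):
        # This toy transformer is stateless, so fit has nothing to learn.
        return self

    def _one(self, smiles):
        # Hash every character n-gram of the string into one bit position.
        bits = [0] * self.fp_size
        for i in range(max(len(smiles) - self.ngram + 1, 1)):
            fragment = smiles[i:i + self.ngram]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % self.fp_size] = 1
        return bits

    def transform(self, X):
        # Serial path for n_jobs == 1, thread pool otherwise.
        if self.n_jobs == 1:
            return [self._one(s) for s in X]
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            return list(pool.map(self._one, X))

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
serial = ToyHashedFingerprint(n_jobs=1).fit_transform(smiles)
parallel = ToyHashedFingerprint(n_jobs=2).fit_transform(smiles)
```

Because each molecule is processed independently, the serial and parallel paths return the same vectors in the same order, and the work splits naturally across workers for large datasets.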
Related papers
- Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping [0.0]
The Deep Fast Machine Learning Utils (DFMLU) library provides tools designed to automate and enhance aspects of machine learning processes.
DFMLU offers functionalities that support model development and data handling.
This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
arXiv Detail & Related papers (2024-09-14T21:39:17Z)
- A Python library for efficient computation of molecular fingerprints [0.0]
We create a Python library that computes molecular fingerprints efficiently and delivers an interface that is comprehensive.
The library enables the user to perform computation on large datasets using parallelism.
We show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions.
arXiv Detail & Related papers (2024-03-27T19:02:09Z)
- MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML).
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z)
- A Library for Representing Python Programs as Graphs for Machine Learning [39.483608364770824]
We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs.
We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
arXiv Detail & Related papers (2022-08-15T22:36:17Z)
- SparseChem: Fast and accurate machine learning model for small molecules [6.88204255655161]
SparseChem provides fast and accurate machine learning models for biochemical applications.
Classification, regression, and censored regression models, or a combination of them, can be trained from the command line.
Source code and documentation are freely available under the MIT License on GitHub.
arXiv Detail & Related papers (2022-03-09T12:40:35Z)
- Python for Smarter Cities: Comparison of Python libraries for static and interactive visualisations of large vector data [0.0]
Python, with its concise and natural syntax, presents a low barrier to entry for municipal staff without computer science backgrounds.
This study assesses prominent, actively-developed visualisation libraries in the Python ecosystem with respect to producing visualisations of large vector datasets.
All short-listed libraries were able to generate the sample map products for both a small and larger dataset.
arXiv Detail & Related papers (2022-02-26T10:23:29Z)
- pymdp: A Python library for active inference in discrete state spaces [52.85819390191516]
pymdp is an open-source package for simulating active inference in Python.
We provide the first open-source package for simulating active inference with POMDPs.
arXiv Detail & Related papers (2022-01-11T12:18:44Z)
- Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning [83.02597612195966]
solo-learn is a library of self-supervised methods for visual representation learning.
Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs.
arXiv Detail & Related papers (2021-08-03T22:19:55Z)
- Extending Python for Quantum-Classical Computing via Quantum Just-in-Time Compilation [78.8942067357231]
Python is a popular programming language known for its flexibility, usability, readability, and focus on developer productivity.
We present a language extension to Python that enables heterogeneous quantum-classical computing via a robust C++ infrastructure for quantum just-in-time compilation.
arXiv Detail & Related papers (2021-05-10T21:11:21Z)
- PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z)
- Biomedical and Clinical English Model Packages in the Stanza Python NLP Library [47.47381610312517]
We introduce biomedical and clinical English model packages for the Stanza Python NLP library.
These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text.
We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results.
arXiv Detail & Related papers (2020-07-29T07:27:41Z)