Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- URL: http://arxiv.org/abs/2407.13291v1
- Date: Thu, 18 Jul 2024 08:45:14 GMT
- Title: Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- Authors: Jakub Adamczyk, Piotr Ludynia,
- Abstract summary: We present textitscikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics.
Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines.
It is also flexible, highly efficient, and fully open source.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present \textit{scikit-fingerprints}, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, \textit{scikit-fingerprints} stands as the most feature-rich library in the Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
Related papers
- A Python library for efficient computation of molecular fingerprints [0.0]
We create a Python library that computes molecular fingerprints efficiently and delivers an interface that is comprehensive.
The library enables the user to perform computation on large datasets using parallelism.
We show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions.
arXiv Detail & Related papers (2024-03-27T19:02:09Z) - MolGraph: a Python package for the implementation of molecular graphs
and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML)
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z) - A Library for Representing Python Programs as Graphs for Machine
Learning [39.483608364770824]
We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs.
We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
arXiv Detail & Related papers (2022-08-15T22:36:17Z) - SparseChem: Fast and accurate machine learning model for small molecules [6.88204255655161]
SparseChem provides fast and accurate machine learning models for biochemical applications.
It is possible to train classification, regression and censored regression models, or combination of them from command line.
Source code and documentation is freely available under MIT License on GitHub.
arXiv Detail & Related papers (2022-03-09T12:40:35Z) - Python for Smarter Cities: Comparison of Python libraries for static and
interactive visualisations of large vector data [0.0]
Python, with its concise and natural syntax, presents a low barrier to entry for municipal staff without computer science backgrounds.
This study assesses prominent, actively-developed visualisation libraries in the Python ecosystem with respect to producing visualisations of large vector datasets.
All short-listed libraries were able to generate the sample map products for both a small and larger dataset.
arXiv Detail & Related papers (2022-02-26T10:23:29Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces textttscikit-dimension, an open-source Python package for intrinsic dimension estimation.
textttscikit-dimension package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z) - Biomedical and Clinical English Model Packages in the Stanza Python NLP
Library [47.47381610312517]
We introduce biomedical and clinical English model packages for the Stanza Python NLP library.
These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text.
We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results.
arXiv Detail & Related papers (2020-07-29T07:27:41Z) - Kernel methods library for pattern analysis and machine learning in
python [0.0]
The kernelmethods library fills that important void in the python ML ecosystem in a domain-agnostic fashion.
The library provides a number of well-defined classes to make various kernel-based operations efficient.
arXiv Detail & Related papers (2020-05-27T16:44:42Z) - TorchIO: A Python library for efficient loading, preprocessing,
augmentation and patch-based sampling of medical images in deep learning [68.8204255655161]
We present TorchIO, an open-source Python library to enable efficient loading, preprocessing, augmentation and patch-based sampling of medical images for deep learning.
TorchIO follows the style of PyTorch and integrates standard medical image processing libraries to efficiently process images during training of neural networks.
It includes a command-line interface which allows users to apply transforms to image files without using Python.
arXiv Detail & Related papers (2020-03-09T13:36:16Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.