Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- URL: http://arxiv.org/abs/2407.13291v4
- Date: Thu, 12 Dec 2024 13:35:22 GMT
- Title: Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python
- Authors: Jakub Adamczyk, Piotr Ludynia
- Abstract summary: scikit-fingerprints is a Python package for computation of molecular fingerprints for applications in chemoinformatics.
Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines.
- Abstract: In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy integration with machine learning pipelines. It is also highly optimized, featuring parallel computation that enables efficient processing of large molecular datasets. Currently, scikit-fingerprints stands as the most feature-rich library in the open source Python ecosystem, offering over 30 molecular fingerprints. Our library simplifies chemoinformatics tasks based on molecular fingerprints, including molecular property prediction and virtual screening. It is also flexible, highly efficient, and fully open source.
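To make the "scikit-learn interface" idea concrete, here is a minimal sketch of the fit/transform pattern that such a fingerprint transformer follows. It is not the scikit-fingerprints API: the class name ToyHashedFingerprint, its parameters (fp_size, ngram, n_jobs), and the character n-gram hashing are hypothetical stand-ins chosen so the example runs with the standard library alone. Real fingerprints are derived from molecular structure, not from raw SMILES characters.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class ToyHashedFingerprint:
    """Hypothetical toy transformer; NOT the scikit-fingerprints API."""

    def __init__(self, fp_size=64, ngram=3, n_jobs=1):
        self.fp_size = fp_size  # length of the output bit vector
        self.ngram = ngram      # size of the hashed character window
        self.n_jobs = n_jobs    # number of worker threads

    def fit(self, X, y=None):
        # Fingerprints are stateless, so there is nothing to learn.
        return self

    def _one(self, smiles):
        # Hash every character n-gram into one position of the bit vector.
        bits = [0] * self.fp_size
        for i in range(max(len(smiles) - self.ngram + 1, 1)):
            chunk = smiles[i:i + self.ngram]
            digest = hashlib.sha256(chunk.encode("utf-8")).digest()
            bits[int.from_bytes(digest[:4], "big") % self.fp_size] = 1
        return bits

    def transform(self, X):
        # Threads only illustrate an n_jobs-style parameter here;
        # pool.map keeps the output in input order.
        with ThreadPoolExecutor(max_workers=self.n_jobs) as pool:
            return list(pool.map(self._one, X))

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


smiles = ["CCO", "CCN", "c1ccccc1O"]
fps = ToyHashedFingerprint(fp_size=64, n_jobs=2).fit_transform(smiles)
```

Because the object exposes fit and transform, it can be placed in front of any estimator that accepts fixed-length numeric vectors, which is the kind of pipeline integration the abstract refers to. The tanimoto helper shows the sort of similarity comparison used in virtual screening.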
Related papers
- PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse provides a modular, extensible framework that is easy to use for a broad user base, including non-machine-learning bioresearchers.
We released PyPulse under the MIT License on GitHub and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
- Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping [0.0]
The Deep Fast Machine Learning Utils (DFMLU) library provides tools designed to automate and enhance aspects of machine learning processes.
DFMLU offers functionalities that support model development and data handling.
This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
arXiv Detail & Related papers (2024-09-14T21:39:17Z)
- FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition [11.885529039351217]
We introduce FP-VEC, a pilot study on using fingerprint vectors as an efficient fingerprinting method for Large Language Models.
Our approach generates a fingerprint vector that represents a confidential signature embedded in the model, allowing the same fingerprint to be seamlessly incorporated into an unlimited number of LLMs.
Results on several LLMs show that FP-VEC is lightweight, running on CPU-only devices for fingerprinting; scalable, with a single training run and an unlimited fingerprinting process; and preserves the model's normal behavior.
arXiv Detail & Related papers (2024-09-13T14:04:39Z)
- A Python library for efficient computation of molecular fingerprints [0.0]
We create a Python library that computes molecular fingerprints efficiently and provides a comprehensive interface.
The library enables the user to perform computation on large datasets using parallelism.
We show that using molecular fingerprints we can achieve results comparable to state-of-the-art ML solutions.
arXiv Detail & Related papers (2024-03-27T19:02:09Z)
- SparseChem: Fast and accurate machine learning model for small molecules [6.88204255655161]
SparseChem provides fast and accurate machine learning models for biochemical applications.
It is possible to train classification, regression, and censored regression models, or a combination of them, from the command line.
Source code and documentation are freely available under the MIT License on GitHub.
arXiv Detail & Related papers (2022-03-09T12:40:35Z)
- Python for Smarter Cities: Comparison of Python libraries for static and interactive visualisations of large vector data [0.0]
Python, with its concise and natural syntax, presents a low barrier to entry for municipal staff without computer science backgrounds.
This study assesses prominent, actively-developed visualisation libraries in the Python ecosystem with respect to producing visualisations of large vector datasets.
All short-listed libraries were able to generate the sample map products for both a small and larger dataset.
arXiv Detail & Related papers (2022-02-26T10:23:29Z)
- Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
- Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning [83.02597612195966]
solo-learn is a library of self-supervised methods for visual representation learning.
Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs.
arXiv Detail & Related papers (2021-08-03T22:19:55Z)
- PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z)
- TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning [68.8204255655161]
We present TorchIO, an open-source Python library to enable efficient loading, preprocessing, augmentation and patch-based sampling of medical images for deep learning.
TorchIO follows the style of PyTorch and integrates standard medical image processing libraries to efficiently process images during training of neural networks.
It includes a command-line interface which allows users to apply transforms to image files without using Python.
arXiv Detail & Related papers (2020-03-09T13:36:16Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)