Scikit-dimension: a Python package for intrinsic dimension estimation
- URL: http://arxiv.org/abs/2109.02596v1
- Date: Mon, 6 Sep 2021 16:46:38 GMT
- Title: Scikit-dimension: a Python package for intrinsic dimension estimation
- Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin and
Andrei Zinovyev
- Abstract summary: This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
- Score: 58.8599521537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dealing with uncertainty in applications of machine learning to real-life
data critically depends on the knowledge of intrinsic dimensionality (ID). A
number of methods have been suggested for the purpose of estimating ID, but no
standard package to easily apply them one by one or all at once has been
implemented in Python. This technical note introduces
scikit-dimension, an open-source Python package for intrinsic
dimension estimation. The scikit-dimension package provides a uniform
implementation of most of the known ID estimators based on scikit-learn
application programming interface to evaluate global and local intrinsic
dimension, as well as generators of synthetic toy and benchmark datasets
widespread in the literature. The package is developed with tools for assessing
code quality and coverage, unit testing, and continuous integration. We briefly
describe the package and demonstrate its use in a large-scale (more than 500
datasets) benchmarking of methods for ID estimation in real-life and synthetic
data. The source code is available from
https://github.com/j-bac/scikit-dimension , the documentation is available from
https://scikit-dimension.readthedocs.io .
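To make the idea of a scikit-learn-style ID estimator concrete, here is a minimal sketch; it is not the package's actual code and deliberately does not import scikit-dimension. It implements a simplified maximum-likelihood variant of the two-nearest-neighbours (TwoNN) idea in pure Python. The class name `TwoNNSketch`, the helper `make_plane`, and the attribute name `dimension_` are assumptions chosen for illustration only.

```python
import math
import random


class TwoNNSketch:
    """Toy global ID estimator with a scikit-learn-like fit() method.

    Hypothetical illustration only; not the scikit-dimension implementation.
    """

    def fit(self, X):
        log_ratios = []
        for i, p in enumerate(X):
            # Sorted distances from point i to every other point.
            dists = sorted(math.dist(p, q) for j, q in enumerate(X) if j != i)
            r1, r2 = dists[0], dists[1]
            if r1 > 0:  # skip exact duplicates, which make the ratio undefined
                log_ratios.append(math.log(r2 / r1))
        # Maximum-likelihood estimate: d = N / sum_i log(r2_i / r1_i).
        self.dimension_ = len(log_ratios) / sum(log_ratios)
        return self


def make_plane(n, seed=0):
    """Sample n points from a 2-D plane linearly embedded in 5-D space."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        points.append((u, v, u + v, u - v, 0.5 * u))
    return points


X = make_plane(500)
est = TwoNNSketch().fit(X)
print(f"ambient dimension: {len(X[0])}, estimated ID: {est.dimension_:.2f}")
```

The data have five coordinates but vary along only two directions, so the estimate should come out close to 2. Returning `self` from `fit` and storing the result in a trailing-underscore attribute follows the scikit-learn convention that the abstract refers to.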
Related papers
- RobPy: a Python Package for Robust Statistical Methods [1.2233362977312945]
RobPy offers a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn.
This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software.
arXiv Detail & Related papers (2024-11-04T10:27:30Z)
- eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles [3.465746303617158]
eipy is an open-source Python package for developing effective, multi-modal heterogeneous ensembles for classification.
eipy provides a rigorous and user-friendly framework for comparing and selecting the best-performing data integration and predictive modeling methods.
arXiv Detail & Related papers (2024-01-17T20:07:47Z)
- PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a Python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- MISeval: a Metric Library for Medical Image Segmentation Evaluation [1.4680035572775534]
There is no universal metric library in Python for standardized and reproducible evaluation.
We propose our open-source, publicly available Python package MISeval: a metric library for Medical Image Segmentation Evaluation.
arXiv Detail & Related papers (2022-01-23T23:06:47Z)
- QuaPy: A Python-Based Framework for Quantification [76.22817970624875]
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation).
It is written in Python and can be installed via pip.
arXiv Detail & Related papers (2021-06-18T13:57:11Z)
- PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z)
- SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics [74.28810048824519]
SacreROUGE is an open-source library for using and developing summarization evaluation metrics.
The library provides Python wrappers around the official implementations of existing evaluation metrics.
It provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments.
arXiv Detail & Related papers (2020-07-10T13:26:37Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)