RobPy: a Python Package for Robust Statistical Methods
- URL: http://arxiv.org/abs/2411.01954v1
- Date: Mon, 04 Nov 2024 10:27:30 GMT
- Title: RobPy: a Python Package for Robust Statistical Methods
- Authors: Sarah Leyder, Jakob Raymaekers, Peter J. Rousseeuw, Thomas Servotte, Tim Verdonck
- Abstract summary: RobPy offers a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn.
This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software.
- Abstract: Robust estimation provides essential tools for analyzing data that contain outliers, ensuring that statistical models remain reliable even in the presence of some anomalous data. While robust methods have long been available in R, users of Python have lacked a comprehensive package that offers these methods in a cohesive framework. RobPy addresses this gap by offering a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn. This package includes tools for robust preprocessing, univariate estimation, covariance matrices, regression, and principal component analysis, which are able to detect outliers and to mitigate their effect. In addition, RobPy provides specialized diagnostic plots for visualizing casewise and cellwise outliers. This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software. By bringing robust methods to Python, RobPy enables more users to perform robust data analysis in a modern and versatile programming language.
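To make the idea of robust univariate estimation concrete, here is a minimal sketch that compares the classical mean and standard deviation with the median and the normalized median absolute deviation (MAD) on contaminated data. It uses only NumPy and does not use RobPy's own API; the function names `robust_location_scale` and `flag_outliers`, the cutoff of 3, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def robust_location_scale(x):
    """Return (median, normalized MAD) of a 1-D sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    # The factor 1.4826 makes the MAD consistent with the standard
    # deviation when the clean data are normally distributed.
    mad = 1.4826 * np.median(np.abs(x - med))
    return med, mad


def flag_outliers(x, cutoff=3.0):
    """Flag points whose robust z-score exceeds `cutoff` in absolute value."""
    x = np.asarray(x, dtype=float)
    med, mad = robust_location_scale(x)
    return np.abs(x - med) / mad > cutoff


# Simulated data (an assumption for this sketch): 95 clean points
# around 10, plus 5 gross outliers at 100.
rng = np.random.default_rng(0)
clean = rng.normal(loc=10.0, scale=1.0, size=95)
data = np.concatenate([clean, np.full(5, 100.0)])

mean, std = data.mean(), data.std(ddof=1)
med, mad = robust_location_scale(data)
flags = flag_outliers(data)

print(f"classical: mean={mean:.2f}, std={std:.2f}")
print(f"robust:    median={med:.2f}, MAD={mad:.2f}")
print(f"flagged points: {int(flags.sum())}")
```

In this setup the outliers pull the mean well away from 10 and inflate the standard deviation, while the median and MAD stay close to the location and scale of the clean points, so the robust z-scores single out the contaminated observations.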
Related papers
- A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning [42.350737545269105]
We show how to run Python's scikit-learn, PyTorch, and OpenAI Gym libraries to easily build Machine Learning, Deep Learning, and Reinforcement Learning projects.
arXiv Detail & Related papers (2024-07-19T23:01:48Z) - PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z) - DeeProb-kit: a Python Library for Deep Probabilistic Modelling [0.0]
DeeProb-kit is a unified library written in Python consisting of a collection of deep probabilistic models (DPMs).
It includes efficiently implemented learning techniques, inference routines, and statistical algorithms, and provides high-quality, fully documented APIs.
arXiv Detail & Related papers (2022-12-08T17:02:16Z) - DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a Python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z) - Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models [65.51757376525798]
Latte is a Python library for evaluation of latent-based generative models.
Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs.
arXiv Detail & Related papers (2021-12-20T16:00:28Z) - Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation.
The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z) - QuaPy: A Python-Based Framework for Quantification [76.22817970624875]
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation).
It is written in Python and can be installed via pip.
arXiv Detail & Related papers (2021-06-18T13:57:11Z) - PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data.
The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats.
The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z) - Landscape of R packages for eXplainable Artificial Intelligence [4.91155110560629]
The article is primarily devoted to the tools available in R, but since Python code is easy to integrate, we also show examples for the most popular Python libraries.
arXiv Detail & Related papers (2020-09-24T16:54:57Z) - Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python [77.33905890197269]
We describe a new library which implements a unified pathwise coordinate optimization for a variety of sparse learning problems.
The library is coded in C++ and has user-friendly R and Python wrappers.
arXiv Detail & Related papers (2020-06-27T02:39:24Z)