Related papers: RobPy: a Python Package for Robust Statistical Methods

RobPy: a Python Package for Robust Statistical Methods

URL: http://arxiv.org/abs/2411.01954v1
Date: Mon, 04 Nov 2024 10:27:30 GMT
Title: RobPy: a Python Package for Robust Statistical Methods
Authors: Sarah Leyder, Jakob Raymaekers, Peter J. Rousseeuw, Thomas Servotte, Tim Verdonck,
Abstract summary: RobPy offers a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn. This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software.
Score: 1.2233362977312945
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Robust estimation provides essential tools for analyzing data that contain outliers, ensuring that statistical models remain reliable even in the presence of some anomalous data. While robust methods have long been available in R, users of Python have lacked a comprehensive package that offers these methods in a cohesive framework. RobPy addresses this gap by offering a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn. This package includes tools for robust preprocessing, univariate estimation, covariance matrices, regression, and principal component analysis, which are able to detect outliers and to mitigate their effect. In addition, RobPy provides specialized diagnostic plots for visualizing casewise and cellwise outliers. This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software. By bringing robust methods to Python, RobPy enables more users to perform robust data analysis in a modern and versatile programming language.

Related papers

shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python [0.6562256987706128]
shapr is a versatile tool for generating Shapley value explanations for machine learning and statistical regression models in both R and Python. We introduce the shaprpy Python library, which brings core capabilities of shapr to the Python ecosystem.
arXiv Detail & Related papers (2025-04-02T15:47:30Z)
PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers. We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning [42.350737545269105]
We show how to run Python's scikit-learn, pytorch and OpenAI gym libraries for building Machine Learning, Deep Learning, and Reinforcement Learning projects easily.
arXiv Detail & Related papers (2024-07-19T23:01:48Z)
PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series. It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
DeeProb-kit: a Python Library for Deep Probabilistic Modelling [0.0]
DeeProb-kit is a unified library written in Python consisting of a collection of deep probabilistic models (DPMs) It includes efficiently implemented learning techniques, inference routines, statistical algorithms, and provides high-quality fully-documented APIs.
arXiv Detail & Related papers (2022-12-08T17:02:16Z)
DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a python software package for analysing and characterising high-dimensional data. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models [65.51757376525798]
Latte is a Python library for evaluation of latent-based generative models. Latte is compatible with both PyTorch and/Keras, and provides both functional and modular APIs.
arXiv Detail & Related papers (2021-12-20T16:00:28Z)
Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces textttscikit-dimension, an open-source Python package for intrinsic dimension estimation. textttscikit-dimension package provides a uniform implementation of most of the known ID estimators based on scikit-learn application programming interface. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
QuaPy: A Python-Based Framework for Quantification [76.22817970624875]
QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation) It is written in Python and can be installed via pip.
arXiv Detail & Related papers (2021-06-18T13:57:11Z)
PyHealth: A Python Library for Health Predictive Models [53.848478115284195]
PyHealth is an open-source Python toolbox for developing various predictive models on healthcare data. The data preprocessing module enables the transformation of complex healthcare datasets into machine learning friendly formats. The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches.
arXiv Detail & Related papers (2021-01-11T22:02:08Z)
Landscape of R packages for eXplainable Artificial Intelligence [4.91155110560629]
The article is primarily devoted to the tools available in R, but since it is easy to integrate the Python code, we will also show examples for the most popular libraries from Python.
arXiv Detail & Related papers (2020-09-24T16:54:57Z)
Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python [77.33905890197269]
We describe a new library which implements a unified pathwise coordinate optimization for a variety of sparse learning problems. The library is coded in R++ and has user-friendly sparse experiments.
arXiv Detail & Related papers (2020-06-27T02:39:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.