Machine Learning using Stata/Python
- URL: http://arxiv.org/abs/2103.03122v1
- Date: Wed, 3 Mar 2021 10:31:44 GMT
- Title: Machine Learning using Stata/Python
- Authors: Giovanni Cerulli
- Abstract summary: We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting popular Machine Learning (ML) methods.
r_ml_stata and c_ml_stata use the Python Scikit-learn API to carry out both cross-validation and outcome/label prediction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting
popular Machine Learning (ML) methods both in regression and classification
settings. Using the recent Stata/Python integration platform (sfi) of Stata 16,
these commands provide hyper-parameters' optimal tuning via K-fold
cross-validation using grid search. More specifically, they make use of the
Python Scikit-learn API to carry out both cross-validation and outcome/label
prediction.
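The tuning step described in the abstract (K-fold cross-validation over a grid of hyper-parameters, followed by prediction) can be sketched directly with scikit-learn. This is a minimal illustration and not the code of r_ml_stata or c_ml_stata: the synthetic data, the choice of a decision tree as learner, the grid values, and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a Stata dataset (assumption for illustration).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid: every combination below is evaluated (4 x 3 = 12 candidates).
param_grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]}

# 5-fold cross-validation; each candidate is scored on the held-out fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# With the default refit=True, the best candidate is refit on all training
# data, so the search object can predict outcomes for new observations.
best_params = search.best_params_
y_pred = search.predict(X_test)
print("Best hyper-parameters:", best_params)
```

For a classification setting, the same pattern would apply with a classifier (for example `DecisionTreeClassifier`) and a classification metric such as `scoring="accuracy"`.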
Related papers
- TorchCP: A Library for Conformal Prediction based on PyTorch [9.295285907724672]
TorchCP is a Python toolbox for conformal prediction research on deep learning models.
It contains various implementations for posthoc and training methods for classification and regression tasks.
arXiv Detail & Related papers (2024-02-20T03:14:47Z)
- LCE: An Augmented Combination of Bagging and Boosting in Python [45.65284933207566]
lcensemble is a high-performing, scalable and user-friendly Python package for the general tasks of classification and regression.
Local Cascade Ensemble (LCE) is a machine learning method that further enhances the prediction performance of the current state-of-the-art methods Random Forest and XGBoost.
arXiv Detail & Related papers (2023-08-14T16:34:47Z)
- pystacked: Stacking generalization and machine learning in Stata [0.0]
pystacked implements stacked generalization via Python's scikit-learn.
Stacking combines multiple supervised machine learners into a single learner.
pystacked provides an easy-to-use API for scikit-learn's machine learning algorithms.
arXiv Detail & Related papers (2022-08-23T12:03:04Z)
- MAPIE: an open-source library for distribution-free uncertainty quantification [0.0]
We introduce MAPIE, an open-source Python library that quantifies the uncertainties of Machine Learning models.
MAPIE implements conformal prediction methods, allowing the user to easily compute uncertainties.
It is hosted on scikit-learn-contrib and is fully "scikit-learn-compatible".
arXiv Detail & Related papers (2022-07-25T15:44:19Z)
- Stochastic Gradient Descent without Full Data Shuffle [65.97105896033815]
CorgiPile is a hierarchical data shuffling strategy that avoids a full data shuffle while maintaining an SGD convergence rate comparable to that obtained with a full shuffle.
Our results show that CorgiPile can achieve a convergence rate comparable to that of full-shuffle-based SGD for both deep learning and generalized linear models.
arXiv Detail & Related papers (2022-06-12T20:04:31Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a Python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous-Hidden Markov Models (HHMMs).
PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing data inference, different model order selection criteria, and semi-supervised training.
PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
arXiv Detail & Related papers (2022-01-12T07:32:36Z)
- DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python [1.4911092205861822]
DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al.
It contains functionalities for valid statistical inference on causal parameters when the estimation of parameters is based on machine learning methods.
The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem.
arXiv Detail & Related papers (2021-04-07T16:16:39Z)
- MOGPTK: The Multi-Output Gaussian Process Toolkit [71.08576457371433]
We present MOGPTK, a Python package for multi-channel data modelling using Gaussian processes (GP).
The aim of this toolkit is to make multi-output GP (MOGP) models accessible to researchers, data scientists, and practitioners alike.
arXiv Detail & Related papers (2020-02-09T23:34:49Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.