Related papers: Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage

URL: http://arxiv.org/abs/2004.14479v3
Date: Mon, 20 Jul 2020 12:49:31 GMT
Title: Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage
Authors: Marco H A Inácio
Abstract summary: sstudy is a Python package designed to simplify the preparation of simulation studies. We present a short statistical description of the simulation study procedure with a simplified explanation of what is being estimated.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Performance assessment is a key issue in the process of proposing new machine learning/statistical estimators. One way to carry out such a task is through simulation studies, which can be defined as the procedure of estimating and comparing properties (such as predictive power) of estimators (and other statistics) by averaging over many replications given a true distribution; i.e., generating a dataset, fitting the estimator, calculating and storing the predictive power, then repeating the procedure many times and finally averaging over the stored predictive powers. Given that, in this paper, we present sstudy: a Python package designed to simplify the preparation of simulation studies using SQL database engines as the storage system; more specifically, we present its basic features, usage examples and references to its documentation. We also present a short statistical description of the simulation study procedure with a simplified explanation of what is being estimated by it, as well as some examples of applications.
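The procedure the abstract describes (generate a dataset, fit the estimator, store the predictive power, repeat, then average) can be sketched in a few lines. This is a minimal illustration using only the Python standard library and an in-memory SQLite database. It does not use or reproduce the sstudy API. The true distribution (y = 2x plus unit-variance Gaussian noise), the estimator (least-squares slope through the origin), the table name `result`, and the function names `run_replication` and `run_study` are all assumptions made for this example.

```python
import random
import sqlite3
import statistics


def run_replication(seed, n_train=50, n_test=200):
    """One replication: generate data, fit the estimator, return test MSE."""
    rng = random.Random(seed)

    def draw(n):
        # Assumed true distribution: y = 2x + N(0, 1) noise, x uniform on [-1, 1].
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
        return xs, ys

    x_train, y_train = draw(n_train)
    x_test, y_test = draw(n_test)

    # Assumed estimator: least-squares slope for a line through the origin.
    slope = sum(x * y for x, y in zip(x_train, y_train)) / sum(x * x for x in x_train)

    # Predictive power measured as mean squared error on fresh test data.
    return statistics.fmean([(y - slope * x) ** 2 for x, y in zip(x_test, y_test)])


def run_study(conn, n_replications=200):
    """Run the missing replications, store each result in SQL, return (count, average)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS result (seed INTEGER PRIMARY KEY, mse REAL NOT NULL)"
    )
    for seed in range(n_replications):
        # Skip seeds that are already stored, so an interrupted study can resume.
        done = conn.execute("SELECT 1 FROM result WHERE seed = ?", (seed,)).fetchone()
        if done is None:
            conn.execute(
                "INSERT INTO result (seed, mse) VALUES (?, ?)",
                (seed, run_replication(seed)),
            )
            conn.commit()
    # The final averaging step is a single SQL aggregate over the stored values.
    return conn.execute("SELECT COUNT(*), AVG(mse) FROM result").fetchone()


conn = sqlite3.connect(":memory:")
count, mean_mse = run_study(conn)
print(f"{count} replications, estimated expected test MSE: {mean_mse:.3f}")
```

Keeping one row per replication, keyed by its seed, is what makes the SQL storage useful here: calling `run_study` again on the same database adds only the replications that are missing, and the average can be recomputed at any time from the stored rows.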

Related papers

Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Simulation-based inference with the Python Package sbijax [0.7499722271664147]
sbijax is a Python package that implements a wide variety of state-of-the-art methods in neural simulation-based inference. The package provides functionality for approximate Bayesian computation, to compute model diagnostics, and to automatically estimate summary statistics.
arXiv Detail & Related papers (2024-09-28T18:47:13Z)
Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
Likelihood-based inference and forecasting for trawl processes: a stochastic optimization approach [0.0]
We develop the first likelihood-based methodology for the inference of real-valued trawl processes. We introduce novel deterministic and probabilistic forecasting methods. We release a Python library which can be used to fit a large class of trawl processes.
arXiv Detail & Related papers (2023-08-30T15:37:48Z)
Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems. In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples. We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
Efficient and Accurate In-Database Machine Learning with SQL Code Generation in Python [0.0]
We describe a novel method for In-Database Machine Learning (IDBML) in Python using template macros in Jinja2. Our method was 2-3% less accurate than the best current state-of-the-art methods we found (decision trees and random forests) and 2-3 times slower for one in-memory dataset.
arXiv Detail & Related papers (2021-04-07T16:23:19Z)
Efficient nonparametric statistical inference on population feature importance using Shapley values [7.6146285961466]
We present a procedure for estimating and obtaining valid statistical inference on the Shapley Population Variable Importance Measure (SPVIM). Although the computational complexity of the true SPVIM scales exponentially with the number of variables, we propose an estimator based on randomly sampling only $\Theta(n)$ feature subsets given $n$ observations. Our procedure has good finite-sample performance in simulations, and for an in-hospital prediction task produces similar variable importance estimates when different machine learning algorithms are applied.
arXiv Detail & Related papers (2020-06-16T19:47:11Z)
Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression. Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice. A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z)
Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.