Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage
- URL: http://arxiv.org/abs/2004.14479v3
- Date: Mon, 20 Jul 2020 12:49:31 GMT
- Title: Monte Carlo simulation studies on Python using the sstudy package with SQL databases as storage
- Authors: Marco H A Inácio
- Abstract summary: sstudy is a Python package designed to simplify the preparation of simulation studies.
We present a short statistical description of the simulation study procedure with a simplified explanation of what is being estimated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance assessment is a key issue in the process of proposing new machine
learning/statistical estimators. One way to accomplish this task is to use
simulation studies, which can be defined as the procedure of estimating
and comparing properties (such as predictive power) of estimators (and other
statistics) by averaging over many replications given a true distribution,
i.e., generating a dataset, fitting the estimator, calculating and storing the
predictive power, then repeating the procedure many times and finally
averaging over the stored predictive powers. In this paper, we
present sstudy: a Python package designed to simplify the preparation of
simulation studies using SQL database engines as the storage system; more
specifically, we present its basic features, usage examples and references to
its documentation. We also present a short statistical description of the
simulation study procedure with a simplified explanation of what is being
estimated by it, as well as some examples of applications.
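The procedure described in the abstract (generate a dataset, fit the estimator, compute and store its predictive power, repeat, then average) can be sketched with only the Python standard library. This is not the sstudy API; it is a minimal illustration that assumes a toy true distribution (y = 2x + Gaussian noise), two toy estimators, test-set mean squared error as the measure of predictive power, and an SQLite table named `results` standing in for the SQL storage. Under these assumptions, the quantity being estimated is each estimator's expected test error over datasets drawn from the true distribution, and the average over R replications estimates it with a Monte Carlo standard error of roughly s/√R.

```python
import math
import random
import sqlite3
import statistics

# Hypothetical setup, not taken from sstudy: the "true distribution" is
# y = 2x + N(0, 0.5^2) noise with x ~ Uniform(-1, 1), and predictive power
# is measured as mean squared error (MSE) on a fresh test set.


def generate_dataset(n, rng):
    """Draw n (x, y) pairs from the assumed true distribution."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return statistics.fmean([(p - t) ** 2 for p, t in zip(predictions, targets)])


def run_study(conn, n_replications, n_train, seed, n_test=1000):
    """Run replications and store one row per (replication, estimator)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " estimator TEXT NOT NULL, n_train INTEGER NOT NULL, mse REAL NOT NULL)"
    )
    rng = random.Random(seed)
    for _ in range(n_replications):
        x_tr, y_tr = generate_dataset(n_train, rng)
        x_te, y_te = generate_dataset(n_test, rng)
        # Estimator 1: least-squares line through the origin.
        slope = sum(x * y for x, y in zip(x_tr, y_tr)) / sum(x * x for x in x_tr)
        # Estimator 2: always predict the training mean of y.
        mean_y = statistics.fmean(y_tr)
        rows = [
            ("ols_origin", n_train, mse([slope * x for x in x_te], y_te)),
            ("constant_mean", n_train, mse([mean_y] * len(y_te), y_te)),
        ]
        conn.executemany(
            "INSERT INTO results (estimator, n_train, mse) VALUES (?, ?, ?)", rows
        )
        conn.commit()  # persist each replication as soon as it finishes


def summarize(conn):
    """Average MSE and its Monte Carlo standard error per (estimator, n_train)."""
    query = (
        "SELECT estimator, n_train, COUNT(*), AVG(mse), AVG(mse * mse) "
        "FROM results GROUP BY estimator, n_train ORDER BY estimator, n_train"
    )
    summary = {}
    for estimator, n_train, count, mean, mean_sq in conn.execute(query):
        if count > 1:
            # Sample variance of the stored MSEs, then s / sqrt(R).
            var = max(mean_sq - mean * mean, 0.0) * count / (count - 1)
            se = math.sqrt(var / count)
        else:
            se = float("nan")
        summary[(estimator, n_train)] = (mean, se)
    return summary


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would keep results across runs
    for n_train in (20, 100):
        run_study(conn, n_replications=200, n_train=n_train, seed=n_train)
    for (estimator, n_train), (mean, se) in summarize(conn).items():
        print(f"{estimator:>14} n={n_train:<4} mean MSE={mean:.4f} (MC s.e. {se:.4f})")
```

Committing each replication as it finishes means an interrupted study loses at most one replication, and the final averaging reduces to a single `GROUP BY` query; this is one plausible reason to use a database rather than in-memory lists as storage, though the sketch does not reproduce any sstudy-specific behavior.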
Related papers
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.
We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
(arXiv, 2025-02-17T18:04:39Z)
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
(arXiv, 2025-02-11T19:24:09Z)
- Simulation-based inference with the Python Package sbijax [0.7499722271664147]
sbijax is a Python package that implements a wide variety of state-of-the-art methods in neural simulation-based inference.
The package provides functionality for approximate Bayesian computation, to compute model diagnostics, and to automatically estimate summary statistics.
(arXiv, 2024-09-28T18:47:13Z)
- Likelihood-based inference and forecasting for trawl processes: a stochastic optimization approach [0.0]
We develop the first likelihood-based methodology for the inference of real-valued trawl processes.
We introduce novel deterministic and probabilistic forecasting methods.
We release a Python library which can be used to fit a large class of trawl processes.
(arXiv, 2023-08-30T15:37:48Z)
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
(arXiv, 2022-02-06T15:42:04Z)
- Efficient and Accurate In-Database Machine Learning with SQL Code Generation in Python [0.0]
We describe a novel method for In-Database Machine Learning (IDBML) in Python using template macros in Jinja2.
Our method was 2-3% less accurate than the best current state-of-the-art methods we found (decision trees and random forests) and 2-3 times slower for one in-memory dataset.
(arXiv, 2021-04-07T16:23:19Z)
- Efficient nonparametric statistical inference on population feature importance using Shapley values [7.6146285961466]
We present a procedure for estimating and obtaining valid statistical inference on the Shapley Population Variable Importance Measure (SPVIM).
Although the computational complexity of the true SPVIM grows exponentially with the number of variables, we propose an estimator based on randomly sampling only $\Theta(n)$ feature subsets given $n$ observations.
Our procedure has good finite-sample performance in simulations, and for an in-hospital prediction task produces similar variable importance estimates when different machine learning algorithms are applied.
(arXiv, 2020-06-16T19:47:11Z)
- Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
(arXiv, 2020-06-16T18:43:31Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative to running feature extraction and model training as separate stages.
IFAQ treats the feature extraction query and the learning task as one program expressed in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
(arXiv, 2020-01-10T16:14:44Z)
- CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
To evaluate self-supervised learning of the search strategy, we apply the proposed algorithm to multi-homography estimation and demonstrate accuracy superior to state-of-the-art methods.
(arXiv, 2020-01-08T17:37:01Z)