Related papers: Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging

Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging

URL: http://arxiv.org/abs/2210.05835v1
Date: Tue, 11 Oct 2022 23:33:32 GMT
Title: Synthetic Power Analyses: Empirical Evaluation and Application to Cognitive Neuroimaging
Authors: Peiye Zhuang, Bliss Chapman, Ran Li, Oluwasanmi Koyejo
Abstract summary: We propose a framework for estimating statistical power at various sample sizes. We empirically explore the performance of synthetic power analysis for sample size selection in cognitive neuroscience experiments. Our empirical results suggest that synthetic power analysis could be a low-cost alternative to pilot data collection.
Score: 14.57108653193695
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the experimental sciences, statistical power analyses are often used before data collection to determine the required sample size. However, traditional power analyses can be costly when data are difficult or expensive to collect. We propose synthetic power analyses; a framework for estimating statistical power at various sample sizes, and empirically explore the performance of synthetic power analysis for sample size selection in cognitive neuroscience experiments. To this end, brain imaging data is synthesized using an implicit generative model conditioned on observed cognitive processes. Further, we propose a simple procedure to modify the statistical tests which result in conservative statistics. Our empirical results suggest that synthetic power analysis could be a low-cost alternative to pilot data collection when the proposed experiments share cognitive processes with previously conducted experiments.

Related papers

An extensive simulation study evaluating the interaction of resampling techniques across multiple causal discovery contexts [2.0946534289186842]
We present theoretical results proving that certain resampling methods emulate the assignment of specific values to algorithm tuning parameters. We also report the results of extensive simulation experiments, which verify the theoretical result and provide substantial data.
arXiv Detail & Related papers (2025-03-19T17:18:18Z)
Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment. First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population. We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
Efficient Randomized Experiments Using Foundation Models [10.606998433337894]
In this paper, we propose a novel approach that integrates the predictions from multiple foundation models while preserving valid statistical inference. Our estimator offers substantial precision gains, equivalent to a reduction of up to 20% in the sample size needed to match the same precision as the standard estimator based on experimental data alone.
arXiv Detail & Related papers (2025-02-06T17:54:10Z)
Debiasing Synthetic Data Generated by Deep Generative Models [40.165159490379146]
Deep generative models (DGMs) for synthetic data generation induce bias and imprecision in synthetic data analyses. We propose a new strategy that targets synthetic data created by DGMs for specific data analyses. Our approach accounts for biases, enhances convergence rates, and facilitates the calculation of estimators with easily approximated large sample variances.
arXiv Detail & Related papers (2024-11-06T19:24:34Z)
The Real Deal Behind the Artificial Appeal: Inferential Utility of Tabular Synthetic Data [40.165159490379146]
We show that the rate of false-positive findings (type 1 error) will be unacceptably high, even when the estimates are unbiased. Despite the use of a previously proposed correction factor, this problem persists for deep generative models.
arXiv Detail & Related papers (2023-12-13T02:04:41Z)
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data. In this study, we use a state-of-the-art synthetic data generation method.
arXiv Detail & Related papers (2023-05-12T13:13:55Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Online simulator-based experimental design for cognitive model selection [74.76661199843284]
We propose BOSMOS: an approach to experimental design that can select between computational models without tractable likelihoods. In simulated experiments, we demonstrate that the proposed BOSMOS technique can accurately select models in up to 2 orders of magnitude less time than existing LFI alternatives.
arXiv Detail & Related papers (2023-03-03T21:41:01Z)
Trustworthiness of Laser-Induced Breakdown Spectroscopy Predictions via Simulation-based Synthetic Data Augmentation and Multitask Learning [4.633997895806144]
We consider quantitative analyses of spectral data using laser-induced breakdown spectroscopy. We address the small size of training data available, and the validation of the predictions during inference on unknown data.
arXiv Detail & Related papers (2022-10-07T18:00:09Z)
Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation [66.64736150040093]
Machine learning applications are becoming increasingly pervasive in our society. Risk is that they will systematically spread the bias embedded in data. We propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations.
arXiv Detail & Related papers (2022-09-13T11:18:50Z)
BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
A Kernelised Stein Statistic for Assessing Implicit Generative Models [10.616967871198689]
We propose a principled procedure to assess the quality of a synthetic data generator. The sample size from the synthetic data generator can be as large as desired, while the size of the observed data, which the generator aims to emulate is fixed.
arXiv Detail & Related papers (2022-05-31T23:40:21Z)
With Little Power Comes Great Responsibility [54.96675741328462]
Underpowered experiments make it more difficult to discern the difference between statistical noise and meaningful model improvements. Small test sets mean that most attempted comparisons to state of the art models will not be adequately powered. For machine translation, we find that typical test sets of 2000 sentences have approximately 75% power to detect differences of 1 BLEU point.
arXiv Detail & Related papers (2020-10-13T18:00:02Z)
Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.