A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor
Analysis
- URL: http://arxiv.org/abs/2001.07859v4
- Date: Thu, 4 Feb 2021 17:29:22 GMT
- Title: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor
Analysis
- Authors: Christopher J. Urban and Daniel J. Bauer
- Abstract summary: We investigate a deep learning-based VI algorithm for exploratory item factor analysis (IFA) that is computationally fast even in large data sets with many latent factors.
The proposed approach applies a deep artificial neural network model called an importance-weighted autoencoder (IWAE) for exploratory IFA.
We show that the IWAE yields more accurate estimates as either the sample size or the number of IW samples increases.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Marginal maximum likelihood (MML) estimation is the preferred approach to
fitting item response theory models in psychometrics due to the MML estimator's
consistency, normality, and efficiency as the sample size tends to infinity.
However, state-of-the-art MML estimation procedures such as the
Metropolis-Hastings Robbins-Monro (MH-RM) algorithm as well as approximate MML
estimation procedures such as variational inference (VI) are computationally
time-consuming when the sample size and the number of latent factors are very
large. In this work, we investigate a deep learning-based VI algorithm for
exploratory item factor analysis (IFA) that is computationally fast even in
large data sets with many latent factors. The proposed approach applies a deep
artificial neural network model called an importance-weighted autoencoder
(IWAE) for exploratory IFA. The IWAE approximates the MML estimator using an
importance sampling technique wherein increasing the number of
importance-weighted (IW) samples drawn during fitting improves the
approximation, typically at the cost of decreased computational efficiency. We
provide a real data application that recovers results aligning with
psychological theory across random starts. Via simulation studies, we show that
the IWAE yields more accurate estimates as either the sample size or the number
of IW samples increases (although factor correlation and intercept estimates
exhibit some bias) and obtains similar results to MH-RM in less time. Our
simulations also suggest that the proposed approach performs similarly to and
is potentially faster than constrained joint maximum likelihood estimation, a
fast procedure that is consistent when the sample size and the number of items
simultaneously tend to infinity.
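To make the importance-weighting idea concrete, below is a minimal PyTorch sketch of an IWAE-style objective for binary item responses, assuming a logistic (2PL-style) decoder with free loadings and intercepts and a standard normal prior on the factors; the class name, architecture, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

LOG_2PI = math.log(2.0 * math.pi)

class IWAEForIFA(nn.Module):
    """Sketch of an importance-weighted autoencoder for exploratory IFA
    with binary items and a logistic (2PL-style) measurement model.
    Hypothetical architecture; not the paper's exact implementation."""

    def __init__(self, n_items: int, n_factors: int, hidden: int = 64):
        super().__init__()
        # Inference network: item responses -> variational mean/log-variance.
        self.encoder = nn.Sequential(
            nn.Linear(n_items, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * n_factors),
        )
        # Decoder parameters: factor loadings and item intercepts.
        self.loadings = nn.Parameter(0.1 * torch.randn(n_items, n_factors))
        self.intercepts = nn.Parameter(torch.zeros(n_items))

    def neg_iw_bound(self, x: torch.Tensor, n_iw: int = 5) -> torch.Tensor:
        """Negative importance-weighted bound on the marginal log-likelihood;
        the bound tightens as n_iw (the number of IW samples) increases."""
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        std = torch.exp(0.5 * logvar)
        eps = torch.randn(n_iw, *mu.shape)              # (n_iw, batch, factors)
        z = mu + std * eps
        logits = z @ self.loadings.T + self.intercepts  # (n_iw, batch, items)
        log_px_z = -F.binary_cross_entropy_with_logits(
            logits, x.expand_as(logits), reduction="none").sum(-1)
        log_pz = -0.5 * (z.pow(2) + LOG_2PI).sum(-1)             # N(0, I) prior
        log_qz = -0.5 * (eps.pow(2) + logvar + LOG_2PI).sum(-1)  # encoder density
        # IWAE objective: log-mean-exp of the importance weights.
        log_w = log_px_z + log_pz - log_qz
        return -(torch.logsumexp(log_w, dim=0) - math.log(n_iw)).mean()
```

Minimizing `neg_iw_bound` over mini-batches with a stochastic optimizer approximates MML fitting; increasing `n_iw` tightens the bound toward the marginal log-likelihood at extra compute, mirroring the trade-off the abstract describes.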
Related papers
- Measuring Variable Importance in Individual Treatment Effect Estimation with High Dimensional Data [35.104681814241104]
Causal machine learning (ML) promises to provide powerful tools for estimating individual treatment effects.
However, ML methods still face the significant challenge of interpretability, which is crucial for medical applications.
We propose a new algorithm based on the Conditional Permutation Importance (CPI) method for statistically rigorous variable importance assessment.
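To illustrate the flavor of conditional permutation importance (not the cited paper's exact procedure), the sketch below permutes one covariate conditionally on the rest by shuffling residuals from a linear conditional model; the function name and all modeling choices are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def conditional_permutation_importance(model, X, y, j, n_perm=50, seed=0):
    """Hypothetical CPI sketch for a fitted regression `model` with a
    sklearn-style .predict(). The j-th covariate is permuted conditionally
    on the others, so the permuted column keeps its dependence on them."""
    rng = np.random.default_rng(seed)
    X_minus_j = np.delete(X, j, axis=1)
    # Model E[X_j | X_-j] and keep the residuals.
    cond = LinearRegression().fit(X_minus_j, X[:, j])
    fitted = cond.predict(X_minus_j)
    residuals = X[:, j] - fitted
    base_loss = mean_squared_error(y, model.predict(X))
    losses = []
    for _ in range(n_perm):
        X_perm = X.copy()
        # Reattach shuffled residuals to the conditional mean.
        X_perm[:, j] = fitted + rng.permutation(residuals)
        losses.append(mean_squared_error(y, model.predict(X_perm)))
    # Importance = average loss increase under conditional permutation.
    return float(np.mean(losses) - base_loss)
```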
arXiv Detail & Related papers (2024-08-23T11:44:07Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm [41.25603565852633]
This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs.
The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation.
A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time.
arXiv Detail & Related papers (2023-10-18T19:34:56Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that dataset size, model parameters, and training objectives all play significant roles.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- Optimally-Weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference [12.157511906467146]
Likelihood-free inference methods typically make use of a distance between simulated and real data.
The maximum mean discrepancy (MMD) is commonly estimated at a root-$m$ rate, where $m$ is the number of simulated samples.
We propose a novel estimator for the MMD with significantly improved sample complexity.
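For context, the root-$m$ baseline mentioned above is the standard unbiased MMD$^2$ estimator, sketched here with a Gaussian kernel; the paper's optimally-weighted estimator reweights such kernel evaluations and is not reproduced.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Standard unbiased estimator of MMD^2 between 2-D samples x and y
    (the root-m-rate baseline; not the paper's optimally-weighted variant)."""
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    m, n = len(x), len(y)
    # Exclude diagonal terms so the within-sample means are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()
```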
arXiv Detail & Related papers (2023-01-27T12:13:54Z)
- Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data [0.0]
This research develops an approximate Gibbs sampler (AGS) to efficiently learn the HBPRMs while maintaining the inference accuracy.
Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS.
arXiv Detail & Related papers (2022-11-28T21:00:55Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond [69.83813153444115]
We consider an efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference.
Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances.
We propose localized debiased machine learning (LDML), which avoids this burdensome step.
arXiv Detail & Related papers (2019-12-30T14:42:52Z)
- Scalable Influence Estimation Without Sampling [9.873635079670091]
In a diffusion process on a network, how many nodes are expected to be influenced by a set of initial spreaders?
Here, we suggest an algorithm for estimating the influence function in the popular independent cascade model based on a scalable dynamic message-passing approach.
We also provide dynamic message-passing equations for a version of the linear threshold model.
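As a rough sketch of the message-passing idea, the code below computes expected spread in the independent cascade model via the usual tree-style (cavity) fixed point; this is an assumed simplification, not the paper's exact dynamic message-passing equations.

```python
import numpy as np

def ic_spread_message_passing(w, seeds, n_iter=200, tol=1e-9):
    """Expected spread under the independent cascade model via a
    tree-style message-passing fixed point (illustrative sketch).

    w[k, i] is the transmission probability on edge k -> i (0 if absent);
    seeds is an iterable of initially active node indices.
    """
    n = w.shape[0]
    is_seed = np.zeros(n, dtype=bool)
    is_seed[list(seeds)] = True
    # m[k, i]: probability node k ever activates when i's feedback is removed.
    m = np.ones((n, n))
    for _ in range(n_iter):
        m_new = np.ones((n, n))
        for i in range(n):
            if is_seed[i]:
                continue  # seed messages stay at 1
            for j in range(n):
                # i stays inactive if every in-neighbor k != j fails to infect it.
                fail = np.prod([1.0 - w[k, i] * m[k, i]
                                for k in range(n) if k != j and w[k, i] > 0])
                m_new[i, j] = 1.0 - fail
        if np.abs(m_new - m).max() < tol:
            m = m_new
            break
        m = m_new
    # Marginal activation probability of each node from converged messages.
    p = np.array([1.0 if is_seed[i] else
                  1.0 - np.prod([1.0 - w[k, i] * m[k, i]
                                 for k in range(n) if w[k, i] > 0])
                  for i in range(n)])
    return float(p.sum())  # expected number of influenced nodes
```

Because the fixed point iterates on edge messages rather than sampled cascades, the estimate needs no Monte Carlo simulation, which is the efficiency argument the summary makes.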
arXiv Detail & Related papers (2019-12-29T22:15:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.