Mining the Factor Zoo: Estimation of Latent Factor Models with
Sufficient Proxies
- URL: http://arxiv.org/abs/2212.12845v1
- Date: Sun, 25 Dec 2022 03:10:44 GMT
- Title: Mining the Factor Zoo: Estimation of Latent Factor Models with
Sufficient Proxies
- Authors: Runzhe Wan, Yingying Li, Wenbin Lu and Rui Song
- Abstract summary: We propose to bridge the two approaches to latent factor model estimation.
We make latent factor model estimation robust, flexible, and statistically more accurate.
As a bonus, the number of factors is also allowed to grow.
- Score: 29.737081616352913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent factor model estimation typically relies on either using domain
knowledge to manually pick several observed covariates as factor proxies, or
purely conducting multivariate analysis such as principal component analysis.
However, the former approach may suffer from bias, while the latter cannot
incorporate additional information. We propose to bridge these two approaches
while allowing the number of factor proxies to diverge, and hence make the
latent factor model estimation robust, flexible, and statistically more
accurate. As a bonus, the number of factors is also allowed to grow. At the
heart of our method is a penalized reduced rank regression to combine
information. To further deal with heavy-tailed data, a computationally
attractive penalized robust reduced rank regression method is proposed. We
establish faster rates of convergence compared with the benchmark. Extensive
simulations and real examples are used to illustrate the advantages.
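
The combination step can be pictured with a small stand-alone sketch: a nuclear-norm penalized reduced rank regression of the target panel on the observed proxies, solved by proximal gradient with singular value soft-thresholding. This is a generic illustration on simulated data under assumed variable names, not the authors' exact estimator; their robust variant would, for instance, swap the squared loss for a bounded or Huber-type loss to handle heavy tails.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def penalized_rrr(X, Y, lam=0.1, n_iter=500):
    """Minimize ||Y - X B||_F^2 / (2n) + lam * ||B||_* by proximal gradient descent."""
    n, q = X.shape
    B = np.zeros((q, Y.shape[1]))
    step = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = svt(B - step * grad, step * lam)    # soft-threshold the singular values
    return B

# toy usage: 5 latent factors driving 50 series, observed through 20 noisy proxies
rng = np.random.default_rng(0)
n, p, q, k = 200, 50, 20, 5
F = rng.normal(size=(n, k))                                       # latent factors
Y = F @ rng.normal(size=(k, p)) + 0.1 * rng.normal(size=(n, p))   # target panel
X = F @ rng.normal(size=(k, q)) + 0.1 * rng.normal(size=(n, q))   # factor proxies
B_hat = penalized_rrr(X, Y, lam=0.05)
print("estimated rank of the coefficient matrix:", np.linalg.matrix_rank(B_hat, tol=1e-3))
```

The nuclear-norm penalty drives the fitted coefficient matrix toward low rank, which is how the proxy information is compressed into a small number of estimated factors.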
Related papers
- Linked shrinkage to improve estimation of interaction effects in regression models [0.0]
We develop an estimator that adapts well to two-way interaction terms in a regression model.
We evaluate the potential of the model for inference, which is notoriously hard for selection strategies.
Our models can be very competitive with a more advanced machine learner, such as a random forest, even for fairly large sample sizes. A simplified stand-in for the idea is sketched after this entry.
arXiv Detail & Related papers (2023-09-25T10:03:39Z)
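
The summary above does not spell out the estimator, so as a rough stand-in here is ordinary ridge regression on main effects plus all pairwise interaction columns. The paper's linked shrinkage instead ties each interaction's penalty to its parent main effects, which this sketch omits; data and names are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

def with_interactions(X):
    """Append every two-way interaction column x_i * x_j (i < j) to the design matrix."""
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + pairs)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 2 * X[:, 0] - X[:, 1] + 1.5 * X[:, 0] * X[:, 2] + rng.normal(scale=0.5, size=300)

model = Ridge(alpha=1.0).fit(with_interactions(X), y)   # one shared shrinkage level
print(model.coef_.round(2))                             # main effects first, then interactions
```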
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on Mutual WasserStein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- Bayesian Hierarchical Models for Counterfactual Estimation [12.159830463756341]
We propose a probabilistic paradigm to estimate a diverse set of counterfactuals.
We treat the perturbations as random variables endowed with prior distribution functions.
A gradient-based sampler with superior convergence characteristics efficiently computes the posterior samples, as illustrated in the sketch after this entry.
arXiv Detail & Related papers (2023-01-21T00:21:11Z)
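
As a hedged illustration of "perturbations as random variables with a gradient-based posterior sampler", the toy below runs unadjusted Langevin dynamics over an input perturbation of a fixed logistic model, with a Gaussian prior on the perturbation. The paper's hierarchical priors and its specific sampler may differ; all quantities here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
w, b = np.array([1.5, -2.0, 0.5]), 0.1      # a fixed "trained" logistic model
x0 = np.array([0.2, 0.4, -0.3])             # factual input
sigma = 0.5                                 # prior scale of the perturbation

def log_post_grad(delta, target=1.0):
    """Gradient of log p(delta | target): Gaussian prior on delta plus the logistic
    likelihood that the perturbed input x0 + delta is classified as `target`."""
    z = w @ (x0 + delta) + b
    p = 1.0 / (1.0 + np.exp(-z))
    return -delta / sigma**2 + (target - p) * w

# unadjusted Langevin dynamics over the perturbation
step, delta, samples = 1e-2, np.zeros(3), []
for t in range(5000):
    delta = delta + 0.5 * step * log_post_grad(delta) + np.sqrt(step) * rng.normal(size=3)
    if t > 1000:                            # discard burn-in
        samples.append(delta.copy())
print("posterior mean counterfactual perturbation:", np.mean(samples, axis=0).round(2))
```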
- Data Augmentation in the Underparameterized and Overparameterized Regimes [7.326504492614808]
We quantify how data augmentation affects the variance and limiting distribution of estimates.
The results confirm some observations made in machine learning practice, but also lead to unexpected findings.
arXiv Detail & Related papers (2022-02-18T11:32:41Z)
- Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression [14.493176427999028]
We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models.
We show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner's risk converges to zero if the propensity score is known; a minimal sketch of the IPW-learner follows this entry.
arXiv Detail & Related papers (2022-02-10T18:51:52Z)
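
A minimal sketch of the IPW-learner mentioned above, assuming the propensity score is known: form the inverse-propensity-weighted pseudo-outcome, whose conditional mean equals the CATE, and regress it on the covariates with linear least squares. The data-generating process and variable names are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 3
X = rng.normal(size=(n, d))
e = 1.0 / (1.0 + np.exp(-X[:, 0]))          # known propensity score e(x)
T = rng.binomial(1, e)                      # treatment assignment
tau = 1.0 + 2.0 * X[:, 1]                   # true CATE
Y = X @ np.array([0.5, -1.0, 0.3]) + T * tau + rng.normal(size=n)

# IPW pseudo-outcome: E[Z | X = x] equals the CATE when e(x) is correct
Z = (T / e - (1 - T) / (1 - e)) * Y

# linear regression of the pseudo-outcome on the covariates (with intercept)
Xd = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xd, Z, rcond=None)[0]
print("estimated CATE coefficients (intercept, X1..X3):", beta.round(2))
```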
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Multivariate Probabilistic Regression with Natural Gradient Boosting [63.58097881421937]
We propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution.
Our method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.
arXiv Detail & Related papers (2021-06-07T17:44:49Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlation during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
- DEMI: Discriminative Estimator of Mutual Information [5.248805627195347]
Estimating mutual information between continuous random variables is often intractable and challenging for high-dimensional data.
Recent progress has leveraged neural networks to optimize variational lower bounds on mutual information.
Our approach is based on training a classifier that provides the probability that a data sample pair is drawn from the joint distribution; the generic density-ratio recipe is sketched after this entry.
arXiv Detail & Related papers (2020-10-05T04:19:27Z)
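
The classifier-based idea can be sketched in a few lines: label joint pairs 1 and shuffled (product-of-marginals) pairs 0, fit a probabilistic classifier, and average the log-odds over the joint samples to estimate mutual information. This is the generic density-ratio recipe under a toy Gaussian example, not DEMI's exact architecture or correction terms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, rho = 20000, 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
joint = rng.multivariate_normal([0, 0], cov, size=n)                   # (x, y) ~ p(x, y)
marg = np.column_stack([joint[:, 0], rng.permutation(joint[:, 1])])    # y shuffled: p(x) p(y)

# classifier distinguishing joint pairs (label 1) from product-of-marginals pairs (label 0)
XY = np.vstack([joint, marg])
labels = np.concatenate([np.ones(n), np.zeros(n)])
feats = np.column_stack([XY, XY[:, 0] * XY[:, 1], XY[:, 0] ** 2, XY[:, 1] ** 2])
clf = LogisticRegression(max_iter=1000).fit(feats, labels)

# MI estimate: average log-odds (estimated log density ratio) over the joint samples
p = clf.predict_proba(feats[:n])[:, 1]
mi_hat = np.mean(np.log(p / (1 - p)))
mi_true = -0.5 * np.log(1 - rho ** 2)
print(f"estimated MI: {mi_hat:.3f}, true MI: {mi_true:.3f}")
```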
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
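
The randomized-truncation device behind SUMO can be demonstrated on a toy series: truncate at a random index and reweight each kept term by the inverse probability of surviving to it, which keeps the estimator unbiased for the infinite sum. SUMO applies this to a series whose limit is the log marginal likelihood; the sketch below only shows the generic Russian-roulette estimator on an assumed geometric series.

```python
import numpy as np

rng = np.random.default_rng(5)

def delta(k):
    """k-th term of a convergent series; here sum_{k>=1} 0.5^k = 1."""
    return 0.5 ** k

def russian_roulette(q=0.3):
    """Single-sample unbiased estimate of the infinite sum: truncate at a random
    index K ~ Geometric(q) and reweight term k by 1 / P(K >= k) = (1 - q)^-(k-1)."""
    K = rng.geometric(q)                                     # support 1, 2, 3, ...
    return sum(delta(k) / (1.0 - q) ** (k - 1) for k in range(1, K + 1))

estimates = [russian_roulette() for _ in range(100000)]
print(f"mean of estimates: {np.mean(estimates):.4f} (true sum = 1.0)")
```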
This list is automatically generated from the titles and abstracts of the papers on this site.