Machine Learning Panel Data Regressions with Heavy-tailed Dependent
Data: Theory and Application
- URL: http://arxiv.org/abs/2008.03600v2
- Date: Mon, 22 Nov 2021 15:35:00 GMT
- Title: Machine Learning Panel Data Regressions with Heavy-tailed Dependent
Data: Theory and Application
- Authors: Andrii Babii and Ryan T. Ball and Eric Ghysels and Jonas Striaukas
- Abstract summary: The paper introduces structured machine learning regressions for heavy-tailed dependent panel data potentially sampled at different frequencies.
We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators recognizing that financial and economic data can have fat tails.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper introduces structured machine learning regressions for heavy-tailed
dependent panel data potentially sampled at different frequencies. We focus on
the sparse-group LASSO regularization. This type of regularization can take
advantage of the mixed frequency time series panel data structures and improve
the quality of the estimates. We obtain oracle inequalities for the pooled and
fixed effects sparse-group LASSO panel data estimators recognizing that
financial and economic data can have fat tails. To that end, we leverage a
new Fuk-Nagaev concentration inequality for panel data consisting of
heavy-tailed $\tau$-mixing processes.
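For readers unfamiliar with the penalty, the sparse-group LASSO blends the $\ell_1$ (LASSO) and group-$\ell_2$ (group LASSO) norms. A sketch of the pooled panel estimator in a standard formulation (the paper's exact notation and weighting may differ) is

$$\hat{\beta} = \arg\min_{\beta}\ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\big(y_{i,t} - x_{i,t}^{\top}\beta\big)^{2} + \lambda\Big(\gamma\|\beta\|_{1} + (1-\gamma)\sum_{g\in\mathcal{G}}\|\beta_{g}\|_{2}\Big),$$

where $\gamma \in [0,1]$ interpolates between the LASSO ($\gamma = 1$) and the group LASSO ($\gamma = 0$), and each group $g \in \mathcal{G}$ can collect, for example, the lag polynomial of one high-frequency covariate so that entire predictors are selected in or out together.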
Related papers
- Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representations in the presence of error slices is challenging due to high-cardinality features and the complexity of constructing error sets.
Traditional robust representation learning methods are largely focused on improving worst-group performance in supervised settings in computer vision.
Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z)
- The Data Addition Dilemma [4.869513274920574]
In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources.
But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings?
We identify this situation as the "Data Addition Dilemma", demonstrating that adding training data in this multi-source scaling context can at times result in reduced overall accuracy, uncertain fairness outcomes, and reduced worst-subgroup performance.
arXiv Detail & Related papers (2024-08-08T01:42:31Z)
- Panel Data Nowcasting: The Case of Price-Earnings Ratios [0.0]
The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies.
Motivated by the problem of predicting corporate earnings for a large cross-section of firms, we focus on the sparse-group LASSO regularization.
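To make the regularizer concrete, here is a minimal Python sketch of a least-squares sparse-group LASSO fit via proximal gradient descent; the proximal operator of the combined penalty factorizes into elementwise soft-thresholding followed by groupwise shrinkage. This is a generic illustration under simplified assumptions (fixed step size, plain pooled regression), not the authors' implementation.

```python
import numpy as np

def prox_sg_lasso(b, groups, lam1, lam2):
    """Prox of lam1*||b||_1 + lam2*sum_g ||b_g||_2:
    soft-threshold each coordinate, then shrink each group toward zero."""
    b = np.sign(b) * np.maximum(np.abs(b) - lam1, 0.0)
    for g in groups:
        nrm = np.linalg.norm(b[g])
        if nrm > 0.0:
            b[g] = b[g] * max(0.0, 1.0 - lam2 / nrm)
    return b

def fit_sg_lasso(X, y, groups, lam=0.1, gamma=0.5, n_iter=2000):
    """Sparse-group LASSO for squared loss via proximal gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = prox_sg_lasso(beta - step * grad, groups,
                             step * lam * gamma, step * lam * (1 - gamma))
    return beta

# Toy usage: 12 covariates in 3 groups of 4; only the first group is active.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
true_beta = np.zeros(12)
true_beta[:3] = [1.0, -0.5, 0.3]
y = X @ true_beta + 0.1 * rng.standard_normal(200)
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
print(fit_sg_lasso(X, y, groups, lam=0.05).round(2))
```

With $\gamma$ strictly between 0 and 1, the estimator zeroes out entire inactive groups while still sparsifying within active ones.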
arXiv Detail & Related papers (2023-07-05T22:04:46Z)
- Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data [87.77128754860983]
In this paper, we analyse the behaviour of one of the most popular variants of self-supervised learning (SSL) on long-tail data.
We find that a large $\tau$ emphasises group-wise discrimination, whereas a small $\tau$ leads to a higher degree of instance discrimination.
We propose to employ a dynamic $\tau$ and show that a simple cosine schedule can yield significant improvements in the learnt representations.
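The cosine schedule the summary mentions is a one-liner. A hedged sketch (the period and temperature bounds here are illustrative assumptions, not the paper's settings):

```python
import math

def cosine_tau(step: int, period: int = 1000,
               tau_min: float = 0.1, tau_max: float = 1.0) -> float:
    """Oscillate the contrastive temperature between tau_min and tau_max.
    Small tau stresses instance discrimination; large tau stresses
    group-wise discrimination, per the paper's analysis."""
    phase = 2.0 * math.pi * step / period
    return tau_min + 0.5 * (tau_max - tau_min) * (1.0 + math.cos(phase))

# e.g., scale similarity logits in an InfoNCE-style loss:
# logits = sim_matrix / cosine_tau(global_step)
```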
arXiv Detail & Related papers (2023-03-23T20:37:25Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock prices involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA).
Our proposed model exhibits reduced computational complexity and achieves performance comparable to or better than existing methods.
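The summary names the modules without detailing them, but the complexity argument behind grouping is easy to illustrate: restricting attention to within-group blocks cuts the cost from $O(L^2)$ to roughly $O(L^2/G)$ for $G$ groups. The NumPy sketch below is a generic block-local self-attention, not the paper's exact GSA or CCA design (projections and multi-head structure are omitted):

```python
import numpy as np

def grouped_self_attention(x, n_groups):
    """Block-local self-attention: tokens attend only within their group.
    x: (seq_len, d) array with seq_len divisible by n_groups.
    Hypothetical sketch of the grouping idea, not the paper's GSA module."""
    seq_len, d = x.shape
    g = seq_len // n_groups
    out = np.empty_like(x)
    for i in range(n_groups):
        blk = x[i * g:(i + 1) * g]                # (g, d) group of tokens
        scores = blk @ blk.T / np.sqrt(d)         # (g, g) scaled dot products
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # row-wise softmax
        out[i * g:(i + 1) * g] = w @ blk          # attention-weighted values
    return out

x = np.random.default_rng(0).standard_normal((16, 8))
y = grouped_self_attention(x, n_groups=4)  # four 4x4 score matrices instead of one 16x16
```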
arXiv Detail & Related papers (2022-10-02T06:58:49Z)
- Class Balancing GAN with a Classifier in the Loop [58.29090045399214]
We introduce a novel theoretically motivated Class Balancing regularizer for training GANs.
Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset.
We demonstrate the utility of our regularizer in learning representations for long-tailed distributions via achieving better performance than existing approaches over multiple datasets.
arXiv Detail & Related papers (2021-06-17T11:41:30Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
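For context, group DRO (distributionally robust optimization over groups) minimizes the worst-group risk; in its standard formulation it solves

$$\min_{\theta}\ \max_{g \in \mathcal{G}}\ \mathbb{E}_{(x,y) \sim P_g}\big[\ell(\theta; x, y)\big],$$

so its guarantee depends on the group structure $\mathcal{G}$ actually covering the relevant spurious correlations, which is precisely the failure mode highlighted here.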
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Time-Series Imputation with Wasserstein Interpolation for Optimal Look-Ahead-Bias and Variance Tradeoff [66.59869239999459]
In finance, imputation of missing returns may be applied prior to training a portfolio optimization model.
There is an inherent trade-off between the look-ahead-bias of using the full data set for imputation and the larger variance in the imputation from using only the training data.
We propose a Bayesian posterior consensus distribution which optimally controls the variance and look-ahead-bias trade-off in the imputation.
arXiv Detail & Related papers (2021-02-25T09:05:35Z)
- Fairness in Forecasting and Learning Linear Dynamical Systems [10.762748665074794]
We introduce two natural notions of subgroup fairness and instantaneous fairness to address such under-representation bias in time-series forecasting problems.
In particular, we consider the subgroup-fair and instant-fair learning of a linear dynamical system from multiple trajectories of varying lengths.
arXiv Detail & Related papers (2020-06-12T16:53:27Z)
- Machine Learning Time Series Regressions with an Application to Nowcasting [0.0]
This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies.
The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO.
arXiv Detail & Related papers (2020-05-28T14:42:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.