Inference for Large Scale Regression Models with Dependent Errors
- URL: http://arxiv.org/abs/2409.05160v1
- Date: Sun, 8 Sep 2024 17:01:05 GMT
- Title: Inference for Large Scale Regression Models with Dependent Errors
- Authors: Lionel Voirol, Haotian Xu, Yuming Zhang, Luca Insolia, Roberto Molinari, Stéphane Guerrier,
- Abstract summary: This work defines and proves the statistical properties of the Generalized Method of Wavelet Moments with Exogenous variables (GMWMX)
It is a highly scalable, stable, and statistically valid method for estimating and delivering inference for linear models using processes in the presence of data complexities like latent dependence structures and missing data.
- Score: 3.3160726548489015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The exponential growth in data sizes and storage costs has brought considerable challenges to the data science community, requiring solutions to run learning methods on such data. While machine learning has scaled to achieve predictive accuracy in big data settings, statistical inference and uncertainty quantification tools are still lagging. Priority scientific fields collect vast data to understand phenomena typically studied with statistical methods like regression. In this setting, regression parameter estimation can benefit from efficient computational procedures, but the main challenge lies in computing error process parameters with complex covariance structures. Identifying and estimating these structures is essential for inference and often used for uncertainty quantification in machine learning with Gaussian Processes. However, estimating these structures becomes burdensome as data scales, requiring approximations that compromise the reliability of outputs. These approximations are even more unreliable when complexities like long-range dependencies or missing data are present. This work defines and proves the statistical properties of the Generalized Method of Wavelet Moments with Exogenous variables (GMWMX), a highly scalable, stable, and statistically valid method for estimating and delivering inference for linear models using stochastic processes in the presence of data complexities like latent dependence structures and missing data. Applied examples from Earth Sciences and extensive simulations highlight the advantages of the GMWMX.
Related papers
- Discovering Predictable Latent Factors for Time Series Forecasting [39.08011991308137]
We develop a novel framework for inferring the intrinsic latent factors implied by the observable time series.
We introduce three characteristics, i.e., predictability, sufficiency, and identifiability, and model these characteristics via the powerful deep latent dynamics models.
Empirical results on multiple real datasets show the efficiency of our method for different kinds of time series forecasting.
arXiv Detail & Related papers (2023-03-18T14:37:37Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - A Causality-Based Learning Approach for Discovering the Underlying
Dynamics of Complex Systems from Partial Observations with Stochastic
Parameterization [1.2882319878552302]
This paper develops a new iterative learning algorithm for complex turbulent systems with partial observations.
It alternates between identifying model structures, recovering unobserved variables, and estimating parameters.
Numerical experiments show that the new algorithm succeeds in identifying the model structure and providing suitable parameterizations for many complex nonlinear systems.
arXiv Detail & Related papers (2022-08-19T00:35:03Z) - A Data-Driven Method for Automated Data Superposition with Applications
in Soft Matter Science [0.0]
We develop a data-driven, non-parametric method for superposing experimental data with arbitrary coordinate transformations.
Our method produces interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
arXiv Detail & Related papers (2022-04-20T14:58:04Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - Learning Stable Nonparametric Dynamical Systems with Gaussian Process
Regression [9.126353101382607]
We learn a nonparametric Lyapunov function based on Gaussian process regression from data.
We prove that stabilization of the nominal model based on the nonparametric control Lyapunov function does not modify the behavior of the nominal model at training samples.
arXiv Detail & Related papers (2020-06-14T11:17:17Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Designing Accurate Emulators for Scientific Processes using
Calibration-Driven Deep Models [33.935755695805724]
Learn-by-Calibrating (LbC) is a novel deep learning approach for designing emulators in scientific applications.
We show that LbC provides significant improvements in generalization error over widely-adopted loss function choices.
LbC achieves high-quality emulators even in small data regimes and more importantly, recovers the inherent noise structure without any explicit priors.
arXiv Detail & Related papers (2020-05-05T16:54:11Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.