Parallel integrative learning for large-scale multi-response regression
with incomplete outcomes
- URL: http://arxiv.org/abs/2104.05076v1
- Date: Sun, 11 Apr 2021 19:01:24 GMT
- Title: Parallel integrative learning for large-scale multi-response regression
with incomplete outcomes
- Authors: Ruipeng Dong, Daoji Li, Zemin Zheng
- Abstract summary: In the era of big data, the coexistence of incomplete outcomes, a large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction, and computation.
We propose a scalable and computationally efficient procedure, called PEER, for large-scale multi-response regression with incomplete outcomes.
Under some mild regularity conditions, we show that PEER enjoys nice sampling properties including consistency in estimation, prediction, and variable selection.
- Score: 1.7403133838762448
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-task learning is increasingly used to investigate the association
structure between multiple responses and a single set of predictor variables in
many applications. In the era of big data, the coexistence of incomplete
outcomes, a large number of responses, and high dimensionality in predictors
poses unprecedented challenges in estimation, prediction, and computation. In
this paper, we propose a scalable and computationally efficient procedure,
called PEER, for large-scale multi-response regression with incomplete
outcomes, where both the numbers of responses and predictors can be
high-dimensional. Motivated by sparse factor regression, we convert the
multi-response regression into a set of univariate-response regressions, which
can be efficiently implemented in parallel. Under some mild regularity
conditions, we show that PEER enjoys nice sampling properties including
consistency in estimation, prediction, and variable selection. Extensive
simulation studies show that our proposal compares favorably with several
existing methods in estimation accuracy, variable selection, and computational
efficiency.
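The core idea of the abstract, splitting a low-rank multi-response regression into univariate-response regressions that can run in parallel, can be sketched in a few lines of numpy. This is a minimal illustration of the decomposition, not the authors' PEER implementation; the simulated sizes, the missing-data rescaling, and the variable names are all assumptions made for the example.

```python
import numpy as np

# Sketch of the factor-regression idea behind PEER: a rank-r coefficient
# matrix C lets the multi-response problem Y ~ X C be split into r
# univariate-response regressions, one per latent factor.

rng = np.random.default_rng(0)
n, p, q, r = 100, 20, 15, 3

# Simulate a low-rank coefficient matrix and responses with missing entries
U = rng.normal(size=(p, r))
V = rng.normal(size=(q, r))
C_true = U @ V.T
X = rng.normal(size=(n, p))
Y = X @ C_true + 0.1 * rng.normal(size=(n, q))
mask = rng.random(Y.shape) < 0.9           # observed entries (~10% missing)
Y_obs = np.where(mask, Y, 0.0)

# Step 1: initial least-squares fit, with the zero-filled outcome matrix
# rescaled by the observation rate to offset the missingness bias.
C_ls = np.linalg.lstsq(X, Y_obs / mask.mean(), rcond=None)[0]
Uh, s, Vh = np.linalg.svd(X @ C_ls, full_matrices=False)

# Step 2: one univariate-response regression per latent factor; these r
# problems are mutually independent, so they could be solved in parallel.
C_hat = np.zeros((p, q))
for k in range(r):
    y_k = Uh[:, k] * s[k]                  # k-th latent response
    b_k = np.linalg.lstsq(X, y_k, rcond=None)[0]
    C_hat += np.outer(b_k, Vh[k])

err = np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true)
```

In the actual procedure each univariate regression would also carry a sparsity penalty for variable selection; the loop above uses plain least squares only to keep the parallel-decomposition structure visible.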
Related papers
- Analysing Multi-Task Regression via Random Matrix Theory with Application to Time Series Forecasting [16.640336442849282]
We formulate a multi-task optimization problem as a regularization technique to enable single-task models to leverage multi-task learning information.
We derive a closed-form solution for multi-task optimization in the context of linear models.
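The simplest instance of a closed-form solution for linear multi-task optimization can be shown with multi-task ridge regression. This is a generic sketch of that flavor of result, not the estimator derived in the paper, and all names and sizes are illustrative.

```python
import numpy as np

# Generic sketch: with tasks sharing one design matrix X and stacked
# responses Y (one column per task), the multi-task ridge objective
#     min_W ||Y - X W||_F^2 + lam * ||W||_F^2
# has the closed-form minimizer W = (X'X + lam I)^{-1} X'Y.

rng = np.random.default_rng(2)
n, p, T, lam = 60, 8, 4, 1.0
X = rng.normal(size=(n, p))
W_true = rng.normal(size=(p, T))
Y = X @ W_true + 0.3 * rng.normal(size=(n, T))

# Joint closed-form solution for all tasks at once
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Sanity check: with this penalty the objective decouples over columns,
# so the joint solution matches per-task ridge fits exactly.
W_cols = np.column_stack([
    np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y[:, t])
    for t in range(T)
])
```

Regularizers that actually couple the tasks (e.g. penalizing deviations of each task's coefficients from a shared mean) break this column-wise decoupling, which is where a multi-task closed form becomes non-trivial.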
arXiv Detail & Related papers (2024-06-14T17:59:25Z) - Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation [14.194212772887699]
We consider meta-learning within the framework of high-dimensional random-effects linear models.
We show the precise behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task.
We also propose and analyze an estimator of the hyper-covariance of the random regression coefficients based on data from the training tasks.
arXiv Detail & Related papers (2024-03-27T21:18:43Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important for forecasting nonstationary processes or data with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Errors-in-variables Fréchet Regression with Low-rank Covariate
Approximation [2.1756081703276]
Fréchet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables.
Our proposed framework combines the concepts of global Fréchet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator.
arXiv Detail & Related papers (2023-05-16T08:37:54Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Consensual Aggregation on Random Projected High-dimensional Features for
Regression [0.0]
We present a study of a kernel-based consensual aggregation on randomly projected high-dimensional features of predictions for regression.
We numerically illustrate that the aggregation scheme upholds its performance on very large and highly correlated features.
The efficiency of the proposed method is illustrated through several experiments evaluated on different types of synthetic and real datasets.
arXiv Detail & Related papers (2022-04-06T06:35:47Z) - Machine Learning for Multi-Output Regression: When should a holistic
multivariate approach be preferred over separate univariate ones? [62.997667081978825]
Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods.
We compare these methods in extensive simulations to help in answering the primary question when to use multivariate ensemble techniques.
arXiv Detail & Related papers (2022-01-14T08:44:25Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled way for the adaptive integration of different modalities and produces trustworthy regression results.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Fast cross-validation for multi-penalty ridge regression [0.0]
Ridge regression is a simple model for high-dimensional data.
Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix.
Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems.
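The hat-matrix claim above can be illustrated for plain single-penalty ridge regression; the paper's contribution generalizes this to the multi-penalty, sample-weighted case. The formula below is the standard ridge identity, not the paper's extended version, and the variable names are illustrative.

```python
import numpy as np

# Standard identity: for ridge regression the hat matrix
#     H = X (X'X + lam I)^{-1} X'
# maps y to fitted values, and leave-one-out CV residuals follow from H
# without any refitting:  e_{-i} = (y_i - yhat_i) / (1 - H_ii).

rng = np.random.default_rng(1)
n, p, lam = 50, 10, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + 0.5 * rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
y_hat = H @ y
loo_resid = (y - y_hat) / (1.0 - np.diag(H))   # closed-form LOO residuals

# Brute-force check for one held-out point i = 0: refit without it
i = 0
idx = np.arange(n) != i
Xi, yi = X[idx], y[idx]
b_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
brute = y[i] - X[i] @ b_i
```

This identity is exact for ridge (via the Sherman-Morrison formula), which is why hat-matrix tricks make cross-validation over many penalty values cheap: the expensive matrix inverse is computed once per penalty, not once per fold.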
arXiv Detail & Related papers (2020-05-19T09:13:43Z) - Towards Multimodal Response Generation with Exemplar Augmentation and
Curriculum Optimization [73.45742420178196]
We propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization.
Our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.
arXiv Detail & Related papers (2020-04-26T16:29:06Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with
Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.