Semi-supervised learning for linear extremile regression
- URL: http://arxiv.org/abs/2507.01314v1
- Date: Wed, 02 Jul 2025 02:59:15 GMT
- Title: Semi-supervised learning for linear extremile regression
- Authors: Rong Jiang, Keming Yu, Jiangfeng Wang
- Abstract summary: This paper introduces a novel definition of linear extremile regression along with an accompanying estimation methodology. The regression coefficient estimators of this method achieve $\sqrt{n}$-consistency, which nonparametric extremile regression may not provide. We propose a semi-supervised learning approach to enhance estimation efficiency, even when the specified linear extremile regression model may be misspecified.
- Score: 0.8973184739267972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extremile regression, as a least squares analog of quantile regression, is a potentially useful tool for modeling and understanding the extreme tails of a distribution. However, existing extremile regression methods, as nonparametric approaches, may face challenges in high-dimensional settings due to data sparsity, computational inefficiency, and the risk of overfitting. Because linear regression serves as the foundation for many other statistical and machine learning models due to its simplicity, interpretability, and relatively easy implementation, particularly in high-dimensional settings, this paper introduces a novel definition of linear extremile regression along with an accompanying estimation methodology. The regression coefficient estimators of this method achieve $\sqrt{n}$-consistency, which nonparametric extremile regression may not provide. In particular, because semi-supervised learning can leverage unlabeled data to make more accurate predictions and avoid overfitting to small labeled datasets in high-dimensional spaces, we propose a semi-supervised learning approach to enhance estimation efficiency, even when the specified linear extremile regression model may be misspecified. Both simulation studies and real data analyses demonstrate the finite-sample performance of our proposed methods.
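For context, the tau-th extremile can be expressed as a weighted mean of order statistics, which is what makes it a least-squares analog of the quantile. The NumPy sketch below computes the sample extremile using the weighting of Daouia, Gijbels and Stupfler (2019); it illustrates only the unconditional quantity, not the paper's linear or semi-supervised estimators, and the function names are illustrative.

```python
import numpy as np

def extremile_weights(n, tau):
    """Order-statistic weights for the sample tau-extremile.

    Uses K_tau(t) = t**s with s = log(1/2)/log(tau) for tau >= 1/2,
    and the symmetric form 1 - (1 - t)**r with r = log(1/2)/log(1 - tau)
    for tau < 1/2; the i-th weight is K_tau(i/n) - K_tau((i-1)/n).
    """
    grid = np.arange(n + 1) / n
    if tau >= 0.5:
        s = np.log(0.5) / np.log(tau)
        K = grid ** s
    else:
        r = np.log(0.5) / np.log(1.0 - tau)
        K = 1.0 - (1.0 - grid) ** r
    return np.diff(K)

def sample_extremile(y, tau):
    """Sample tau-extremile: a weighted mean of the order statistics."""
    y = np.sort(np.asarray(y, dtype=float))
    return float(extremile_weights(len(y), tau) @ y)

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
print(sample_extremile(y, 0.5))   # equals the sample mean
print(sample_extremile(y, 0.95))  # pulled toward the upper tail
```

At tau = 0.5 the weights are uniform and the extremile reduces to the mean, which is the sense in which extremiles interpolate between the mean and the extremes of a distribution.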
Related papers
- A Simplified Analysis of SGD for Linear Regression with Weight Averaging [64.2393952273612]
Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate. We provide a simplified analysis recovering the same bias and variance bounds given in Zou et al. (2021), based on simple linear algebra tools. We believe our work makes the analysis of gradient descent on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling.
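As an illustration of the setting analyzed in this line of work, here is a minimal NumPy sketch of constant-step-size SGD for least squares with tail averaging of the iterates; the step size, number of passes, and burn-in fraction are illustrative choices, not values from the paper.

```python
import numpy as np

def averaged_sgd(X, y, lr=0.01, n_passes=5, burn_in_frac=0.5, seed=0):
    """Constant-step-size SGD on the least-squares objective,
    returning the average of the iterates after a burn-in period."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []
    for _ in range(n_passes):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]  # single-sample gradient
            w = w - lr * grad
            iterates.append(w.copy())
    burn_in = int(burn_in_frac * len(iterates))
    return np.mean(iterates[burn_in:], axis=0)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(500)
print(averaged_sgd(X, y))  # close to w_true
```

Averaging is what permits a constant learning rate: the individual iterates keep oscillating around the optimum, but their average converges.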
arXiv Detail & Related papers (2025-06-18T15:10:38Z) - RieszBoost: Gradient Boosting for Riesz Regression [49.737777802061984]
We propose a novel gradient boosting algorithm to directly estimate the Riesz representer without requiring its explicit analytical form. We show that our algorithm performs on par with or better than indirect estimation techniques across a range of functionals.
arXiv Detail & Related papers (2025-01-08T23:04:32Z) - Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness [10.470114319701576]
We introduce a model-free debiasing method for smooth nonparametric regression estimators. We obtain a debiased estimator that satisfies pointwise and uniform risk convergence, along with smoothness, under mild conditions.
arXiv Detail & Related papers (2024-12-28T15:01:19Z) - Progression: an extrapolation principle for regression [0.0]
We propose a novel statistical extrapolation principle.
It assumes a simple relationship between predictors and the response at the boundary of the training predictor samples.
Our semi-parametric method, progression, leverages this extrapolation principle and offers guarantees on the approximation error beyond the training data range.
arXiv Detail & Related papers (2024-10-30T17:29:51Z) - Transfer Learning for Nonparametric Regression: Non-asymptotic Minimax Analysis and Adaptive Procedure [5.303044915173525]
We develop a novel estimator called the confidence thresholding estimator, which is shown to achieve the minimax optimal risk up to a logarithmic factor.
We then propose a data-driven algorithm that adaptively achieves the minimax risk up to a logarithmic factor across a wide range of parameter spaces.
arXiv Detail & Related papers (2024-01-22T16:24:04Z) - Engression: Extrapolation through the Lens of Distributional Regression [2.519266955671697]
We propose a neural network-based distributional regression methodology called 'engression'.
An engression model is generative in the sense that we can sample from the fitted conditional distribution and is also suitable for high-dimensional outcomes.
We show that engression can successfully perform extrapolation under some assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions.
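A minimal PyTorch sketch of the generative idea, assuming the sample-based energy-score objective described in the engression paper; the architecture, noise dimension, and all names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GenerativeRegressor(nn.Module):
    """Generator g(x, eps): feeding fresh Gaussian noise alongside x makes
    each forward pass a draw from the fitted conditional distribution."""
    def __init__(self, d_in, d_out, d_noise=8, width=64):
        super().__init__()
        self.d_noise = d_noise
        self.net = nn.Sequential(
            nn.Linear(d_in + d_noise, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, d_out),
        )

    def forward(self, x):
        eps = torch.randn(x.shape[0], self.d_noise, device=x.device)
        return self.net(torch.cat([x, eps], dim=1))

def energy_score_loss(model, x, y):
    """Sample-based energy score: E||g - y|| - 0.5 * E||g - g'||,
    a proper scoring rule minimized by the true conditional law."""
    g1, g2 = model(x), model(x)  # two independent generative samples
    fit = (g1 - y).norm(dim=1).mean()
    spread = (g1 - g2).norm(dim=1).mean()
    return fit - 0.5 * spread
```

Because the model is generative, conditional quantiles or prediction intervals can be read off by repeatedly sampling the fitted model at a fixed x.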
arXiv Detail & Related papers (2023-07-03T08:19:00Z) - Errors-in-variables Fréchet Regression with Low-rank Covariate Approximation [2.1756081703276]
Fréchet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables.
Our proposed framework combines the concepts of global Fréchet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator.
arXiv Detail & Related papers (2023-05-16T08:37:54Z) - Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Prediction Intervals and Confidence Regions for Symbolic Regression Models based on Likelihood Profiles [0.0]
Quantification of uncertainty of regression models is important for the interpretation of models and for decision making.
The linear approximation and so-called likelihood profiles are well-known possibilities for the calculation of confidence and prediction intervals.
These simple and effective techniques have been completely ignored so far in the genetic programming literature.
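For reference, both families of intervals are standard in the linear model; the statsmodels sketch below shows the distinction between a confidence interval for the mean response and a prediction interval for a new observation (the cited paper extends such intervals to symbolic regression models via likelihood profiles; the data here are illustrative).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.standard_normal(100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
pred = fit.get_prediction(np.array([[1.0, 5.0]]))  # intercept term, x = 5

print(pred.conf_int(alpha=0.05))            # 95% CI for the mean response
print(pred.conf_int(obs=True, alpha=0.05))  # 95% prediction interval
```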
arXiv Detail & Related papers (2022-09-14T07:07:55Z) - Benign overfitting and adaptive nonparametric regression [71.70323672531606]
We construct an estimator which is a continuous function interpolating the data points with high probability.
We attain minimax optimal rates under mean squared risk on the scale of Hölder classes adaptively to the unknown smoothness.
arXiv Detail & Related papers (2022-06-27T14:50:14Z) - Time varying regression with hidden linear dynamics [74.9914602730208]
We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system.
Counterintuitively, we show that when the underlying dynamics are stable the parameters of this model can be estimated from data by combining just two ordinary least squares estimates.
arXiv Detail & Related papers (2021-12-29T23:37:06Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
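The cited work targets deep networks, but the role of the marginal likelihood in model selection is easiest to see in Bayesian linear regression, where it has a closed form. A minimal NumPy sketch, with illustrative prior precision alpha and noise precision beta:

```python
import numpy as np

def log_marginal_likelihood(X, y, alpha=1.0, beta=1.0):
    """Log evidence of Bayesian linear regression:
    y | X ~ N(0, C) with C = (1/beta) I + (1/alpha) X X^T,
    for prior w ~ N(0, alpha^{-1} I) and noise precision beta.
    Candidate models (e.g. feature sets) can be ranked by this
    quantity alone, with no validation split."""
    n = X.shape[0]
    C = np.eye(n) / beta + X @ X.T / alpha
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
```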
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Simple Imputation Rules for Prediction with Missing Data: Contrasting Theoretical Guarantees with Empirical Performance [7.642646077340124]
Missing data is a common issue in real-world datasets.
This paper studies the performance of impute-then-regress pipelines by contrasting theoretical and empirical evidence.
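A minimal scikit-learn sketch of the impute-then-regress pipelines the paper studies, using mean imputation followed by a linear model; the data and missingness rate are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(200)
X[rng.random(X.shape) < 0.2] = np.nan  # 20% of entries missing at random

# Impute-then-regress: fill in missing entries, then fit the regression.
pipeline = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
pipeline.fit(X, y)
print(pipeline.score(X, y))  # in-sample R^2
```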
arXiv Detail & Related papers (2021-04-07T14:45:14Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z)