Finite-Sample Inference for Sparsely Permuted Linear Regression
- URL: http://arxiv.org/abs/2601.14872v2
- Date: Thu, 22 Jan 2026 03:42:11 GMT
- Title: Finite-Sample Inference for Sparsely Permuted Linear Regression
- Authors: Hirofumi Ota, Masaaki Imaizumi,
- Abstract summary: We develop a general statistical inference framework on the permutation and regression coefficients.<n>For computational purposes, we develop a linear assignment problem computable in time and demonstrate that, with high probability, the solution is equivalent to that of the conventional least squares with large computational cost.
- Score: 6.2000582635449994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a linear observation model with an unknown permutation called \textit{permuted/shuffled linear regression}, where responses and covariates are mismatched and the permutation forms a discrete, factorial-size parameter. The permutation is a key component of the data-generating process, yet its statistical investigation remains challenging due to its discrete nature. We develop a general statistical inference framework on the permutation and regression coefficients. First, we introduce a localization step that reduces the permutation space to a small candidate set building on recent advances in the repro samples method, whose miscoverage decays polynomially with the number of Monte Carlo samples. Then, based on this localized set, we provide statistical inference procedures: a conditional Monte Carlo test of permutation structures with valid finite-sample Type-I error control. We also develop coefficient inference that remains valid under alignment uncertainty of permutations. For computational purposes, we develop a linear assignment problem computable in polynomial time and demonstrate that, with high probability, the solution is equivalent to that of the conventional least squares with large computational cost. Extensions to partially permuted designs and ridge regularization are further discussed. Extensive simulations and an application to air-quality data corroborate finite-sample validity, strong power to detect mismatches, and practical scalability.
Related papers
- Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient.<n>Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - Revisiting Randomization in Greedy Model Search [16.15551706774035]
We propose and analyze an ensemble of greedy forward selection estimators that are randomized by feature subsampling.<n>We design a novel implementation based on dynamic programming that greatly improves its computational efficiency.<n>Contrary to prevailing belief that randomized ensembling is analogous to shrinkage, we show that it can simultaneously reduce training error and degrees of freedom.
arXiv Detail & Related papers (2025-06-18T17:13:53Z) - Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Generalised Bayesian Inference for Discrete Intractable Likelihood [9.331721990371769]
This paper develops a novel generalised Bayesian inference procedure suitable for discrete intractable likelihood.
Inspired by recent methodological advances for continuous data, the main idea is to update beliefs about model parameters using a discrete Fisher divergence.
The result is a generalised posterior that can be sampled from using standard computational tools, such as Markov Monte Carlo.
arXiv Detail & Related papers (2022-06-16T19:36:17Z) - Sampling Approximately Low-Rank Ising Models: MCMC meets Variational
Methods [35.24886589614034]
We consider quadratic definite Ising models on the hypercube with a general interaction $J$.
Our general result implies the first time sampling algorithms for low-rank Ising models.
arXiv Detail & Related papers (2022-02-17T21:43:50Z) - Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fr'echet regression paradigm.<n>Our first method uses these weights as the local average to solve the conditional Fr'echet mean.<n>Second method performs local linear Fr'echet regression, both significantly improving existing Fr'echet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z) - Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - A non-asymptotic model selection in block-diagonal mixture of polynomial
experts models [1.491109220586182]
We introduce a penalized maximum likelihood selection criterion to estimate the unknown conditional density of a regression model.
We provide a strong theoretical guarantee, including a finite-sample oracle satisfied by the penalized maximum likelihood with a Jensen-Kullback-Leibler type loss.
arXiv Detail & Related papers (2021-04-18T21:32:20Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new linear up to quadratic time algorithm for linear regression in the absence of strong assumptions on the underlying distributions of samples.
The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.