Inference on the change point in high dimensional time series models via
plug in least squares
- URL: http://arxiv.org/abs/2007.01888v3
- Date: Sat, 11 Jul 2020 05:49:14 GMT
- Title: Inference on the change point in high dimensional time series models via
plug in least squares
- Authors: Abhishek Kaul, Stergios B. Fotopoulos, Venkata K. Jandhyala, Abolfazl
Safikhani
- Abstract summary: We study a plug-in least squares estimator for the change point parameter where the change is in the mean of a high dimensional random vector.
We obtain sufficient conditions under which this estimator possesses sufficient adaptivity against plug-in estimates of the mean parameters.
- Score: 2.7718973516070684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a plug-in least squares estimator for the change point parameter
where the change is in the mean of a high dimensional random vector under
subgaussian or subexponential distributions. We obtain sufficient conditions
under which this estimator possesses sufficient adaptivity against plug-in
estimates of the mean parameters in order to yield an optimal rate of convergence
$O_p(\xi^{-2})$ in the integer scale. This rate is preserved while allowing
high dimensionality as well as a potentially diminishing jump size $\xi$,
provided $s\log (p\vee T)=o(\sqrt{Tl_T})$ or $s\log^{3/2}(p\vee
T)=o(\sqrt{Tl_T})$ in the subgaussian and subexponential cases, respectively.
Here $s$, $p$, $T$ and $l_T$ represent a sparsity parameter, the model dimension,
the sampling period and the separation of the change point from its parametric
boundary. Moreover, since the rate of convergence is free of $s$, $p$ and
logarithmic terms of $T$, it allows the existence of limiting distributions.
These distributions are then derived as the argmax of a two-sided
negative drift Brownian motion or a two-sided negative drift random walk under
the vanishing and non-vanishing jump size regimes, respectively, thereby allowing
inference on the change point parameter in the high dimensional setting.
Feasible algorithms for implementation of the proposed methodology are
provided. Theoretical results are supported with Monte Carlo simulations.
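
The abstract describes a plug-in least squares estimator: preliminary (sparse) estimates of the pre- and post-change means are plugged into a squared-error criterion, which is then minimized over candidate change points. Below is a minimal illustrative sketch of that idea under assumed simplifications; the sample-splitting preliminary means, the soft-threshold level, and all names are assumptions for illustration, not the paper's feasible algorithm.

```python
# Hedged sketch of a plug-in least squares change point estimator for a
# high dimensional mean-shift model x_t = theta1*1{t <= tau0} + theta2*1{t > tau0} + noise.
# The preliminary mean estimates (soft-thresholded split means with an ad hoc
# threshold) are illustrative assumptions, not the paper's exact construction.
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def plug_in_change_point(X, lam=None):
    """X: (T, p) array. Returns the minimizer of the plug-in squared-error criterion."""
    T, p = X.shape
    if lam is None:
        lam = np.sqrt(2.0 * np.log(p * T) / T)   # assumed threshold level
    # Step 1: plug-in (sparse) estimates of the pre/post-change means,
    # here simply from the two halves of the sample.
    theta1_hat = soft_threshold(X[: T // 2].mean(axis=0), lam)
    theta2_hat = soft_threshold(X[T // 2 :].mean(axis=0), lam)
    # Step 2: least squares over candidate change points with the plugged-in means.
    cum = np.concatenate([[np.zeros(p)], np.cumsum(X, axis=0)])        # prefix sums of rows
    sq = np.concatenate([[0.0], np.cumsum((X ** 2).sum(axis=1))])      # prefix sums of squared norms
    best_tau, best_q = None, np.inf
    for tau in range(1, T):
        s1, s2 = cum[tau], cum[T] - cum[tau]
        q1 = sq[tau] - 2 * s1 @ theta1_hat + tau * theta1_hat @ theta1_hat
        q2 = (sq[T] - sq[tau]) - 2 * s2 @ theta2_hat + (T - tau) * theta2_hat @ theta2_hat
        if q1 + q2 < best_q:
            best_q, best_tau = q1 + q2, tau
    return best_tau

# toy usage: sparse mean shift at tau0 = 120 in T = 300 observations, p = 200
rng = np.random.default_rng(1)
T, p, tau0 = 300, 200, 120
theta1, theta2 = np.zeros(p), np.zeros(p)
theta2[:5] = 1.0
X = rng.normal(size=(T, p)) + np.where(np.arange(T)[:, None] < tau0, theta1, theta2)
print(plug_in_change_point(X))
```

Prefix sums keep the scan over all $T-1$ candidate change points at $O(Tp)$ total cost after the one-off preliminary mean estimation step.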
Related papers
- Variable Selection in Convex Piecewise Linear Regression [5.366354612549172]
This paper presents Sparse Gradient Descent (SpGD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for SpGD under sub-Gaussian noise.
arXiv Detail & Related papers (2024-11-04T16:19:09Z) - Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell_\infty}$ metric used to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\left[\|\nabla f(x_k)\|_1\right] \le O\!\left(\frac{\sqrt{d}}{T^{1/4}}\right)$, measured by the $\ell_1$ norm.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\left[\|\nabla f(x_k)\|_2\right] \le O\!\left(\frac{1}{T^{1/4}}\right)$ rate of SGD, measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products [12.047053875716506]
We consider the problem of sufficient dimension reduction for multi-index models.
We show that a fast parametric convergence rate of the form $C_d \cdot n^{-1/2}$ is achievable.
arXiv Detail & Related papers (2023-12-24T12:28:07Z) - Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$.
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\!\left(n + \sqrt{n}/\epsilon^{2}\right)$ calls to a stochastic gradient oracle.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor perfectly fits inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle \varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$ with $r \ll n$ (a minimal low-rank Sinkhorn sketch is given after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a simple proof of convergence for the Adam and Adagrad algorithms with a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
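
The entry above on linear time Sinkhorn divergences describes a ground cost whose kernel factors through nonnegative features, $K = \Phi\Psi^\top$. The sketch below (assumed, not the paper's implementation) shows why such a factorization makes each Sinkhorn iteration cost $O((n+m)r)$ rather than $O(nm)$; the unit entropic regularization and the random features are assumptions for illustration.

```python
# Minimal sketch: Sinkhorn iterations with a low-rank kernel K = Phi @ Psi.T
# built from nonnegative features, so the matrix-vector products never form K.
# For simplicity we assume the entropic regularization is absorbed into the features.
import numpy as np

def sinkhorn_positive_features(Phi, Psi, a, b, n_iter=200):
    """Phi: (n, r) and Psi: (m, r) nonnegative feature matrices; a, b: marginals."""
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / (Phi @ (Psi.T @ v))   # u = a / (K v), cost O((n + m) r)
        v = b / (Psi @ (Phi.T @ u))   # v = b / (K^T u)
    return u, v  # transport plan is diag(u) K diag(v), kept in factored form

# usage with random nonnegative features (illustrative only)
rng = np.random.default_rng(0)
n, m, r = 500, 400, 20
Phi, Psi = rng.random((n, r)), rng.random((m, r))
a, b = np.ones(n) / n, np.ones(m) / m
u, v = sinkhorn_positive_features(Phi, Psi, a, b)
```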