Related papers: Multivariate Trend Filtering for Lattice Data

Multivariate Trend Filtering for Lattice Data

URL: http://arxiv.org/abs/2112.14758v2
Date: Fri, 5 Apr 2024 18:27:12 GMT
Title: Multivariate Trend Filtering for Lattice Data
Authors: Veeranjaneyulu Sadhanala, Yu-Xiang Wang, Addison J. Hu, Ryan J. Tibshirani
Abstract summary: We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions.
Score: 15.798045922049862
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d=2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular, discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant-time (independent of the size of the lattice $n$).
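The Kronecker structure of the penalty described in the abstract can be illustrated with a minimal sketch. It assumes a lattice with the same number of points N along each of the d axes, unit spacing, and no scaling factors; the function names (diff_operator, ktf_penalty_operator, ktf_penalty) are hypothetical and are not taken from the paper or from any library. Dense matrices are used only for readability, so this is suitable for very small lattices.

```python
import numpy as np

def diff_operator(n, order):
    """(n - order) x n matrix of order-th forward differences (unit spacing assumed)."""
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)  # row i becomes (row i+1) - (row i)
    return D

def ktf_penalty_operator(N, d, k):
    """Stack one Kronecker-product block per coordinate direction.

    Each block applies the (k+1)-th order difference operator along one axis
    and the identity along the others. Factor order matches a C-order ravel
    of a d-dimensional array with side length N.
    """
    D1 = diff_operator(N, k + 1)
    blocks = []
    for axis in range(d):
        block = np.eye(1)
        for j in range(d):
            block = np.kron(block, D1 if j == axis else np.eye(N))
        blocks.append(block)
    return np.vstack(blocks)

def ktf_penalty(theta, k):
    """Sum of absolute (k+1)-th order differences of theta along every axis."""
    N, d = theta.shape[0], theta.ndim
    return np.abs(ktf_penalty_operator(N, d, k) @ theta.ravel()).sum()

# Example: a plane theta[i, j] = i + 2j on a 4 x 4 lattice.
theta = np.add.outer(np.arange(4.0), 2.0 * np.arange(4.0))
print(ktf_penalty(theta, k=0))  # first differences: 12 * 1 + 12 * 2 = 36.0
print(ktf_penalty(theta, k=1))  # second differences of a plane vanish: 0.0
```

Under these assumptions, the KTF estimate would minimize half the squared error between the observations and theta plus a tuning parameter times this penalty; the sketch only evaluates the penalty and does not solve that optimization problem.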

Related papers

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime [26.711510824243803]
We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. For a well-tuned stepsize we obtain a near optimal $\widetilde{O}(1/T + \sigma_\star/\sqrt{T})$ rate for the last iterate.
arXiv Detail & Related papers (2025-07-15T12:52:47Z)
An Uncertainty Principle for Linear Recurrent Neural Networks [54.13281679205581]
We build a linear filter of order $S$ that approximates the filter that looks $K$ time steps in the past. We fully characterize the problem by providing lower bounds of approximation, as well as explicit filters that achieve this lower bound up to constants. The optimal performance highlights an uncertainty principle: the filter has to average values around the $K$-th time step in the past with a range (width) that is proportional to $K/S$.
arXiv Detail & Related papers (2025-02-13T13:01:46Z)
Convergence Rate Analysis of LION [54.28350823319057]
LION converges to a Karush-Kuhn-Tucker (KKT) point at a rate of $\mathcal{O}(\sqrt{d}K^{-1/4})$, measured by the gradient norm. We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization [0.3749861135832072]
Heavy tails are a common feature of filtering distributions that results from the nonlinear dynamical and observation processes. We propose an algorithm to estimate the prior-to-posterior update from samples of the joint forecast distribution of the states and observations. We demonstrate the benefits of this new ensemble filter on challenging filtering problems.
arXiv Detail & Related papers (2023-10-12T21:56:14Z)
Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure. In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ oracle calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z)
Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data and characterize its bias and variance in terms of the bias and variance of an 'equivalent' sequence model. As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z)
Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers [16.618918548497223]
We propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number. We derive these results in various contexts and improve the known state-of-the-art rates.
arXiv Detail & Related papers (2022-09-19T12:11:07Z)
Group-invariant max filtering [4.396860522241306]
We construct a family of $G$-invariant real-valued functions on $V$ that we call max filters. In the case where $V=\mathbb{R}^d$ and $G$ is finite, a suitable max filter bank separates orbits, and is even bilipschitz in the quotient metric.
arXiv Detail & Related papers (2022-05-27T15:18:08Z)
$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z)
A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of robustness of interpolating neural network parameters on arbitrary distributions. We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$. We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
Last iterate convergence of SGD for Least-Squares in the Interpolation regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits perfectly inputs and outputs $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z)
A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent [85.77233010209368]
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable. This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
arXiv Detail & Related papers (2020-06-09T02:05:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.