Ridge Estimation with Nonlinear Transformations
- URL: http://arxiv.org/abs/2306.05722v3
- Date: Mon, 22 Jul 2024 13:48:36 GMT
- Title: Ridge Estimation with Nonlinear Transformations
- Authors: Zheng Zhai, Hengchao Chen, Zhigang Yao
- Abstract summary: We show the inclusion relationship between ridges: $\mathcal{R}(f\circ p)\subseteq \mathcal{R}(p)$.
We also show that the Hausdorff distance between $\mathcal{R}(f\circ p)$ and its projection onto $\mathcal{M}$ is smaller than the Hausdorff distance between $\mathcal{R}(p)$ and the corresponding projection.
- Score: 3.1406146587437904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ridge estimation is an important manifold learning technique. The goal of this paper is to examine the effects of nonlinear transformations on the ridge sets. The main result proves the inclusion relationship between ridges: $\mathcal{R}(f\circ p)\subseteq \mathcal{R}(p)$, provided that the transformation $f$ is strictly increasing and concave on the range of the function $p$. Additionally, given an underlying true manifold $\mathcal{M}$, we show that the Hausdorff distance between $\mathcal{R}(f\circ p)$ and its projection onto $\mathcal{M}$ is smaller than the Hausdorff distance between $\mathcal{R}(p)$ and the corresponding projection. This motivates us to apply an increasing and concave transformation before ridge estimation. Specifically, we show that the power transformations $f^{q}(y)=y^q/q$, $-\infty<q\leq 1$, are increasing and concave on $\mathbb{R}_+$, so such power transformations can be used whenever $p$ is strictly positive. Numerical experiments demonstrate the advantages of the proposed methods.
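By the chain rule, $\nabla(f\circ p)=f'(p)\nabla p$ and $\nabla^2(f\circ p)=f'(p)\nabla^2 p+f''(p)\nabla p\,\nabla p^{\top}$, so a strictly increasing, concave $f$ preserves the gradient direction while adding a negative semidefinite rank-one term to the Hessian; this is the mechanism behind the inclusion above. As a minimal illustration (not the authors' implementation), the sketch below runs a subspace-constrained gradient ascent toward the ridge of a power-transformed Gaussian kernel density estimate; the bandwidth h, step size eta, iteration budget, and the choice q = 0.5 are all illustrative assumptions.

```python
import numpy as np

def kde_grad_hess(x, data, h):
    """Unnormalized Gaussian KDE p(x) with its gradient and Hessian.
    (Dropping the normalizing constant leaves the ridge set unchanged.)"""
    diff = data - x                                   # rows are x_i - x
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2 * h ** 2))
    p = w.mean()
    grad = (w[:, None] * diff).mean(axis=0) / h ** 2
    outer = np.einsum("n,ni,nj->ij", w, diff, diff) / len(data)
    hess = outer / h ** 4 - p * np.eye(x.size) / h ** 2
    return p, grad, hess

def transformed_grad_hess(x, data, h, q):
    """Gradient and Hessian of f^q(p(x)) with f^q(y) = y^q / q (q != 0),
    using f'(y) = y^(q-1) and f''(y) = (q - 1) y^(q-2)."""
    p, g, H = kde_grad_hess(x, data, h)
    grad = p ** (q - 1) * g
    hess = p ** (q - 1) * H + (q - 1) * p ** (q - 2) * np.outer(g, g)
    return grad, hess

def scms_step(x, data, h, q, eta=0.05):
    """One subspace-constrained step: move along the gradient projected
    onto the eigenvectors of the (d-1) smallest Hessian eigenvalues."""
    grad, hess = transformed_grad_hess(x, data, h, q)
    _, vecs = np.linalg.eigh(hess)        # eigenvalues in ascending order
    V = vecs[:, :-1]                      # discard the leading direction
    return x + eta * V @ (V.T @ grad)

# Toy usage: noisy unit circle; q = 0.5 is increasing and concave on (0, inf).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 500)
data = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.normal(size=(500, 2))
x = np.array([1.3, 0.2])
for _ in range(300):
    x = scms_step(x, data, h=0.3, q=0.5)
print(x, np.linalg.norm(x))               # should settle near radius 1
```

Setting q = 1 makes the transformation the identity, so the same loop recovers the untransformed ridge estimate and the two outputs can be compared directly.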
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Regression for matrix-valued data via Kronecker products factorization [0.5156484100374059]
We propose an estimation algorithm, termed KRO-PRO-FAC, for estimating the parameters $\beta_{1k} \subset \mathbb{R}^{p \times q_1}$ and $\beta_{2k} \subset \mathbb{R}^{p \times q}$.
Numerical studies on simulated and real data indicate that our procedure is competitive, in terms of both estimation error and predictive accuracy, compared to other existing methods.
arXiv Detail & Related papers (2024-04-30T02:44:41Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $\widetilde{O}(d\sqrt{HS^3K} + \sqrt{HSAK})$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ for the averaged gradient norm $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x_k)\|_1\right]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}\left[\|\nabla f(x_k)\|_2\right] \leq O(\frac{1}{T^{1/4}})$ rate of SGD measured by the $\ell_2$ norm.
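For reference, a minimal sketch of the updates under analysis; this is one common formulation of RMSProp with heavy-ball momentum on the preconditioned gradient, and the hyperparameters and the $1/\sqrt{t}$ step-size decay are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

def rmsprop_momentum(grad, x, lr=0.05, beta=0.99, mu=0.9, eps=1e-8, iters=2000):
    """RMSProp with momentum; mu = 0 recovers plain RMSProp."""
    v = np.zeros_like(x)                      # second-moment estimate
    m = np.zeros_like(x)                      # momentum buffer
    for t in range(iters):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g * g   # coordinate-wise moments
        m = mu * m + g / (np.sqrt(v) + eps)   # preconditioned step
        x = x - (lr / np.sqrt(t + 1.0)) * m   # decaying step, common in analyses
    return x

# Toy usage: ill-conditioned quadratic with gradient 2 * a * x.
a = np.array([1.0, 10.0, 100.0])
x_final = rmsprop_momentum(lambda z: 2.0 * a * z, np.ones(3))
print(np.linalg.norm(x_final))                # should be small
```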
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
We study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate.
We find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
arXiv Detail & Related papers (2023-05-13T01:10:49Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Beyond Moments: Robustly Learning Affine Transformations with
Asymptotically Optimal Error [8.615625517708324]
We present a polynomial-time algorithm for learning an unknown affine transformation of the standard hypercube from samples.
Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
arXiv Detail & Related papers (2023-02-23T19:13:30Z) - Algebraic Aspects of Boundaries in the Kitaev Quantum Double Model [77.34726150561087]
We provide a systematic treatment of boundaries based on subgroups $K\subseteq G$ with the Kitaev quantum double $D(G)$ model in the bulk.
The boundary sites are representations of a $*$-subalgebra $\Xi\subseteq D(G)$ and we explicate its structure as a strong $*$-quasi-Hopf algebra.
As an application of our treatment, we study patches with boundaries based on $K=G$ horizontally and $K=e$ vertically and show how these could be used in a quantum computer.
arXiv Detail & Related papers (2022-08-12T15:05:07Z) - Entanglement scaling for $\lambda\phi_2^4$ [0.0]
We show that the order parameter $\phi$, the correlation length $\xi$, and quantities like $\phi^3$ and the entanglement entropy exhibit useful double scaling properties.
We find the value $\alpha_c=11.09698(31)$ for the critical point, improving on previous results.
arXiv Detail & Related papers (2021-04-21T14:43:12Z) - Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions [20.666734673282498]
The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function, and its convergence rate is derived explicitly and rigorously.
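For reference, a minimal sketch of the analyzed method under the classical 1/5-success rule, one standard instance of success-based step-size adaptation; the constants here are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def one_plus_one_es(f, x, sigma=1.0, alpha=np.exp(0.2), iters=3000, seed=0):
    """Keep a single parent x; accept a Gaussian mutation only if it
    improves f.  The step size grows by alpha on success and shrinks by
    alpha**(-1/4) on failure, balancing at a 1/5 success rate."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(x.size)  # single offspring
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
            sigma *= alpha                # success: widen the search
        else:
            sigma *= alpha ** -0.25       # failure: narrow the search
    return x, fx, sigma

# Toy usage: convex quadratic f(x) = x^T A x with A positive definite.
A = np.diag([1.0, 4.0, 9.0])
x_final, f_final, _ = one_plus_one_es(lambda z: z @ A @ z, np.ones(3))
print(f_final)                            # decreases geometrically
```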
arXiv Detail & Related papers (2021-03-02T09:03:44Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear
Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Expressivity of expand-and-sparsify representations [15.016047591601094]
A simple sparse coding mechanism appears in the sensory systems of several organisms.
We show that $z$ unpacks the information in $x$ and makes it more readily accessible.
We consider whether the representation is adaptive to manifold structure in the input space.
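As a minimal sketch (assuming a Gaussian random expansion followed by top-$k$ winner-take-all, one standard instance of expand-and-sparsify), the map $x \mapsto z$ looks like:

```python
import numpy as np

def expand_and_sparsify(x, W, k):
    """Randomly expand x to a much higher dimension, then keep only
    the k largest activations as a binary code (winner-take-all)."""
    a = W @ x                                 # random expansion
    z = np.zeros_like(a)
    top = np.argpartition(a, -k)[-k:]         # indices of the k largest
    z[top] = 1.0
    return z

# Toy usage: expand d = 10 inputs to m = 2000 units with 2% active.
rng = np.random.default_rng(1)
d, m, k = 10, 2000, 40
W = rng.standard_normal((m, d))               # fixed random projection
x = rng.standard_normal(d)
z = expand_and_sparsify(x, W, k)
print(int(z.sum()), z.size)                   # exactly k of m units fire
```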
arXiv Detail & Related papers (2020-06-05T23:36:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.