Visualizing high-dimensional loss landscapes with Hessian directions
- URL: http://arxiv.org/abs/2208.13219v2
- Date: Fri, 1 Dec 2023 20:44:34 GMT
- Title: Visualizing high-dimensional loss landscapes with Hessian directions
- Authors: Lucas Böttcher and Gregory Wheeler
- Abstract summary: We study how curvature properties in lower-dimensional loss representations depend on those in the original loss space.
Saddle points in the original space are rarely correctly identified as such in expected lower-dimensional representations if random projections are used.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing geometric properties of high-dimensional loss functions, such as
local curvature and the existence of other optima around a certain point in
loss space, can help provide a better understanding of the interplay between
neural network structure, implementation attributes, and learning performance.
In this work, we combine concepts from high-dimensional probability and
differential geometry to study how curvature properties in lower-dimensional
loss representations depend on those in the original loss space. We show that
saddle points in the original space are rarely correctly identified as such in
expected lower-dimensional representations if random projections are used. The
principal curvature in the expected lower-dimensional representation is
proportional to the mean curvature in the original loss space. Hence, the mean
curvature in the original loss space determines if saddle points appear, on
average, as either minima, maxima, or almost flat regions. We use the
connection between expected curvature in random projections and mean curvature
in the original space (i.e., the normalized Hessian trace) to compute
Hutchinson-type trace estimates without calculating Hessian-vector products as
in the original Hutchinson method. Because random projections are not suitable
to correctly identify saddle information, we propose to study projections along
dominant Hessian directions that are associated with the largest and smallest
principal curvatures. We connect our findings to the ongoing debate on loss
landscape flatness and generalizability. Finally, for different common image
classifiers and a function approximator, we show and compare random and Hessian
projections of loss landscapes with up to about $7\times 10^6$ parameters.
Related papers
- ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction [50.07671826433922]
It is non-trivial to simultaneously recover meticulous geometry and preserve smoothness across regions with differing characteristics.
We propose ND-SDF, which learns a Normal Deflection field to represent the angular deviation between the scene normal and the prior normal.
Our method not only obtains smooth weakly textured regions such as walls and floors but also preserves the geometric details of complex structures.
arXiv Detail & Related papers (2024-08-22T17:59:01Z) - Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning.
We introduce a novel approach to disentangled representation learning based on quadratic optimal transport.
We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z) - Wasserstein Projection Pursuit of Non-Gaussian Signals [8.789656856095947]
We consider the problem of locating a $k$-dimensional non-Gaussian subspace of interesting features in a high-dimensional data cloud.
Under a generative model, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace.
Our results operate in the regime where the data dimensionality is comparable to the sample size.
arXiv Detail & Related papers (2023-02-24T15:36:51Z) - Curved Geometric Networks for Visual Anomaly Recognition [39.91252195360767]
Learning a latent embedding to understand the underlying nature of data distribution is often formulated in Euclidean spaces with zero curvature.
In this work, we investigate benefits of the curved space for analyzing anomalies or out-of-distribution objects in data.
arXiv Detail & Related papers (2022-08-02T01:15:39Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry [3.712728573432119]
We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology.
We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
arXiv Detail & Related papers (2022-02-07T09:57:54Z) - Differential Geometry in Neural Implicits [0.6198237241838558]
We introduce a neural implicit framework that bridges discrete differential geometry of triangle meshes and continuous differential geometry of neural implicit surfaces.
It exploits the differentiable properties of neural networks and the discrete geometry of triangle meshes to approximate them as the zero-level sets of neural implicit functions.
arXiv Detail & Related papers (2022-01-23T13:40:45Z) - Deep Point Cloud Normal Estimation via Triplet Learning [12.271669779096076]
We propose a novel normal estimation method for point clouds.
It consists of two phases: (a) feature encoding which learns representations of local patches, and (b) normal estimation that takes the learned representation as input and regresses the normal vector.
Our method preserves sharp features and achieves better normal estimation results on CAD-like shapes.
arXiv Detail & Related papers (2021-10-20T11:16:00Z) - Deep Modeling of Growth Trajectories for Longitudinal Prediction of Missing Infant Cortical Surfaces [58.780482825156035]
We introduce a method for longitudinal prediction of cortical surfaces using a spatial graph convolutional neural network (GCNN).
The proposed method models cortical growth trajectories and jointly predicts inner and outer curved surfaces at multiple time points.
We demonstrate with experimental results that our method captures the nonlinearity of temporal cortical growth patterns.
arXiv Detail & Related papers (2020-09-06T18:46:04Z) - GarNet++: Improving Fast and Accurate Static 3D Cloth Draping by Curvature Loss [89.96698250086064]
We introduce a two-stream deep network model that produces a visually plausible draping of a template cloth on virtual 3D bodies.
Our network learns to mimic a Physics-Based Simulation (PBS) method while requiring two orders of magnitude less computation time.
We validate our framework on four garment types for various body shapes and poses.
arXiv Detail & Related papers (2020-07-20T13:40:15Z) - Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees [106.91654068632882]
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate.
Our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
arXiv Detail & Related papers (2020-03-02T16:40:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.