Robust Hyperbolic Learning with Curvature-Aware Optimization
- URL: http://arxiv.org/abs/2405.13979v3
- Date: Mon, 03 Feb 2025 12:43:02 GMT
- Title: Robust Hyperbolic Learning with Curvature-Aware Optimization
- Authors: Ahmad Bdeir, Johannes Burchert, Lars Schmidt-Thieme, Niels Landwehr
- Abstract summary: Current hyperbolic learning approaches are prone to overfitting, computationally expensive, and unstable. We introduce a novel fine-tunable hyperbolic scaling approach that constrains hyperbolic embeddings and reduces approximation errors. Our approach demonstrates consistent improvements across Computer Vision, EEG classification, and hierarchical metric learning tasks.
- Score: 7.89323764547292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperbolic deep learning has become a growing research direction in computer vision due to the unique properties afforded by the alternate embedding space. The negative curvature and exponentially growing distance metric provide a natural framework for capturing hierarchical relationships between datapoints and allow for finer separability between their embeddings. However, current hyperbolic learning approaches are still prone to overfitting, computationally expensive, and unstable, especially when attempting to learn the manifold curvature to adapt to tasks and different datasets. To address these issues, our paper presents a derivation of Riemannian AdamW that helps increase hyperbolic generalization ability. For improved stability, we introduce a novel fine-tunable hyperbolic scaling approach to constrain hyperbolic embeddings and reduce approximation errors. Combining this with our curvature-aware learning schema for Lorentzian optimizers enables the joint learning of curvature and non-trivialized hyperbolic parameters. Our approach demonstrates consistent performance improvements across Computer Vision, EEG classification, and hierarchical metric learning tasks, achieving state-of-the-art results in two domains and drastically reducing runtime.
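The abstract does not spell out the Riemannian AdamW update or the exact scaling rule, so the sketch below is only a rough illustration of the general idea of constraining hyperbolic embeddings: Euclidean features are mapped onto the Lorentz (hyperboloid) model with curvature -K, and tangent-vector norms are clipped to a tunable budget before the exponential map. The function names, the clipping rule, and the choice of the Lorentz model are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def lorentz_expmap0(z, K=1.0, max_norm=2.0):
    """Map Euclidean features z (batch, d) onto the Lorentz model of curvature -K,
    clipping tangent norms to `max_norm` for stability.
    (Illustrative sketch; the paper's actual scaling rule may differ.)"""
    sqrtK = np.sqrt(K)
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))  # shrink long tangents
    z = z * scale
    norm = norm * scale
    x0 = np.cosh(sqrtK * norm) / sqrtK                                 # time component
    xs = np.sinh(sqrtK * norm) * z / np.maximum(sqrtK * norm, 1e-12)   # space components
    return np.concatenate([x0, xs], axis=-1)

def lorentz_distance(x, y, K=1.0):
    """Geodesic distance on the hyperboloid: d = arccosh(-K <x, y>_L) / sqrt(K)."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-K * inner, 1.0, None)) / np.sqrt(K)

# toy usage: embed random features and check that pairwise distances stay finite
feats = np.random.randn(4, 8)
emb = lorentz_expmap0(feats, K=0.5, max_norm=2.0)
print(lorentz_distance(emb[0], emb[1], K=0.5))
```

Keeping tangent norms bounded before the exponential map is one simple way to avoid the numerical overflow that makes curvature learning unstable; the paper's fine-tunable scaling pursues the same goal.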
Related papers
- Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU [50.9588132578029]
This paper investigates machine unlearning in hyperbolic contrastive learning.
We adapt Alignment Calibration to MERU, a model that embeds images and text in hyperbolic space to better capture semantic hierarchies.
Our approach introduces hyperbolic-specific components including entailment calibration and norm regularization that leverage the unique properties of hyperbolic space.
arXiv Detail & Related papers (2025-03-19T12:47:37Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of the online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Understanding Hyperbolic Metric Learning through Hard Negative Sampling [13.478667527129726]
We investigate the effects of integrating hyperbolic space into metric learning, particularly when training with contrastive loss.
We benchmark the results of Vision Transformers (ViTs) using a hybrid objective function that combines loss from Euclidean and hyperbolic spaces.
We also reveal that hyperbolic metric learning is highly related to hard negative sampling, providing insights for future work.
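The hybrid objective is not given explicitly in the summary; the sketch below shows one plausible form, a weighted sum of a Euclidean cosine-similarity InfoNCE term and a Poincaré-distance InfoNCE term. The `to_ball` projection, the weight `alpha`, and the temperature `tau` are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-6):
    """Geodesic distance between two points in the open unit (Poincare) ball."""
    sq = np.sum((u - v) ** 2, axis=-1)
    du = np.clip(1.0 - np.sum(u ** 2, axis=-1), eps, None)
    dv = np.clip(1.0 - np.sum(v ** 2, axis=-1), eps, None)
    return np.arccosh(1.0 + 2.0 * sq / (du * dv))

def to_ball(x):
    """Squash a Euclidean vector into the unit ball (illustrative projection)."""
    return x / (1.0 + np.linalg.norm(x, axis=-1, keepdims=True))

def hybrid_contrastive_loss(anchor, positive, negatives, alpha=0.5, tau=0.1):
    """Weighted sum of a Euclidean (cosine) and a hyperbolic (Poincare) InfoNCE term."""
    def infonce(sim_pos, sim_negs):
        logits = np.concatenate(([sim_pos], sim_negs)) / tau
        logits -= logits.max()                      # numerical stability
        return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    l_euc = infonce(cos(anchor, positive),
                    np.array([cos(anchor, n) for n in negatives]))
    a_h, p_h = to_ball(anchor), to_ball(positive)   # hyperbolic similarity = -distance
    l_hyp = infonce(-poincare_distance(a_h, p_h),
                    np.array([-poincare_distance(a_h, to_ball(n)) for n in negatives]))
    return alpha * l_euc + (1.0 - alpha) * l_hyp

# toy usage with random 16-d embeddings and 5 negatives
rng = np.random.default_rng(0)
print(hybrid_contrastive_loss(rng.standard_normal(16), rng.standard_normal(16),
                              rng.standard_normal((5, 16))))
```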
arXiv Detail & Related papers (2024-04-23T21:11:30Z) - Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature [1.1126342180866644]
We propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notion.
We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs.
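As a concrete reference point, the sketch below computes the commonly used triangle-augmented Forman-Ricci curvature of a single edge; the paper's exact augmentation and rewiring rule may differ from this formula.

```python
import numpy as np

def afrc_edge(adj, u, v):
    """Augmented Forman-Ricci curvature of edge (u, v) in an unweighted graph,
    using the common triangle-augmented formula
        AFRC(u, v) = 4 - deg(u) - deg(v) + 3 * #triangles(u, v).
    Strongly negative edges are bottleneck (over-squashing) candidates."""
    deg_u, deg_v = adj[u].sum(), adj[v].sum()
    triangles = int(np.dot(adj[u], adj[v]))   # common neighbours of u and v
    return 4 - deg_u - deg_v + 3 * triangles

# toy graph: a 4-cycle with one chord
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [1, 1, 1, 0]])
print(afrc_edge(adj, 1, 3))
```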
arXiv Detail & Related papers (2023-09-17T21:43:18Z) - Nonparametric Linear Feature Learning in Regression Through Regularisation [0.0]
We propose a novel method for joint linear feature learning and non-parametric function estimation.
Using alternating minimisation, we iteratively rotate the data to improve alignment with the leading directions.
We establish that the expected risk of our method converges to the minimal risk under minimal assumptions and with explicit rates.
arXiv Detail & Related papers (2023-07-24T12:52:55Z) - Accelerated Linearized Laplace Approximation for Bayesian Deep Learning [34.81292720605279]
We develop a Nystrom approximation to neural tangent kernels (NTKs) to accelerate LLA.
Our method benefits from the capability of popular deep learning libraries for forward mode automatic differentiation.
Our method can even scale up to architectures like vision transformers.
arXiv Detail & Related papers (2022-10-23T07:49:03Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
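As a minimal illustration of the forward-gradient idea (not the paper's activation-perturbation or local-loss machinery), the sketch below estimates a gradient from a single forward-mode directional derivative: sample a random direction v, compute the JVP (∇f · v), and use (∇f · v) v as an unbiased gradient estimate. The toy least-squares problem is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
f = lambda w: 0.5 * np.sum((A @ w - b) ** 2)

def forward_gradient(w):
    """One forward-gradient sample: g = (grad_f . v) v with v ~ N(0, I).
    The directional derivative is computed in closed form here; in a network
    it would come from forward-mode autodiff (a JVP), never from backprop."""
    v = rng.standard_normal(w.shape)
    jvp = (A @ v) @ (A @ w - b)      # grad_f(w) . v without forming grad_f
    return jvp * v

w = np.zeros(5)
for step in range(2000):
    w -= 1e-3 * forward_gradient(w)  # noisy but unbiased descent direction
print(f(w), f(np.linalg.lstsq(A, b, rcond=None)[0]))  # the two losses end up close
```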
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Towards Scalable Hyperbolic Neural Networks using Taylor Series Approximations [10.056167107654089]
Hyperbolic networks have shown prominent improvements over their Euclidean counterparts in several areas involving hierarchical datasets.
Their adoption in practice remains restricted due to (i) non-scalability on accelerated deep learning hardware, (ii) vanishing gradients due to the closure of hyperbolic space, and (iii) information loss.
We propose the approximation of hyperbolic operators using Taylor series expansions, which allows us to reformulate the tangent gradients of hyperbolic functions into their equivariants.
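As a toy illustration of replacing hyperbolic operators with Taylor polynomials (the paper's actual reformulation of hyperbolic network layers is more involved), the sketch below compares tanh with its truncated Maclaurin series; the truncation order is an arbitrary choice here.

```python
import numpy as np

def tanh_taylor(x, order=7):
    """Truncated Maclaurin series of tanh: x - x^3/3 + 2x^5/15 - 17x^7/315.
    Accurate only for small |x|, which is why bounding embedding norms matters."""
    coeffs = {1: 1.0, 3: -1.0 / 3, 5: 2.0 / 15, 7: -17.0 / 315}
    return sum(c * x ** k for k, c in coeffs.items() if k <= order)

x = np.linspace(-1.0, 1.0, 5)
print(np.tanh(x))
print(tanh_taylor(x))   # close on this range, diverges for large |x|
```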
arXiv Detail & Related papers (2022-06-07T22:31:17Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
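The proposed metric and its optimization are not reproduced here; the sketch below only shows the basic decentralized step the analysis concerns: each node mixes its neighbours' models through a doubly stochastic matrix W and then applies its local gradient. How quickly this mixing contracts disagreement depends on spectral properties of W, the kind of quantity the paper's metric refines. The Metropolis-style ring weights are an illustrative choice.

```python
import numpy as np

def gossip_step(params, grads, W, lr=0.1):
    """One decentralized SGD step: mix neighbour models with the doubly stochastic
    matrix W, then apply each node's local gradient.
    params, grads: (num_nodes, dim); W: (num_nodes, num_nodes)."""
    mixed = W @ params          # consensus / mixing step
    return mixed - lr * grads   # local update step

# doubly stochastic mixing weights for a ring of 4 nodes
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
params = np.random.randn(4, 3)
print(gossip_step(params, np.zeros((4, 3)), W))
```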
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Hyperbolic Vision Transformers: Combining Improvements in Metric Learning [116.13290702262248]
We propose a new hyperbolic-based model for metric learning.
At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space.
We evaluate the proposed model with six different formulations on four datasets.
arXiv Detail & Related papers (2022-03-21T09:48:23Z) - Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces.
Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z) - Adaptive Learning Rate and Momentum for Training Deep Neural Networks [0.0]
We develop a fast training method motivated by the nonlinear Conjugate Gradient (CG) framework.
Experiments on image classification datasets show that our method yields faster convergence than other local solvers.
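The exact adaptive learning-rate and momentum rules are not given in the summary; for context, the sketch below shows the classical Polak-Ribière nonlinear CG update that motivates them, in which the search direction mixes the new gradient with the previous direction through a data-dependent coefficient beta. The fixed step size and toy quadratic are assumptions for illustration.

```python
import numpy as np

def polak_ribiere_step(w, grad, grad_prev, d_prev, lr=0.05):
    """One nonlinear CG update: beta = max(0, <g, g - g_prev> / ||g_prev||^2),
    direction d = -g + beta * d_prev (beta plays the role of adaptive momentum)."""
    beta = max(0.0, grad @ (grad - grad_prev) / (grad_prev @ grad_prev + 1e-12))
    d = -grad + beta * d_prev
    return w + lr * d, d

# toy quadratic f(w) = 0.5 * w' H w with minimiser at the origin
H = np.diag([1.0, 10.0])
grad = lambda w: H @ w
w = np.array([5.0, 5.0])
g_prev = grad(w)
d = -g_prev
w = w + 0.05 * d                  # first step is plain gradient descent
for _ in range(200):
    g = grad(w)
    w, d = polak_ribiere_step(w, g, g_prev, d, lr=0.05)
    g_prev = g
print(w)                          # close to the minimiser at the origin
```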
arXiv Detail & Related papers (2021-06-22T05:06:56Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning scheme based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems in which one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Level-Set Curvature Neural Networks: A Hybrid Approach [0.0]
We present a hybrid strategy based on deep learning to compute mean curvature in the level-set method.
The proposed inference system combines a dictionary of improved regression models with standard numerical schemes to estimate curvature more accurately.
Our findings confirm that machine learning is a promising venue for devising viable solutions to the level-set method's numerical shortcomings.
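The learned regression models are not reproduced here; the sketch below shows only the standard numerical-scheme half of such a hybrid: mean curvature of a 2-D level-set function from central finite differences. The grid, resolution, and test function are illustrative.

```python
import numpy as np

def levelset_curvature(phi, h=1.0):
    """Curvature of the level sets of phi on a uniform 2-D grid,
    kappa = (phi_xx*phi_y^2 - 2*phi_x*phi_y*phi_xy + phi_yy*phi_x^2) / |grad phi|^3,
    computed with central differences (the standard numerical-scheme part)."""
    px, py = np.gradient(phi, h)
    pxx = np.gradient(px, h)[0]
    pyy = np.gradient(py, h)[1]
    pxy = np.gradient(px, h)[1]
    denom = (px ** 2 + py ** 2) ** 1.5 + 1e-12
    return (pxx * py ** 2 - 2 * px * py * pxy + pyy * px ** 2) / denom

# signed distance to a circle of radius 1: curvature should be ~1/r
x = np.linspace(-2, 2, 201)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sqrt(X ** 2 + Y ** 2) - 1.0
kappa = levelset_curvature(phi, h=x[1] - x[0])
print(kappa[100, 150])   # grid point at (0, 1): close to 1.0
```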
arXiv Detail & Related papers (2021-04-07T06:51:52Z) - Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
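CoGD's coupled update is not reproduced here; the sketch below shows the vanilla baseline it improves on: alternating gradient descent on a bilinear objective 0.5 * ||M - u v^T||_F^2, with the sparsity constraint on u handled by soft-thresholding. The step size, penalty, and problem size are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm (enforces the sparsity constraint)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def bilinear_descent(M, steps=2000, lr=0.02, lam=0.01, seed=0):
    """Alternating gradient descent on f(u, v) = 0.5 * ||M - u v^T||_F^2 with a
    sparse u. CoGD instead couples the two updates through their interaction;
    this is only the vanilla baseline."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(M.shape[0])
    v = rng.standard_normal(M.shape[1])
    for _ in range(steps):
        r = M - np.outer(u, v)                      # residual
        u = soft_threshold(u + lr * r @ v, lam)     # gradient step in u, then prox
        v = v + lr * r.T @ u                        # gradient step in v
    return u, v

M = np.outer(np.array([1.0, 0.0, 2.0, 0.0]), np.array([1.0, -1.0, 0.5]))
u, v = bilinear_descent(M)
print(np.linalg.norm(M - np.outer(u, v)))   # residual shrinks; u comes out roughly sparse
```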
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
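The BP-Layer wires truncated max-product BP into a CNN with learned costs; as a plain illustration of the underlying inference, the sketch below runs exact min-sum (max-product in negative-log space) along a 1-D chain, the per-scanline computation typical of stereo and flow labeling. The toy costs are made up for illustration.

```python
import numpy as np

def chain_min_sum(unary, pairwise):
    """Exact min-sum (max-product in -log space) inference on a chain MRF.
    unary: (T, L) per-position label costs; pairwise: (L, L) transition costs.
    A BP-Layer truncates/relaxes this and learns the costs end-to-end."""
    T, L = unary.shape
    msg = np.zeros((T, L))
    back = np.zeros((T, L), dtype=int)
    msg[0] = unary[0]
    for t in range(1, T):
        cand = msg[t - 1][:, None] + pairwise       # (prev label, current label)
        back[t] = np.argmin(cand, axis=0)
        msg[t] = unary[t] + np.min(cand, axis=0)
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmin(msg[-1]))
    for t in range(T - 2, -1, -1):                  # backtrack the optimal labeling
        labels[t] = back[t + 1][labels[t + 1]]
    return labels

# toy disparity-like problem: 3 labels, smoothness-penalised transitions
unary = np.array([[0.0, 2.0, 2.0],
                  [2.0, 0.1, 2.0],
                  [2.0, 0.0, 2.0],
                  [2.0, 2.0, 0.0]])
pairwise = 0.5 * np.abs(np.arange(3)[:, None] - np.arange(3)[None, :])
print(chain_min_sum(unary, pairwise))   # -> [0 1 1 2]
```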
arXiv Detail & Related papers (2020-03-13T13:11:35Z)