Robust Hyperbolic Learning with Curvature-Aware Optimization
- URL: http://arxiv.org/abs/2405.13979v3
- Date: Mon, 03 Feb 2025 12:43:02 GMT
- Title: Robust Hyperbolic Learning with Curvature-Aware Optimization
- Authors: Ahmad Bdeir, Johannes Burchert, Lars Schmidt-Thieme, Niels Landwehr
- Abstract summary: Current hyperbolic learning approaches are prone to overfitting, computationally expensive, and unstable.
We introduce a novel fine-tunable hyperbolic scaling approach to constrain hyperbolic embeddings and reduce approximation errors.
Our approach demonstrates consistent improvements across Computer Vision, EEG classification, and hierarchical metric learning tasks.
- Score: 7.89323764547292
- License:
- Abstract: Hyperbolic deep learning has become a growing research direction in computer vision due to the unique properties afforded by the alternate embedding space. The negative curvature and exponentially growing distance metric provide a natural framework for capturing hierarchical relationships between datapoints and allowing for finer separability between their embeddings. However, current hyperbolic learning approaches are still prone to overfitting and instability and remain computationally expensive, especially when attempting to learn the manifold curvature to adapt to tasks and different datasets. To address these issues, our paper presents a derivation of Riemannian AdamW that helps increase hyperbolic generalization ability. For improved stability, we introduce a novel fine-tunable hyperbolic scaling approach to constrain hyperbolic embeddings and reduce approximation errors. Using this along with our curvature-aware learning schema for Lorentzian optimizers enables the combination of curvature and non-trivialized hyperbolic parameter learning. Our approach demonstrates consistent performance improvements across Computer Vision, EEG classification, and hierarchical metric learning tasks, achieving state-of-the-art results in two domains and drastically reducing runtime.
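As a rough illustration of the ideas in the abstract, the sketch below implements a Riemannian AdamW-style update on the unit-curvature Lorentz (hyperboloid) model in plain PyTorch, together with a tangent-norm cap standing in for the fine-tunable hyperbolic scaling constraint. This is a hedged reading of the abstract, not the authors' implementation: the function names, the geodesic interpretation of decoupled weight decay, and the omission of momentum parallel transport are all assumptions made for brevity.

```python
# Hedged sketch (not the authors' code): a Riemannian AdamW-style step on the
# unit-curvature Lorentz (hyperboloid) model {x : <x, x>_L = -1, x_0 > 0},
# written in plain PyTorch. Names and the weight-decay rule are assumptions.
import torch


def linner(u, v):
    # Lorentzian inner product <u, v>_L = -u_0 v_0 + sum_{i>0} u_i v_i
    return -u[..., :1] * v[..., :1] + (u[..., 1:] * v[..., 1:]).sum(-1, keepdim=True)


def expmap(x, v, eps=1e-8):
    # Exponential map at x for a tangent vector v (moves along the geodesic).
    vn = linner(v, v).clamp_min(0.0).sqrt().clamp_min(eps)
    return torch.cosh(vn) * x + torch.sinh(vn) * v / vn


def logmap(x, y, eps=1e-8):
    # Logarithmic map: tangent vector at x pointing toward y, with length d(x, y).
    xy = linner(x, y)
    u = y + xy * x
    un = linner(u, u).clamp_min(0.0).sqrt().clamp_min(eps)
    d = torch.acosh((-xy).clamp_min(1.0))
    return d * u / un


def egrad2rgrad(x, g):
    # Euclidean -> Riemannian gradient: flip the time component (Minkowski
    # metric), then project onto the tangent space at x.
    h = torch.cat([-g[..., :1], g[..., 1:]], dim=-1)
    return h + linner(x, h) * x


def scale_tangent(v, max_norm, eps=1e-8):
    # One reading of the "fine-tunable hyperbolic scaling" constraint: cap the
    # tangent norm (max_norm could be a learnable scalar) so cosh/sinh stay in
    # a numerically safe range.
    vn = linner(v, v).clamp_min(0.0).sqrt().clamp_min(eps)
    return v * (max_norm / vn).clamp(max=1.0)


def radamw_step(x, egrad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                weight_decay=1e-2, max_step=1.0):
    """One AdamW-style update for a point x on the hyperboloid.

    Decoupled weight decay is interpreted as shrinking x toward the manifold
    origin along a geodesic -- an assumption, not the paper's derivation.
    Parallel transport of the momentum buffer is omitted for brevity.
    """
    b1, b2 = betas
    rgrad = egrad2rgrad(x, egrad)
    state["step"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * rgrad
    state["v"] = b2 * state["v"] + (1 - b2) * linner(rgrad, rgrad)
    m_hat = state["m"] / (1 - b1 ** state["step"])
    v_hat = state["v"] / (1 - b2 ** state["step"])
    step_vec = scale_tangent(-lr * m_hat / (v_hat.clamp_min(0.0).sqrt() + eps), max_step)
    x_new = expmap(x, step_vec)
    origin = torch.zeros_like(x_new)
    origin[..., 0] = 1.0
    # Decoupled weight decay: move a fraction lr * weight_decay toward the origin.
    return expmap(x_new, lr * weight_decay * logmap(x_new, origin))


# Usage sketch: a batch of embeddings in R^{d+1}, kept on the manifold.
d = 4
x = torch.zeros(2, d + 1)
x[..., 0] = 1.0                                   # start at the origin
state = {"step": 0, "m": torch.zeros_like(x), "v": torch.zeros(2, 1)}
x = radamw_step(x, 0.01 * torch.randn_like(x), state)
print(linner(x, x))                               # ~ -1: still on the hyperboloid
```

The design choice mirrored here is that adaptivity, weight decay, and the scaling cap all act in the tangent space, so the iterate never leaves the hyperboloid; that is where the stability benefit over naive Euclidean updates would come from.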
Related papers
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We also provide applications to fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Understanding Hyperbolic Metric Learning through Hard Negative Sampling [13.478667527129726]
We investigate the effects of integrating hyperbolic space into metric learning, particularly when training with contrastive loss.
We benchmark the results of Vision Transformers (ViTs) using a hybrid objective function that combines loss from Euclidean and hyperbolic spaces.
We also reveal that hyperbolic metric learning is highly related to hard negative sampling, providing insights for future work.
arXiv Detail & Related papers (2024-04-23T21:11:30Z) - Alignment and Outer Shell Isotropy for Hyperbolic Graph Contrastive Learning [69.6810940330906]
We propose a novel contrastive learning framework to learn high-quality graph embedding.
Specifically, we design the alignment metric that effectively captures the hierarchical data-invariant information.
We show that in the hyperbolic space one has to address the leaf- and height-level uniformity which are related to properties of trees.
arXiv Detail & Related papers (2023-10-27T15:31:42Z) - Mitigating Over-Smoothing and Over-Squashing using Augmentations of Forman-Ricci Curvature [1.1126342180866644]
We propose a rewiring technique based on Augmented Forman-Ricci curvature (AFRC), a scalable curvature notion.
We prove that AFRC effectively characterizes over-smoothing and over-squashing effects in message-passing GNNs.
arXiv Detail & Related papers (2023-09-17T21:43:18Z) - Towards Scalable Hyperbolic Neural Networks using Taylor Series Approximations [10.056167107654089]
Hyperbolic networks have shown prominent improvements over their Euclidean counterparts in several areas involving hierarchical datasets.
Their adoption in practice remains restricted due to (i) non-scalability on accelerated deep learning hardware, (ii) vanishing gradients due to the closure of hyperbolic space, and (iii) information loss.
We propose approximating hyperbolic operators with Taylor series expansions, which lets us reformulate the tangent gradients of hyperbolic functions into polynomial equivalents (a small illustration of this idea appears after this list).
arXiv Detail & Related papers (2022-06-07T22:31:17Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Hyperbolic Vision Transformers: Combining Improvements in Metric Learning [116.13290702262248]
We propose a new hyperbolic-based model for metric learning.
At the core of our method is a vision transformer with output embeddings mapped to hyperbolic space.
We evaluate the proposed model with six different formulations on four datasets.
arXiv Detail & Related papers (2022-03-21T09:48:23Z) - Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces.
Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
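To make the Taylor-series idea from the entry on scalable hyperbolic networks concrete, here is a small hedged sketch: the Poincaré-ball exponential map at the origin, computed exactly and with a truncated Taylor expansion of tanh. This illustrates the general technique only; it is not the referenced paper's implementation, and the function names and choice of operator are assumptions.

```python
# Hedged illustration of the Taylor-approximation idea (not the paper's code):
# approximate the Poincare-ball exponential map at the origin by truncating
# the Taylor series of tanh.
import torch


def expmap0_exact(v, c=1.0, eps=1e-8):
    # exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||), curvature -c
    sqrt_c = c ** 0.5
    vn = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * vn) * v / (sqrt_c * vn)


def expmap0_taylor(v, c=1.0, eps=1e-8):
    # Same map with tanh(t) ~ t - t^3/3 + 2 t^5/15, accurate for small ||v||,
    # replacing the transcendental call with a polynomial.
    sqrt_c = c ** 0.5
    vn = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    t = sqrt_c * vn
    return (t - t**3 / 3 + 2 * t**5 / 15) * v / (sqrt_c * vn)


v = 0.1 * torch.randn(8, 16)
print((expmap0_exact(v) - expmap0_taylor(v)).abs().max())  # tiny for small inputs
```

The same truncation could be applied to other hyperbolic operators (for example, artanh in the distance function), trading a small approximation error near the origin for cheaper and more numerically stable computation.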
This list is automatically generated from the titles and abstracts of the papers in this site.