Hyperbolic Deep Learning for Foundation Models: A Survey
- URL: http://arxiv.org/abs/2507.17787v1
- Date: Wed, 23 Jul 2025 09:50:17 GMT
- Title: Hyperbolic Deep Learning for Foundation Models: A Survey
- Authors: Neil He, Hiren Madhu, Ngoc Bui, Menglin Yang, Rex Ying
- Abstract summary: Foundation models pre-trained on massive datasets have demonstrated remarkable success in diverse downstream tasks. Recent advances have leveraged hyperbolic neural networks to enhance foundation models. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models.
- Score: 16.14776172953206
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Foundation models pre-trained on massive datasets, including large language models (LLMs), vision-language models (VLMs), and large multimodal models, have demonstrated remarkable success in diverse downstream tasks. However, recent studies have shown fundamental limitations of these models: (1) limited representational capacity, (2) lower adaptability, and (3) diminishing scalability. These shortcomings raise a critical question: is Euclidean geometry truly the optimal inductive bias for all foundation models, or could incorporating alternative geometric spaces enable models to better align with the intrinsic structure of real-world data and improve reasoning processes? Hyperbolic spaces, a class of non-Euclidean manifolds characterized by exponential volume growth with respect to distance, offer a mathematically grounded solution. These spaces enable low-distortion embeddings of hierarchical structures (e.g., trees, taxonomies) and power-law distributions with substantially fewer dimensions compared to Euclidean counterparts. Recent advances have leveraged these properties to enhance foundation models, including improving LLMs' complex reasoning ability, VLMs' zero-shot generalization, and cross-modal semantic alignment, while maintaining parameter efficiency. This paper provides a comprehensive review of hyperbolic neural networks and their recent development for foundation models. We further outline key challenges and research directions to advance the field.
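To make the geometry concrete, below is a minimal NumPy sketch (ours, not from the paper) of the geodesic distance in the Poincaré ball model with curvature -1, the hyperbolic model most common in this literature; the function name `poincare_distance` is illustrative.

```python
import numpy as np

def poincare_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Geodesic distance in the Poincare ball model (curvature -1).

    d(x, y) = arcosh(1 + 2||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))
    """
    sq_diff = float(np.sum((x - y) ** 2))
    denom = (1.0 - float(np.sum(x ** 2))) * (1.0 - float(np.sum(y ** 2)))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

origin = np.zeros(2)
near_boundary = np.array([0.999, 0.0])
# The Euclidean distance is under 1, but the hyperbolic distance is ~7.6:
# volume grows exponentially toward the boundary, which is what lets trees
# and taxonomies embed with low distortion in few dimensions.
print(poincare_distance(origin, near_boundary))
```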
Related papers
- HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts [23.011684464345294]
We introduce HELM, a family of HypErbolic Large Language Models. For HELM-MICE, we develop hyperbolic Multi-Head Latent Attention. For both models, we develop essential hyperbolic equivalents of rotary positional encodings and RMS normalization.
arXiv Detail & Related papers (2025-05-30T15:42:42Z)
- Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries [42.83280708842304]
Euclidean space has been the de facto geometric setting for machine learning architectures. At a large scale, real-world data often exhibit inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling. This paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next generation of foundation models.
arXiv Detail & Related papers (2025-04-11T18:07:33Z)
- Low-Rank Adaptation for Foundation Models: A Comprehensive Review [42.23155921954156]
Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges. This survey provides the first comprehensive review of LoRA techniques beyond large language models to general foundation models.
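For context, LoRA's core idea fits in a few lines: freeze the pre-trained weight and train only a low-rank update. The sketch below uses the standard LoRA conventions (B initialized to zero, scaling alpha/r); the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 512, 512, 8, 16    # r << min(d, k)

W0 = rng.normal(size=(d, k))        # frozen pre-trained weight
A = rng.normal(size=(r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W0 + (alpha / r) * B @ A, but the update is
    # applied as two cheap low-rank matmuls; only A and B are trained.
    return W0 @ x + (alpha / r) * (B @ (A @ x))

y = lora_forward(rng.normal(size=k))
```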
arXiv Detail & Related papers (2024-12-31T09:38:55Z)
- RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [60.596005921295806]
Agglomerative models have emerged as a powerful approach to training vision foundation models. We identify critical challenges including resolution mode shifts, teacher imbalance, idiosyncratic teacher artifacts, and an excessive number of output tokens. We propose several novel solutions: multi-resolution training, mosaic augmentation, and improved balancing of teacher loss functions.
arXiv Detail & Related papers (2024-12-10T17:06:41Z)
- From Semantics to Hierarchy: A Hybrid Euclidean-Tangent-Hyperbolic Space Model for Temporal Knowledge Graph Reasoning [1.1372536310854844]
Temporal knowledge graph (TKG) reasoning predicts future events based on historical data.
Existing Euclidean models excel at capturing semantics but struggle with hierarchy.
We propose a novel hybrid geometric space approach that leverages the strengths of both Euclidean and hyperbolic models.
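The tangent space at the origin is the usual bridge between Euclidean and hyperbolic representations, which is presumably what "Euclidean-Tangent-Hyperbolic" refers to; below is a minimal sketch of the two maps involved (a standard construction, not this paper's specific model).

```python
import numpy as np

def exp0(v):
    """Exponential map at the origin: tangent space -> Poincare ball (c=1)."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def log0(x):
    """Logarithmic map at the origin: Poincare ball (c=1) -> tangent space."""
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

# A Euclidean feature v can be moved into the ball for hierarchy-aware
# scoring and back out for Euclidean-space computation:
v = np.array([0.3, -0.2])
x = exp0(v)        # hyperbolic embedding
v_back = log0(x)   # recovers v up to floating-point error
```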
arXiv Detail & Related papers (2024-08-30T10:33:08Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
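The abstract does not spell out the construction; one plausible reading, sketched below under our own assumptions, is to extract each expert as a truncated SVD of a fine-tuned model's weight delta, which requires no extra data or training. SMILE's actual procedure (including its routing) may differ.

```python
import numpy as np

def extract_low_rank_expert(W_base, W_finetuned, r=8):
    """Illustrative sketch: take the rank-r dominant part of a fine-tuned
    model's weight delta as a zero-shot 'expert' (no data, no training)."""
    delta = W_finetuned - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]   # (d, r), scaled left singular vectors
    A = Vt[:r, :]          # (r, k)
    return B, A            # B @ A is the best rank-r approximation of delta
```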
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks [50.29356570858905]
We introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. We provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. This shows the DSF's potential to guide the systematic development of future, more efficient and scalable foundation models.
arXiv Detail & Related papers (2024-05-24T17:19:57Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective distributed training technique.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Hyperbolic Neural Networks++ [66.16106727715061]
We generalize the fundamental components of neural networks in a single hyperbolic geometry model, namely, the Poincaré ball model.
Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, as well as greater stability and better performance than their Euclidean counterparts.
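As background, the basic operation that replaces vector addition in Poincaré-ball networks is Möbius addition; here is a sketch (ours, not taken from the paper) for curvature -1.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (c=1), the hyperbolic analogue
    of vector addition used to build Poincare-ball network layers."""
    xy = float(np.dot(x, y))
    x2 = float(np.dot(x, x))
    y2 = float(np.dot(y, y))
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den
```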
arXiv Detail & Related papers (2020-06-15T08:23:20Z)