Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings
- URL: http://arxiv.org/abs/2505.18973v2
- Date: Tue, 27 May 2025 01:24:12 GMT
- Title: Hierarchical Mamba Meets Hyperbolic Geometry: A New Paradigm for Structured Language Embeddings
- Authors: Sarang Patil, Ashish Parmanand Pandey, Ioannis Koutis, Mengjia Xu
- Abstract summary: We propose Hierarchical Mamba (HiM) to learn hierarchy-aware language embeddings. HiM integrates the efficient Mamba2 architecture with the exponential growth and curved nature of hyperbolic geometry. We show that both HiM models effectively capture hierarchical relationships on four ontological datasets.
- Score: 1.4183971140167244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selective state-space models have achieved great success in long-sequence modeling. However, their capacity for language representation, especially in complex hierarchical reasoning tasks, remains underexplored. Most large language models rely on flat Euclidean embeddings, limiting their ability to capture latent hierarchies. To address this limitation, we propose Hierarchical Mamba (HiM), which integrates the efficient Mamba2 architecture with the exponential growth and curved nature of hyperbolic geometry to learn hierarchy-aware language embeddings for deeper linguistic understanding. Mamba2-processed sequences are projected to the Poincaré ball (via a tangent-based mapping) or the Lorentzian manifold (via a cosine- and sine-based mapping) with "learnable" curvature, and optimized with a combined hyperbolic loss. HiM captures relational distances across varying hierarchical levels, enabling effective long-range reasoning; this makes it well-suited for tasks like mixed-hop prediction and multi-hop inference in hierarchical classification. We evaluated HiM on four linguistic and medical datasets for mixed-hop prediction and multi-hop inference tasks. Experimental results demonstrate that: 1) both HiM variants effectively capture hierarchical relationships across the four ontological datasets, surpassing Euclidean baselines; 2) HiM-Poincare captures fine-grained semantic distinctions with higher h-norms, while HiM-Lorentz provides more stable, compact, and hierarchy-preserving embeddings, favoring robustness over detail.
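The projection step described in the abstract can be made concrete with the standard exponential maps at the origin of each manifold. The sketch below is a minimal PyTorch illustration, not the authors' released code: the class name, the softplus parameterization of the learnable curvature, and the use of cosh/sinh for the Lorentz map (which the abstract paraphrases as a cosine- and sine-based mapping) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperbolicProjection(nn.Module):
    """Project Euclidean features (e.g., Mamba2 outputs) onto a hyperbolic
    manifold with a learnable curvature magnitude c > 0 (sectional curvature -c).
    A sketch of the standard origin-based exponential maps, not the HiM code."""

    def __init__(self, manifold: str = "poincare", init_c: float = 1.0):
        super().__init__()
        assert manifold in ("poincare", "lorentz")
        self.manifold = manifold
        # Unconstrained parameter; softplus keeps the curvature positive.
        self.c_raw = nn.Parameter(torch.tensor(init_c))

    @property
    def c(self) -> torch.Tensor:
        return F.softplus(self.c_raw)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (..., d) features treated as tangent vectors at the origin.
        sqrt_c = self.c.sqrt()
        norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
        if self.manifold == "poincare":
            # Poincare ball: exp_0(v) = tanh(sqrt(c)*||v||) * v / (sqrt(c)*||v||)
            return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
        # Hyperboloid (Lorentz) model adds one time-like coordinate:
        # exp_0(v) = (cosh(sqrt(c)*||v||)/sqrt(c), sinh(sqrt(c)*||v||)*v/(sqrt(c)*||v||))
        x0 = torch.cosh(sqrt_c * norm) / sqrt_c
        xs = torch.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
        return torch.cat([x0, xs], dim=-1)

# Example: a batch of Mamba2-style sequence features of width 64.
proj = HyperbolicProjection(manifold="lorentz")
z = proj(torch.randn(2, 16, 64))  # shape (2, 16, 65) on the hyperboloid
```

Because the curvature is trainable, the model can flatten or sharpen the geometry per task; the combined hyperbolic loss mentioned in the abstract would then be computed from distances on the chosen manifold.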
Related papers
- CCMamba: Selective State-Space Models for Higher-Order Graph Learning on Combinatorial Complexes [16.627877999057436]
Topological deep learning has emerged for modeling higher-order structures beyond pairwise interactions. We propose Combinatorial Complex Mamba (CCMamba), the first unified Mamba-based neural framework for learning on relational complexes. CCMamba reformulates message passing as a selective state-space modeling problem by organizing multi-rank incidence relations into structured sequences processed by rank-aware state-space models.
arXiv Detail & Related papers (2026-01-28T11:52:13Z) - Hyperbolic Large Language Models [7.483401973996036]
Large language models (LLMs) have achieved remarkable success and demonstrated superior performance across various tasks. However, much real-world data exhibits highly non-Euclidean latent hierarchical structure, as in protein networks, transportation networks, financial networks, brain networks, and the linguistic structures or syntactic trees of natural languages. We provide a comprehensive and contextual exposition of recent advancements in LLMs that leverage hyperbolic geometry as a representation space to enhance semantic representation learning and multi-scale reasoning.
arXiv Detail & Related papers (2025-09-06T15:56:46Z) - Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies [48.72319569157807]
Residual Quantization (RQ) is widely used to generate discrete, multi-token representations for hierarchical data. We propose Hyperbolic Residual Quantization (HRQ), which embeds data in a hyperbolic manifold. HRQ imparts an inductive bias that aligns naturally with hierarchical branching.
arXiv Detail & Related papers (2025-05-18T13:14:07Z) - Teaching Metric Distance to Autoregressive Multimodal Foundational Models [21.894600900013316]
We introduce DIST2Loss, a distance-aware framework designed to train autoregressive discrete models. DIST2Loss transforms exponential family distributions derived from inherent distance metrics into discrete, categorical optimization targets. Empirical evaluations show consistent performance gains in diverse multimodal applications.
arXiv Detail & Related papers (2025-03-04T08:14:51Z) - From Semantics to Hierarchy: A Hybrid Euclidean-Tangent-Hyperbolic Space Model for Temporal Knowledge Graph Reasoning [1.1372536310854844]
Temporal knowledge graph (TKG) reasoning predicts future events based on historical data.
Existing Euclidean models excel at capturing semantics but struggle with hierarchy.
We propose a novel hybrid geometric space approach that leverages the strengths of both Euclidean and hyperbolic models.
arXiv Detail & Related papers (2024-08-30T10:33:08Z) - GrootVL: Tree Topology is All You Need in State Space Model [66.36757400689281]
GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks.
Our method significantly outperforms existing structured state space models on image classification, object detection and segmentation.
By fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
arXiv Detail & Related papers (2024-06-04T15:09:29Z) - Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z) - MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding [53.03978356918377]
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features from either word-level or region-level but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
arXiv Detail & Related papers (2022-11-27T22:47:37Z) - Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds [8.385386712928785]
Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment.
We propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure.
We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts.
arXiv Detail & Related papers (2022-10-04T15:19:24Z) - Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic
Cones [64.75766944882389]
We present ConE (Cone Embedding), a KG embedding model that is able to simultaneously model multiple hierarchical as well as non-hierarchical relations in a knowledge graph.
In particular, ConE uses cone containment constraints in different subspaces of the hyperbolic embedding space to capture multiple heterogeneous hierarchies.
Our approach yields a new state-of-the-art Hits@1 of 45.3% on WN18RR and 16.1% on DDB14 (0.231 MRR).
arXiv Detail & Related papers (2021-10-28T07:16:08Z) - A Fully Hyperbolic Neural Model for Hierarchical Multi-Class Classification [7.8176853587105075]
Hyperbolic spaces offer a mathematically appealing approach for learning hierarchical representations of symbolic data.
This work proposes a fully hyperbolic model for multi-class multi-label classification, which performs all operations in hyperbolic space.
A thorough analysis sheds light on the impact of each component in the final prediction and showcases its ease of integration with Euclidean layers.
arXiv Detail & Related papers (2020-10-05T14:42:56Z) - APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincaré Variational Autoencoder (APo-VAE) is presented, where both the prior and the variational posterior of the latent variables are defined over a Poincaré ball via wrapped normal distributions; a minimal sampling sketch follows this list.
Experiments on language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
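To make the wrapped normal construction in the APo-VAE entry concrete: a standard recipe samples a Euclidean Gaussian in the tangent space at the origin, pushes it onto the Poincaré ball with the exponential map, and Möbius-translates the result to the mean. The sketch below assumes this standard construction; the function names (`mobius_add`, `sample_wrapped_normal`) are illustrative and not from the paper.

```python
import torch

def mobius_add(a: torch.Tensor, b: torch.Tensor, c: float) -> torch.Tensor:
    """Mobius addition on the Poincare ball with curvature -c."""
    ab = (a * b).sum(dim=-1, keepdim=True)
    a2 = (a * a).sum(dim=-1, keepdim=True)
    b2 = (b * b).sum(dim=-1, keepdim=True)
    num = (1 + 2 * c * ab + c * b2) * a + (1 - c * a2) * b
    den = 1 + 2 * c * ab + c ** 2 * a2 * b2
    return num / den.clamp_min(1e-7)

def sample_wrapped_normal(mu: torch.Tensor, sigma: float, c: float = 1.0) -> torch.Tensor:
    """Sample from a wrapped normal on the Poincare ball centered at mu:
    Gaussian in the tangent space at the origin -> exponential map ->
    Mobius translation to mu (equivalent to parallel transport plus exp at mu)."""
    v = sigma * torch.randn_like(mu)                      # tangent-space Gaussian
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    x = torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)   # exp map at the origin
    return mobius_add(mu, x, c)

# Example: draw latents around small-norm means (safely inside the unit ball).
mu = 0.1 * torch.randn(4, 8)
z = sample_wrapped_normal(mu, sigma=0.1)
```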