Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies
- URL: http://arxiv.org/abs/2505.12404v1
- Date: Sun, 18 May 2025 13:14:07 GMT
- Title: Hyperbolic Residual Quantization: Discrete Representations for Data with Latent Hierarchies
- Authors: Piotr Piękos, Subhradeep Kayal, Alexandros Karatzoglou,
- Abstract summary: Residual Quantization (RQ) is widely used to generate discrete, multitoken representations for hierarchical data. We propose Hyperbolic Residual Quantization (HRQ), which embeds data in a hyperbolic manifold. HRQ imparts an inductive bias that aligns naturally with hierarchical branching.
- Score: 48.72319569157807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical data arise in countless domains, from biological taxonomies and organizational charts to legal codes and knowledge graphs. Residual Quantization (RQ) is widely used to generate discrete, multitoken representations for such data by iteratively quantizing residuals against a multilevel codebook. However, its reliance on Euclidean geometry can introduce fundamental mismatches that hinder the modeling of hierarchical branching, which is necessary for a faithful representation of hierarchical data. In this work, we propose Hyperbolic Residual Quantization (HRQ), which embeds data natively in a hyperbolic manifold and performs residual quantization using hyperbolic operations and distance metrics. By adapting the embedding network, the residual computation, and the distance metric to hyperbolic geometry, HRQ imparts an inductive bias that aligns naturally with hierarchical branching. We claim that, compared to RQ, HRQ generates discrete hierarchical representations that are more useful for downstream tasks on data with latent hierarchies. We evaluate HRQ on two tasks: supervised hierarchy modeling using WordNet hypernym trees, where the model is explicitly supervised to learn the latent hierarchy, and hierarchy discovery, where a latent hierarchy exists in the data but the model is not directly trained or evaluated on any hierarchy-related task. In both scenarios, HRQ's hierarchical tokens yield better downstream performance than Euclidean RQ's, with gains of up to $20\%$ on the hierarchy modeling task. Our results demonstrate that integrating hyperbolic geometry into discrete representation learning substantially enhances the ability to capture latent hierarchies.
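The abstract names the ingredients (a hyperbolic manifold, hyperbolic residuals, and a hyperbolic distance metric) without spelling out the mechanics, so a minimal sketch may help make the scheme concrete. The sketch below uses the Poincaré ball with curvature c = 1 and Möbius addition for the residual step, which is one natural choice of "hyperbolic operation"; all names (`mobius_add`, `hrq_encode`, `codebooks`) are illustrative assumptions, not the authors' implementation, and codebook learning is omitted.

```python
# Hedged sketch of hyperbolic residual quantization on the Poincare ball
# (curvature c = 1). Assumption: the residual step uses Mobius addition,
# the hyperbolic analogue of Euclidean "residual = residual - codeword".
import numpy as np

def project(v, eps=1e-5):
    """Clip a vector back inside the open unit ball if it drifts out."""
    n = np.linalg.norm(v)
    return v if n < 1.0 - eps else v * (1.0 - eps) / n

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (standard formula, c = 1)."""
    xy = float(np.dot(x, y))
    x2, y2 = float(np.dot(x, x)), float(np.dot(y, y))
    num = (1.0 + 2.0 * xy + y2) * x + (1.0 - x2) * y
    return num / (1.0 + 2.0 * xy + x2 * y2)

def poincare_dist(x, y):
    """Geodesic distance: d(x, y) = 2 * artanh(||(-x) (+) y||)."""
    norm = np.linalg.norm(mobius_add(-x, y))
    return 2.0 * np.arctanh(min(norm, 1.0 - 1e-7))

def hrq_encode(z, codebooks):
    """Return one discrete token per level by quantizing hyperbolic residuals."""
    tokens, residual = [], z
    for level in codebooks:
        # Nearest codeword under the hyperbolic (not Euclidean) metric.
        k = int(np.argmin([poincare_dist(residual, c) for c in level]))
        tokens.append(k)
        # Hyperbolic "residual": transport the point by the inverse codeword.
        residual = project(mobius_add(-level[k], residual))
    return tokens

rng = np.random.default_rng(0)
codebooks = [[project(0.3 * rng.standard_normal(4)) for _ in range(8)]
             for _ in range(3)]  # 3 levels x 8 codewords in a 4-d ball
z = project(0.3 * rng.standard_normal(4))
print(hrq_encode(z, codebooks))  # e.g. [3, 1, 6]: a 3-token hierarchical code
</code>
```

Because distances near the boundary of the ball grow exponentially, early levels naturally capture coarse (near-origin) structure and later levels capture fine branches, which is the inductive bias the abstract attributes to HRQ.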
Related papers
- GrootVL: Tree Topology is All You Need in State Space Model [66.36757400689281]
GrootVL is a versatile multimodal framework that can be applied to both visual and textual tasks.
Our method significantly outperforms existing structured state space models on image classification, object detection and segmentation.
By fine-tuning large language models, our approach achieves consistent improvements in multiple textual tasks at minor training cost.
arXiv Detail & Related papers (2024-06-04T15:09:29Z)
- How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model [4.215221129670858]
We show that by introducing sparsity to generative hierarchical models of data, the task acquires insensitivity to spatial transformations that are discrete versions of smooth transformations.
We quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and hierarchical structure of the task.
arXiv Detail & Related papers (2024-04-16T17:01:27Z) - Joint Entity and Relation Extraction with Span Pruning and Hypergraph
Neural Networks [58.43972540643903]
We propose a HyperGraph Neural Network for ERE (HGNN), built upon PL-marker (a state-of-the-art marker-based pipeline model).
To alleviate error propagation, we use a high-recall pruner mechanism to transfer the burden of entity identification and labeling from the NER module to the joint module of our model.
Experiments on three widely used benchmarks for ERE task show significant improvements over the previous state-of-the-art PL-marker.
arXiv Detail & Related papers (2023-10-26T08:36:39Z)
- How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model [47.617093812158366]
We introduce the Random Hierarchy Model: a family of synthetic tasks inspired by the hierarchical structure of language and images.
We find that deep networks learn the task by developing internal representations invariant to exchanging equivalent groups.
Our results indicate how deep networks overcome the curse of dimensionality by building invariant representations.
arXiv Detail & Related papers (2023-07-05T09:11:09Z)
- A Hierarchical Block Distance Model for Ultra Low-Dimensional Graph Representations [0.0]
This paper proposes a novel scalable graph representation learning method named the Hierarchical Block Distance Model (HBDM).
HBDM accounts for homophily and transitivity by accurately approximating the latent distance model (LDM) throughout the hierarchy.
We evaluate the performance of the HBDM on massive networks consisting of millions of nodes.
arXiv Detail & Related papers (2022-04-12T15:23:12Z)
- Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones [64.75766944882389]
We present ConE (Cone Embedding), a KG embedding model that is able to simultaneously model multiple hierarchical as well as non-hierarchical relations in a knowledge graph.
In particular, ConE uses cone containment constraints in different subspaces of the hyperbolic embedding space to capture multiple heterogeneous hierarchies.
Our approach yields a new state-of-the-art Hits@1 of 45.3% on WN18RR and 16.1% on DDB14 (0.231 MRR).
arXiv Detail & Related papers (2021-10-28T07:16:08Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
- Shift Aggregate Extract Networks [3.3263205689999453]
We introduce an architecture based on deep hierarchical decompositions to learn effective representations of large graphs.
Our framework extends classic R-decompositions used in kernel methods, enabling nested part-of-part relations.
We show empirically that our approach is able to outperform current state-of-the-art graph classification methods on large social network datasets.
arXiv Detail & Related papers (2017-03-16T09:52:48Z)