TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion
- URL: http://arxiv.org/abs/2601.09633v1
- Date: Wed, 14 Jan 2026 17:08:37 GMT
- Title: TaxoBell: Gaussian Box Embeddings for Self-Supervised Taxonomy Expansion
- Authors: Sahil Mishra, Srinitish Srinivasan, Srikanta Bedathur, Tanmoy Chakraborty
- Abstract summary: Taxonomies form the backbone of structured knowledge representation across diverse domains. Existing automated methods rely on point-based vector embeddings. Box embeddings offer a promising alternative by enabling containment and disjointness.
- Score: 25.5809347473818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Taxonomies form the backbone of structured knowledge representation across diverse domains, enabling applications such as e-commerce catalogs, semantic search, and biomedical discovery. Yet, manual taxonomy expansion is labor-intensive and cannot keep pace with the emergence of new concepts. Existing automated methods rely on point-based vector embeddings, which model symmetric similarity and thus struggle with the asymmetric "is-a" relationships that are fundamental to taxonomies. Box embeddings offer a promising alternative by enabling containment and disjointness, but they face key issues: (i) unstable gradients at the intersection boundaries, (ii) no notion of semantic uncertainty, and (iii) limited capacity to represent polysemy or ambiguity. We address these shortcomings with TaxoBell, a Gaussian box embedding framework that translates between box geometries and multivariate Gaussian distributions, where means encode semantic location and covariances encode uncertainty. Energy-based optimization yields stable optimization, robust modeling of ambiguous concepts, and interpretable hierarchical reasoning. Extensive experimentation on five benchmark datasets demonstrates that TaxoBell significantly outperforms eight state-of-the-art taxonomy expansion baselines by 19% in MRR and around 25% in Recall@k. We further demonstrate the advantages and pitfalls of TaxoBell with error analysis and ablation studies.
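The abstract's core idea, that a box can be read as a Gaussian whose mean gives the semantic location and whose covariance gives the uncertainty, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the mapping from half-widths to variances, the restriction to diagonal covariances, and the use of KL divergence as a stand-in asymmetric "is-a" energy. The names `box_to_gaussian` and `kl_diag` are hypothetical.

```python
import math

def box_to_gaussian(center, half_width, scale=1.0):
    """Map an axis-aligned box to a diagonal Gaussian (illustrative assumption).

    The box center becomes the mean; each half-width, multiplied by `scale`,
    becomes a standard deviation, so wider boxes mean more uncertainty.
    """
    mean = [float(c) for c in center]
    var = [(scale * float(h)) ** 2 for h in half_width]
    return mean, var

def kl_diag(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

# A narrow "child" concept sitting inside a broad "parent" concept.
child = box_to_gaussian(center=[0.2, 0.1], half_width=[0.5, 0.5])
parent = box_to_gaussian(center=[0.0, 0.0], half_width=[2.0, 2.0])

kl_child_parent = kl_diag(*child, *parent)   # low energy: child fits in parent
kl_parent_child = kl_diag(*parent, *child)   # high energy: parent does not fit in child
```

Because KL divergence is asymmetric, the two directions score differently, which is the property that symmetric point-embedding similarity lacks. A smooth density also has no hard boundary, which is one plausible reading of why the abstract contrasts this with the unstable gradients at box intersection boundaries.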
Related papers
- Transforming Expert Knowledge into Scalable Ontology via Large Language Models [0.0]
Traditional approaches to taxonomy alignment rely on expert review of concept pairs. We propose a novel framework that combines large language models (LLMs) with expert calibration and iterative prompt optimization. In evaluating our framework on a domain-specific mapping task of concept essentiality, we achieved an F1-score of 0.97, substantially exceeding the human benchmark of 0.68.
arXiv Detail & Related papers (2025-06-10T03:48:26Z)
- QuanTaxo: A Quantum Approach to Self-Supervised Taxonomy Expansion [17.865428778692557]
We introduce QuanTaxo, an innovative quantum-inspired framework for taxonomy expansion. We show that QuanTaxo significantly outperforms classical embedding models. We also highlight the superiority of QuanTaxo through extensive ablation and case studies.
arXiv Detail & Related papers (2025-01-23T18:40:02Z)
- Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation [12.308473939796945]
Weakly supervised semantic segmentation (WSSS) with image-level labels aims to achieve dense tasks without laborious annotations.
The performance of WSSS, especially the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, widely suffers from ambiguity.
We propose UniA, a unified single-staged WSSS framework, to tackle this issue from the perspective of uncertainty inference and affinity diversification.
arXiv Detail & Related papers (2024-04-12T01:54:59Z)
- Exploiting hidden structures in non-convex games for convergence to Nash equilibrium [62.88214569402201]
A wide array of modern machine learning applications can be formulated as non-cooperative games whose solutions are Nash equilibria.
We provide explicit convergence guarantees for both deterministic and stochastic environments.
arXiv Detail & Related papers (2023-12-27T15:21:25Z)
- Insert or Attach: Taxonomy Completion via Box Embedding [75.69894194912595]
Previous approaches embed concepts as vectors in Euclidean space, which makes it difficult to model asymmetric relations in taxonomy.
We develop a framework, TaxBox, that leverages box containment and center closeness to design two specialized geometric scorers within the box embedding space.
These scorers are tailored for insertion and attachment operations and can effectively capture intrinsic relationships between concepts.
arXiv Detail & Related papers (2023-05-18T14:34:58Z)
- Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds [8.385386712928785]
Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment.
We propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure.
We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts.
arXiv Detail & Related papers (2022-10-04T15:19:24Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion [50.794640012673064]
As data and business scope grow in real applications, existing taxonomies need to be expanded to incorporate new concepts.
Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of inserting operations.
We propose TaxoOrder, a novel self-supervised framework that simultaneously discovers the local hypernym-hyponym structure among new concepts and decides the order of insertion.
arXiv Detail & Related papers (2021-04-08T11:00:43Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the extent of ambiguity in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths [53.45704816829921]
We propose a self-supervised taxonomy expansion model named STEAM.
STEAM generates natural self-supervision signals, and formulates a node attachment prediction task.
Experiments show STEAM outperforms state-of-the-art methods for taxonomy expansion by 11.6% in accuracy and 7.0% in mean reciprocal rank.
arXiv Detail & Related papers (2020-06-18T00:32:53Z)
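Several of the abstracts above report gains in mean reciprocal rank (MRR) and Recall@k. As a reminder of what those numbers measure, here is a minimal sketch under a simplifying assumption: each query concept has exactly one correct parent, and `ranks` holds the 1-based position of that parent in the model's ranked candidate list. The function names and the example ranks are hypothetical. The papers' exact evaluation protocols may differ.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct parent appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Made-up ranks of the true parent for four query concepts.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)   # (1 + 0.5 + 0.25 + 0.1) / 4 = 0.4625
recall_5 = recall_at_k(ranks, 5)    # 3 of 4 queries rank the parent in the top 5 = 0.75
```

MRR rewards placing the correct parent near the top of the list, while Recall@k only checks whether it appears within the first k candidates.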
This list is automatically generated from the titles and abstracts of the papers on this site.