Language Models as Hierarchy Encoders
- URL: http://arxiv.org/abs/2401.11374v1
- Date: Sun, 21 Jan 2024 02:29:12 GMT
- Title: Language Models as Hierarchy Encoders
- Authors: Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks
- Abstract summary: We introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs).
Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension.
We evaluate HiTs against pre-trained and fine-tuned LMs, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies.
- Score: 24.071698413762388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpreting hierarchical structures latent in language is a key limitation
of current language models (LMs). While previous research has implicitly
leveraged these hierarchies to enhance LMs, approaches for their explicit
encoding are yet to be explored. To address this, we introduce a novel approach
to re-train transformer encoder-based LMs as Hierarchy Transformer encoders
(HiTs), harnessing the expansive nature of hyperbolic space. Our method
situates the output embedding space of pre-trained LMs within a Poincaré ball
with a curvature that adapts to the embedding dimension, followed by
re-training on hyperbolic cluster and centripetal losses. These losses are
designed to effectively cluster related entities (input as texts) and organise
them hierarchically. We evaluate HiTs against pre-trained and fine-tuned LMs,
focusing on their capabilities in simulating transitive inference, predicting
subsumptions, and transferring knowledge across hierarchies. The results
demonstrate that HiTs consistently outperform both pre-trained and fine-tuned
LMs in these tasks, underscoring the effectiveness and transferability of our
re-trained hierarchy encoders.
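As a rough illustration of the method sketched in the abstract, the snippet below projects LM output embeddings into a Poincaré ball and computes a triplet-style hyperbolic cluster loss together with a centripetal loss that pushes parent entities closer to the origin than their children. The curvature value, margins, and exact loss forms are assumptions made for this sketch, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): hyperbolic cluster + centripetal losses
# over LM embeddings projected into a Poincaré ball. Margins, curvature, and loss
# forms are illustrative assumptions.
import torch

def project_to_ball(x, c=1.0, eps=1e-5):
    """Clip embeddings so they lie strictly inside the Poincaré ball of curvature c."""
    max_norm = (1.0 / c**0.5) * (1 - eps)
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    scale = torch.where(norm > max_norm, max_norm / norm, torch.ones_like(norm))
    return x * scale

def poincare_distance(u, v, c=1.0):
    """Geodesic distance between points of the Poincaré ball with curvature -c."""
    sq = lambda t: (t * t).sum(dim=-1)
    num = 2 * c * sq(u - v)
    den = (1 - c * sq(u)) * (1 - c * sq(v))
    return (1.0 / c**0.5) * torch.acosh(1 + num / den.clamp_min(1e-12))

def hit_losses(child, parent, negative, c=1.0, margin_cluster=1.0, margin_centri=0.1):
    """Triplet-style cluster loss plus a centripetal loss that keeps parents
    nearer to the origin than their children (hierarchy encoded by norm)."""
    child, parent, negative = (project_to_ball(t, c) for t in (child, parent, negative))
    d_pos = poincare_distance(child, parent, c)
    d_neg = poincare_distance(child, negative, c)
    cluster = torch.relu(d_pos - d_neg + margin_cluster).mean()
    centripetal = torch.relu(parent.norm(dim=-1) - child.norm(dim=-1) + margin_centri).mean()
    return cluster + centripetal

# Toy usage with random vectors standing in for LM sentence embeddings.
torch.manual_seed(0)
child, parent, negative = (0.01 * torch.randn(8, 768) for _ in range(3))
print(hit_losses(child, parent, negative))
```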
Related papers
- Transformer Alignment in Large Language Models [3.007031501305338]
We consider Large Language Models (LLMs) as transforming embeddings via a discrete, coupled, nonlinear, dynamical system in high dimensions.
This perspective motivates tracing the trajectories of individual tokens as they pass through transformer blocks, and linearizing the system along these trajectories through their Jacobian matrices.
In our analysis of 38 openly available LLMs, we uncover the alignment of top left and right singular vectors of Residual Jacobians, as well as the emergence of linearity and layer-wise exponential growth.
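As a rough illustration of this style of analysis (not code from the paper), the sketch below linearizes toy residual blocks along a token's trajectory via their Jacobians and compares top singular vectors between consecutive blocks; the block architecture and the cosine-based alignment check are assumptions for illustration.

```python
# Illustrative sketch: Residual Jacobians of toy blocks and their singular vectors.
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
d = 32
block1 = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.GELU(), torch.nn.Linear(d, d))
block2 = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.GELU(), torch.nn.Linear(d, d))

def residual(block, x):
    return x + block(x)

x0 = torch.randn(d)                # a token's hidden state entering block 1
x1 = residual(block1, x0)          # its trajectory after block 1

J1 = jacobian(lambda x: residual(block1, x), x0)   # Residual Jacobian at block 1
J2 = jacobian(lambda x: residual(block2, x), x1)   # Residual Jacobian at block 2

U1, S1, Vh1 = torch.linalg.svd(J1)
U2, S2, Vh2 = torch.linalg.svd(J2)

# Cosine between block 2's top right singular vector and block 1's top left one;
# in trained LLMs the paper reports strong alignment, unlike these random blocks.
alignment = torch.abs(Vh2[0] @ U1[:, 0])
print(f"top singular values: {S1[0].item():.3f} -> {S2[0].item():.3f}, "
      f"alignment: {alignment.item():.3f}")
```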
arXiv Detail & Related papers (2024-07-10T16:30:27Z) - Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers [16.253898272659242]
State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive.
This has sparked a research agenda to reduce these models' parameter count and computational costs without significantly impacting their performance.
We consider three candidate linear layer approximations in the FFN by combining efficient low-rank and block-diagonal matrices.
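A hedged sketch of one way such a structured approximation could look: a linear layer expressed as the sum of a low-rank factor and a block-diagonal matrix. The parameterization and sizes below are assumptions for illustration, not the paper's exact construction.

```python
# Sketch: FFN linear layer approximated as low-rank + block-diagonal (assumed form).
import torch
import torch.nn as nn

class LowRankBlockDiagLinear(nn.Module):
    def __init__(self, dim_in, dim_out, rank=32, n_blocks=4):
        super().__init__()
        assert dim_in % n_blocks == 0 and dim_out % n_blocks == 0
        self.down = nn.Linear(dim_in, rank, bias=False)    # low-rank factor U
        self.up = nn.Linear(rank, dim_out, bias=False)     # low-rank factor V
        self.blocks = nn.ModuleList(
            nn.Linear(dim_in // n_blocks, dim_out // n_blocks, bias=False)
            for _ in range(n_blocks)
        )
        self.bias = nn.Parameter(torch.zeros(dim_out))

    def forward(self, x):
        low_rank = self.up(self.down(x))
        chunks = x.chunk(len(self.blocks), dim=-1)
        block_diag = torch.cat([b(c) for b, c in zip(self.blocks, chunks)], dim=-1)
        return low_rank + block_diag + self.bias

layer = LowRankBlockDiagLinear(768, 3072)
print(layer(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 3072])
```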
arXiv Detail & Related papers (2024-06-24T08:43:21Z) - Unleashing the Power of Pre-trained Language Models for Offline
Reinforcement Learning [54.682106515794864]
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
This paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to use pre-trained Language Models (LMs) for offline RL.
Empirical results indicate LaMo achieves state-of-the-art performance in sparse-reward tasks.
arXiv Detail & Related papers (2023-10-31T16:24:17Z) - CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without
Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, a training-free strategy involving Clustering, Removing, and Sharing to derive improved emulators from LLMs.
Our findings demonstrate a linear connectivity among the resulting optima, which fall within the same basin, highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z) - Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM [31.25193238045053]
We introduce a novel method, namely GenCo, which leverages the strong generative power of large language models to assist in training a smaller language model.
In our method, an LLM plays an important role in the self-training loop of a smaller model in two important ways.
It helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels.
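A schematic sketch of one self-training turn in this style; `small_model_predict` and `llm_rewrite` are hypothetical placeholders standing in for the smaller classifier and the instruction-following LLM, not real APIs.

```python
# Sketch only: one GenCo-style self-training step with LLM-crafted training pairs.
def small_model_predict(text):
    """Placeholder for the smaller classifier's pseudo-label and confidence."""
    return ("positive", 0.93) if "great" in text else ("negative", 0.55)

def llm_rewrite(text, label):
    """Placeholder for prompting an LLM to rewrite `text` so it clearly expresses `label`."""
    return f"{text} Overall, a clearly {label} experience."

def self_training_step(unlabeled_texts, confidence_threshold=0.9):
    augmented_pairs = []
    for text in unlabeled_texts:
        label, confidence = small_model_predict(text)
        if confidence < confidence_threshold:
            continue                           # keep only confident pseudo-labels
        augmented_pairs.append((text, label))  # original text with its pseudo-label
        augmented_pairs.append((llm_rewrite(text, label), label))  # LLM-crafted pair
    return augmented_pairs                     # used to fine-tune the smaller model

print(self_training_step(["The film was great.", "It was fine, I guess."]))
```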
arXiv Detail & Related papers (2023-04-24T07:35:38Z) - Guiding the PLMs with Semantic Anchors as Intermediate Supervision:
Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to couple current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z) - Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI), and summarisation, has established them as one of the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
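For intuition, a minimal siamese-encoder sketch for NLI: one shared encoder embeds premise and hypothesis, and a classifier scores entailment, neutrality, or contradiction from their combined features. This is a generic illustration under assumed dimensions, not the SML architecture itself.

```python
# Sketch: shared (siamese) encoder for premise/hypothesis pairs in NLI.
import torch
import torch.nn as nn

class SiameseNLI(nn.Module):
    def __init__(self, vocab_size=30000, dim=256, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)   # shared for both inputs
        self.classifier = nn.Linear(4 * dim, n_classes)

    def encode(self, token_ids):
        return self.encoder(self.embed(token_ids)).mean(dim=1)      # mean pooling

    def forward(self, premise_ids, hypothesis_ids):
        u, v = self.encode(premise_ids), self.encode(hypothesis_ids)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.classifier(features)

model = SiameseNLI()
logits = model(torch.randint(0, 30000, (2, 12)), torch.randint(0, 30000, (2, 9)))
print(logits.shape)   # torch.Size([2, 3])
```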
arXiv Detail & Related papers (2021-03-17T13:23:53Z) - Semi-supervised source localization with deep generative modeling [27.344649091365067]
We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs).
VAE-SSL can outperform both SRP-PHAT and CNN in label-limited scenarios.
arXiv Detail & Related papers (2020-05-27T04:59:52Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
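A toy illustration of the insertion-based, coarse-to-fine control flow; the `propose_insertion` function below is a hypothetical stand-in for the trained insertion model, hard-coded only to show how tokens are progressively inserted between existing ones.

```python
# Toy sketch (not the actual POINTER model): progressive insertion-based generation.
def propose_insertion(left, right):
    """Stand-in for the learned predictor: token to insert between two neighbours,
    or None to leave this slot unchanged."""
    filler = {("language", "encode"): "models", ("encode", "hierarchies"): "latent"}
    return filler.get((left, right))

def generate(constraints, max_rounds=3):
    tokens = list(constraints)                # start from the hard lexical constraints
    for _ in range(max_rounds):               # coarse-to-fine refinement rounds
        inserted = False
        new_tokens = [tokens[0]]
        for left, right in zip(tokens, tokens[1:]):
            word = propose_insertion(left, right)   # slots can be filled in parallel
            if word is not None:
                new_tokens.append(word)
                inserted = True
            new_tokens.append(right)
        tokens = new_tokens
        if not inserted:
            break
    return " ".join(tokens)

print(generate(["language", "encode", "hierarchies"]))
# -> "language models encode latent hierarchies"
```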
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Tree-structured Attention with Hierarchical Accumulation [103.47584968330325]
"Hierarchical Accumulation" encodes parse tree structures into self-attention at constant time complexity.
Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task.
arXiv Detail & Related papers (2020-02-19T08:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.