Word Embeddings Revisited: Do LLMs Offer Something New?
- URL: http://arxiv.org/abs/2402.11094v2
- Date: Sat, 2 Mar 2024 14:01:04 GMT
- Title: Word Embeddings Revisited: Do LLMs Offer Something New?
- Authors: Matthew Freestone and Shubhra Kanti Karmaker Santu
- Abstract summary: Learning meaningful word embeddings is key to training a robust language model.
The recent rise of Large Language Models (LLMs) has provided us with many new word/sentence/document embedding models.
- Score: 2.822851601000061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning meaningful word embeddings is key to training a robust language
model. The recent rise of Large Language Models (LLMs) has provided us with
many new word/sentence/document embedding models. Although LLMs have shown
remarkable advancement in various NLP tasks, it is still unclear whether the
performance improvement is merely due to scale or whether the underlying
embeddings they produce differ significantly from those of classical encoding
models like Sentence-BERT (SBERT) or Universal Sentence Encoder (USE). This paper
systematically investigates this issue by comparing classical word embedding
techniques against LLM-based word embeddings in terms of their latent vector
semantics. Our results show that LLMs tend to cluster semantically related
words more tightly than classical models. LLMs also yield higher average
accuracy on the Bigger Analogy Test Set (BATS) over classical methods. Finally,
some LLMs tend to produce word embeddings similar to those of SBERT, a comparatively
lightweight classical model.
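To make the comparison concrete, here is a minimal sketch of the two measurements described in the abstract: cluster tightness via mean pairwise cosine similarity, and a BATS-style analogy solved with vector offsets (3CosAdd). The checkpoint and word lists are illustrative placeholders, not the paper's exact setup.

```python
# Hedged sketch: cluster tightness and a BATS-style analogy with one embedding model.
import itertools
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

model = SentenceTransformer("all-MiniLM-L6-v2")   # SBERT-family; swap in any LLM embedder
words = ["king", "queen", "man", "woman", "apple", "banana", "cherry"]
vec = dict(zip(words, model.encode(words)))

# Cluster tightness: mean pairwise cosine among semantically related words.
fruit = ["apple", "banana", "cherry"]
tightness = np.mean([cos(vec[a], vec[b]) for a, b in itertools.combinations(fruit, 2)])
print(f"fruit cluster tightness: {tightness:.3f}")

# BATS-style analogy via 3CosAdd: king - man + woman ~ ?
target = vec["king"] - vec["man"] + vec["woman"]
candidates = [w for w in words if w not in {"king", "man", "woman"}]
print("analogy answer:", max(candidates, key=lambda w: cos(target, vec[w])))  # ideally "queen"
```

Running the same two measurements with an LLM-based embedding model and comparing the numbers reproduces the paper's comparison in miniature.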
Related papers
- AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models [94.82766517752418]
We propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner.
Our results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs.
arXiv Detail & Related papers (2024-10-14T03:35:11Z)
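The layerwise allocation idea can be sketched in a few lines. This is a hedged toy version, not the released AlphaPruning code: the Hill-style tail estimate and the linear alpha-to-sparsity mapping below are stand-ins for the paper's heavy-tailed self-regularization metrics.

```python
# Toy layerwise sparsity allocation driven by a heavy-tailed shape metric.
import torch
import torch.nn as nn

def hill_alpha(W: torch.Tensor, k: int = 20) -> float:
    """Crude Hill estimator of the tail index of the spectrum of W W^T."""
    evals = torch.linalg.svdvals(W.float()) ** 2
    evals, _ = torch.sort(evals, descending=True)
    k = min(k, evals.numel() - 1)
    return float(k / torch.log(evals[:k] / evals[k]).sum())

def layerwise_sparsity(model: nn.Module, target: float = 0.7) -> dict:
    """Give heavier-tailed (lower-alpha) layers a lower sparsity ratio."""
    layers = [(n, m) for n, m in model.named_modules() if isinstance(m, nn.Linear)]
    alphas = torch.tensor([hill_alpha(m.weight.detach()) for _, m in layers])
    scaled = (alphas - alphas.mean()) / (alphas.std() + 1e-8)
    ratios = (target + 0.1 * scaled).clamp(0.05, 0.95)  # mean stays near the global target
    return {n: float(r) for (n, _), r in zip(layers, ratios)}

toy = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64))
for name, sparsity in layerwise_sparsity(toy).items():
    print(name, round(sparsity, 3))
```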
- From Tokens to Words: On the Inner Lexicon of LLMs [7.148628740938674]
Natural language is composed of words, but modern LLMs process sub-words as input.
We present evidence that LLMs engage in an intrinsic detokenization process, where sub-word sequences are combined into coherent word representations.
Our findings suggest that LLMs maintain a latent vocabulary beyond the tokenizer's scope.
arXiv Detail & Related papers (2024-10-08T09:53:35Z)
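A hedged probing sketch of the setup implied above, assuming GPT-2 as a stand-in LLM and the final sub-word position as the word representation (the paper's analysis is more involved):

```python
# Inspect how a multi-token word's final sub-word representation evolves by layer.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

word = " counterintuitively"                  # leading space matters for GPT-2's BPE
ids = tok(word, return_tensors="pt")
print("sub-words:", tok.convert_ids_to_tokens(ids["input_ids"][0]))  # typically several pieces

with torch.no_grad():
    hidden = model(**ids).hidden_states       # tuple: embeddings + one entry per layer

# Treat the hidden state at the final sub-word as the candidate "detokenized"
# word vector and watch how it changes across layers.
for layer, h in enumerate(hidden):
    print(f"layer {layer:2d}  norm={float(h[0, -1].norm()):.2f}")
```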
- Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints [20.844061807562436]
We propose SENSE, a novel prompting approach that embeds semantic hints within the prompt.
Experiments show that SENSE consistently improves LLMs' performance across various tasks.
arXiv Detail & Related papers (2024-09-22T14:35:09Z)
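For illustration only, one way to embed semantic hints within a prompt; the hint fields and template below are hypothetical, not the SENSE paper's exact format.

```python
# Hypothetical prompt template that prepends structured semantic hints.
def sense_style_prompt(question: str, hints: dict[str, str]) -> str:
    hint_block = "\n".join(f"- {k}: {v}" for k, v in hints.items())
    return (
        "Semantic hints about the input:\n"
        f"{hint_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(sense_style_prompt(
    "Which river is longer, the Nile or the Amazon?",
    {"entities": "Nile (river), Amazon (river)", "relation": "comparison of length"},
))
```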
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics [50.982315553104975]
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
arXiv Detail & Related papers (2024-03-03T13:14:47Z)
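A rough per-layer probe in the spirit of this study, assuming GPT-2 as a stand-in for Llama2 and last-token hidden states as word representations:

```python
# Compare related vs. unrelated word-pair similarity at every layer.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def layer_vectors(word: str):
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        return [h[0, -1] for h in model(**ids).hidden_states]

related = (layer_vectors(" cat"), layer_vectors(" dog"))
unrelated = (layer_vectors(" cat"), layer_vectors(" treaty"))

# If lower layers carry more lexical semantics, the related/unrelated gap
# should be clearest there.
for layer in range(len(related[0])):
    r = float(F.cosine_similarity(related[0][layer], related[1][layer], dim=0))
    u = float(F.cosine_similarity(unrelated[0][layer], unrelated[1][layer], dim=0))
    print(f"layer {layer:2d}  related={r:.3f}  unrelated={u:.3f}")
```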
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
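A hedged illustration of the general recipe of injecting a discriminative model's prediction and confidence into the LLM prompt; the classifier below is a stub, and the framework's actual integration is more elaborate.

```python
# Stub classifier + prompt builder showing how supervised knowledge can enter the prompt.
def classifier_stub(premise: str, hypothesis: str) -> tuple[str, float]:
    """Placeholder for a fine-tuned discriminative model (e.g., an NLI classifier)."""
    return "entailment", 0.87

def build_prompt(premise: str, hypothesis: str) -> str:
    label, conf = classifier_stub(premise, hypothesis)
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        f"A task-specific model predicts '{label}' with confidence {conf:.2f}.\n"
        "Considering that prediction, answer with 'entailment', 'neutral', or 'contradiction'."
    )

print(build_prompt("A dog is running in the park.", "An animal is outdoors."))
```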
- Making Large Language Models A Better Foundation For Dense Retrieval [19.38740248464456]
Dense retrieval needs to learn discriminative text embeddings to represent the semantic relationship between query and document.
It may benefit from the use of large language models (LLMs), given their strong capability for semantic understanding.
We propose LLaRA (LLM adapted for dense RetrievAl), which works as a post-hoc adaptation of an LLM for dense retrieval applications.
arXiv Detail & Related papers (2023-12-24T15:10:35Z)
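A conceptual sketch of using an autoregressive LM's final-token hidden state as a dense retrieval embedding. LLaRA additionally adapts the LLM with dedicated training objectives, which is omitted here, and GPT-2 is only a stand-in.

```python
# Rank documents by cosine similarity of final-token hidden states.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def embed(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**ids).last_hidden_state   # (1, seq, dim)
    vec = last_hidden[0, -1]                           # final-token state as text embedding
    return vec / vec.norm()

query = embed("what is dense retrieval?")
docs = ["Dense retrieval encodes queries and documents into vectors.",
        "The recipe calls for two cups of flour."]
scores = [float(query @ embed(d)) for d in docs]
print(sorted(zip(scores, docs), reverse=True)[0][1])   # most similar document
```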
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes.
It is found that VT reduces the memory usage of small models by nearly 50% and can improve generation speed by up to 25%.
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
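A rough sketch of the two trimming heuristics named above; the multilingual checkpoint, the Latin-script filter, and the toy corpus are arbitrary illustrative choices.

```python
# Unicode-based script filtering and corpus-based selection over a tokenizer vocabulary.
import unicodedata
from collections import Counter
from transformers import AutoTokenizer  # requires sentencepiece for this checkpoint

tok = AutoTokenizer.from_pretrained("google/mt5-small")
vocab = tok.get_vocab()                          # token string -> id

def is_latin_script(token: str) -> bool:
    """Keep tokens whose letters are Latin; tokens without letters are kept."""
    letters = [c for c in token if c.isalpha()]
    return all("LATIN" in unicodedata.name(c, "") for c in letters)

script_kept = {t: i for t, i in vocab.items() if is_latin_script(t)}

# Corpus-based selection: keep only tokens that actually occur in an in-domain corpus.
corpus = ["the quick brown fox", "jumps over the lazy dog"]      # toy corpus
counts = Counter(tid for s in corpus for tid in tok(s)["input_ids"])
corpus_kept = {t: i for t, i in vocab.items() if counts[i] > 0}

print(f"full vocab: {len(vocab)}  script-filtered: {len(script_kept)}  "
      f"corpus-filtered: {len(corpus_kept)}")
```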
- BeLLM: Backward Dependency Enhanced Large Language Model for Sentence Embeddings [4.545354973721937]
We propose a novel model: the backward dependency enhanced large language model (BeLLM).
It learns sentence embeddings by transforming specific attention layers from uni-directional to bi-directional.
It shows that auto-regressive LLMs benefit from backward dependencies for sentence embeddings.
arXiv Detail & Related papers (2023-11-09T11:53:52Z)
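A toy illustration of the architectural idea (not BeLLM itself): causal self-attention in the lower layers, full bi-directional attention in the last layer, then pooling for a sentence embedding. The mean-pooling choice is an assumption.

```python
# Mini transformer whose last layer drops the causal mask.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64, layers=4, bidirectional_last=1):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.attn = nn.ModuleList([nn.MultiheadAttention(dim, 4, batch_first=True)
                                   for _ in range(layers)])
        self.bidirectional_last = bidirectional_last

    def forward(self, ids):
        x = self.emb(ids)
        seq = ids.size(1)
        causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        for i, attn in enumerate(self.attn):
            # the last `bidirectional_last` layers attend in both directions
            mask = None if i >= len(self.attn) - self.bidirectional_last else causal
            out, _ = attn(x, x, x, attn_mask=mask)
            x = x + out
        return x.mean(dim=1)                  # pooled sentence embedding

ids = torch.randint(0, 1000, (2, 7))          # a batch of two toy "sentences"
print(TinyEncoder()(ids).shape)               # torch.Size([2, 64])
```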
- Scaling Sentence Embeddings with Large Language Models [43.19994568210206]
In this work, we propose an in-context learning-based method aimed at improving sentence embedding performance.
Our approach involves adapting the previous prompt-based representation method for autoregressive models.
When scaling model size, we find that going beyond tens of billions of parameters harms performance on semantic textual similarity tasks.
arXiv Detail & Related papers (2023-07-31T13:26:03Z)
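A minimal sketch of a prompt-based sentence embedding for an autoregressive LM, in the spirit of the method above; GPT-2 and the exact template are stand-ins.

```python
# Embed a sentence with a fill-in-one-word prompt and take the last hidden state.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

def prompt_embed(sentence: str) -> torch.Tensor:
    text = f'This sentence : "{sentence}" means in one word:"'
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state
    return hidden[0, -1]          # state that would predict the summarizing "one word"

a = prompt_embed("A man is playing a guitar.")
b = prompt_embed("Someone is strumming an instrument.")
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
```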
- Waffling around for Performance: Visual Classification with Random Words and Broad Concepts [121.60918966567657]
WaffleCLIP is a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors.
We conduct an extensive experimental study on the impact and shortcomings of additional semantics introduced with LLM-generated descriptors.
arXiv Detail & Related papers (2023-06-12T17:59:48Z)
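A sketch of the random-descriptor idea (not the official WaffleCLIP code): build prompts with random character strings and average their CLIP text embeddings into one prototype per class.

```python
# Random-descriptor class prototypes from CLIP text embeddings.
import random
import string
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def random_descriptor() -> str:
    return "".join(random.choices(string.ascii_lowercase, k=6))

def class_prototype(name: str, n: int = 8) -> torch.Tensor:
    prompts = [f"a photo of a {name}, which has {random_descriptor()}." for _ in range(n)]
    inputs = proc(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)      # averaged text prototype for zero-shot classification

protos = {c: class_prototype(c) for c in ["dog", "car"]}
print({c: tuple(p.shape) for c, p in protos.items()})
```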
- Future Vector Enhanced LSTM Language Model for LVCSR [67.03726018635174]
This paper proposes a novel enhanced long short-term memory (LSTM) LM that uses a future vector.
Experiments show that the proposed LSTM LM achieves better BLEU scores for long-term sequence prediction.
Rescoring with both the new and conventional LSTM LMs achieves a large improvement in word error rate.
arXiv Detail & Related papers (2020-07-31T08:38:56Z)
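A generic n-best rescoring sketch motivated by the last sentence of that summary: interpolate scores from two language models when re-ranking hypotheses. The scoring functions are stubs standing in for the conventional and future-vector-enhanced LSTM LMs; the enhancement itself is not reproduced here.

```python
# Re-rank n-best hypotheses with an interpolated score from two LMs.
def lm_a_logprob(hyp: str) -> float:   # stub: conventional LSTM LM
    return -0.5 * len(hyp.split())

def lm_b_logprob(hyp: str) -> float:   # stub: future-vector-enhanced LSTM LM
    return -0.45 * len(hyp.split())

def rescore(nbest: list[tuple[str, float]], lam: float = 0.5) -> str:
    """nbest: (hypothesis, first-pass score) pairs; returns the best hypothesis."""
    def total(hyp: str, first_pass: float) -> float:
        return first_pass + lam * lm_a_logprob(hyp) + (1 - lam) * lm_b_logprob(hyp)
    return max(nbest, key=lambda x: total(*x))[0]

print(rescore([("recognize speech", -3.2), ("wreck a nice beach", -3.0)]))
```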
This list is automatically generated from the titles and abstracts of the papers on this site.