WinoDict: Probing language models for in-context word acquisition
- URL: http://arxiv.org/abs/2209.12153v1
- Date: Sun, 25 Sep 2022 05:30:13 GMT
- Title: WinoDict: Probing language models for in-context word acquisition
- Authors: Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, and William W. Cohen
- Abstract summary: We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference.
We show that LLM accuracy decreases radically on our benchmark compared to the original Winograd tasks.
- Score: 32.81587292382359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new in-context learning paradigm to measure Large Language
Models' (LLMs) ability to learn novel words during inference. In particular, we
rewrite Winograd-style co-reference resolution problems by replacing the key
concept word with a synthetic but plausible word that the model must understand
to complete the task. Solving this task requires the model to make use of the
dictionary definition of the new word given in the prompt. This benchmark
addresses word acquisition, one important aspect of the diachronic degradation
known to afflict LLMs. As LLMs are frozen in time at the moment they are
trained, they are normally unable to reflect the way language changes over
time. We show that the accuracy of LLMs compared to the original Winograd tasks
decreases radically in our benchmark, thus identifying a limitation of current
models and providing a benchmark to measure future improvements in LLMs ability
to do in-context learning.
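To make the construction concrete, the following is a minimal sketch of how a WinoDict-style probe could be assembled from a Winograd schema item: the key concept word is swapped for a synthetic word, and a dictionary-style definition of that word is placed in the prompt. The synthetic word "plorge", the definition wording, and the prompt template are illustrative assumptions, not the paper's exact rewriting procedure.

```python
# Minimal sketch of a WinoDict-style prompt (illustrative only).
# The synthetic word "plorge", the definition wording, and the template
# below are assumptions; the paper's exact construction may differ.

def build_winodict_prompt(definition: str, sentence: str, question: str) -> str:
    """Prepend a dictionary-style definition of the synthetic word to the
    rewritten Winograd sentence, then pose the co-reference question."""
    return (
        f"Definition: {definition}\n"
        f"Sentence: {sentence}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Original item: "The trophy doesn't fit in the suitcase because it is too large."
# The key concept word "large" is replaced with an invented but plausible word.
prompt = build_winodict_prompt(
    definition='"plorge" (adjective): of considerable or greater than average size.',
    sentence="The trophy doesn't fit in the suitcase because it is too plorge.",
    question='What does "it" refer to, the trophy or the suitcase?',
)
print(prompt)  # The model must rely on the in-context definition to resolve "it".
```

Solving the item requires the model to bind the novel word form to the definition given in context rather than to any meaning memorized during pretraining.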
Related papers
- MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models [8.7734602595507]
We propose MMLU-SR, a novel dataset designed to measure the true comprehension abilities of Large Language Models (LLMs).
We modified standardized test questions by replacing a key term with a dummy word along with its definition.
We found a substantial reduction in model performance after such replacement, suggesting poor comprehension.
arXiv Detail & Related papers (2024-06-15T05:35:47Z)
- CoLLEGe: Concept Embedding Generation for Large Language Models [12.812113254812028]
CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts.
We design a series of tasks to test new concept learning in challenging real-world scenarios.
arXiv Detail & Related papers (2024-03-22T17:26:05Z)
- NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms [19.863120275409393]
We create a diverse resource of recent English neologisms by using several popular collection methods.
We analyze temporal drift using neologisms by comparing sentences containing new words with near-identical sentences that replace neologisms with existing substitute words.
Model performance is nearly halved in machine translation when a single neologism is introduced in a sentence.
arXiv Detail & Related papers (2024-02-19T16:19:15Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT), which restricts embedding entries to the language of interest to improve time and memory efficiency.
We apply two language heuristics to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - across different language families and sizes.
VT is found to reduce the memory usage of small models by nearly 50%, with an upper bound of 25% improvement in generation speed; a minimal sketch of the script-filtering heuristic appears after this list.
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
- The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z)
- Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation [1.8591405259852054]
Visual Word Sense Disambiguation (VWSD) is a novel and challenging task whose goal is to retrieve, from a set of candidate images, the one that corresponds to the intended sense of an ambiguous word.
In this paper, we make a substantial step towards unveiling this interesting task by applying a varied set of approaches.
arXiv Detail & Related papers (2023-10-21T14:35:42Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal a language model's grasp of language more comprehensively, particularly its proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
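To make the Unicode-based script filtering heuristic from the vocabulary-trimming entry above concrete, here is a minimal sketch on a toy subword vocabulary. The toy token set, the "keep Latin-script tokens" rule, and the re-indexing step are illustrative assumptions rather than the paper's implementation; in practice the surviving token ids would also be used to slice the corresponding rows of the embedding matrix, which is where the reported memory savings come from.

```python
# Minimal sketch of Unicode-script-based vocabulary trimming (illustrative only).
# The toy vocabulary and the "keep Latin-script tokens" rule are assumptions,
# not the implementation evaluated in the paper.
import unicodedata

def is_latin_or_common(token: str) -> bool:
    """Crude script check: reject tokens containing any non-Latin letter;
    digits, punctuation, and marker characters (e.g. '##') are kept."""
    for ch in token:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return False
    return True

def trim_vocabulary(vocab):
    """Drop tokens outside the target script and re-index the survivors."""
    kept = [tok for tok in vocab if is_latin_or_common(tok)]
    return {tok: new_id for new_id, tok in enumerate(kept)}

toy_vocab = {"the": 0, "Haus": 1, "дом": 2, "maison": 3, "家": 4, "##ing": 5}
print(trim_vocabulary(toy_vocab))  # {'the': 0, 'Haus': 1, 'maison': 2, '##ing': 3}
```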