Rapid Word Learning Through Meta In-Context Learning
- URL: http://arxiv.org/abs/2502.14791v1
- Date: Thu, 20 Feb 2025 18:11:38 GMT
- Title: Rapid Word Learning Through Meta In-Context Learning
- Authors: Wentao Wang, Guangyuan Jiang, Tal Linzen, Brenden M. Lake
- Abstract summary: We introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow).
This method trains language models to generate new examples of a word's usage given a few in-context examples.
We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning.
- Score: 29.29775111160227
- Abstract: Humans can quickly learn a new word from a few illustrative examples, and then systematically and flexibly use it in novel contexts. Yet the abilities of current language models for few-shot word learning, and methods for improving these abilities, are underexplored. In this study, we introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word's usage given a few in-context examples, using a special placeholder token to represent the new word. This training is repeated on many new words to develop a general word-learning ability. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning, comparable to a large language model (LLM) pre-trained on orders of magnitude more data. Furthermore, through discriminative and generative evaluations, we demonstrate that finetuning pre-trained LLMs with Minnow improves their ability to discriminate between new words, identify syntactic categories of new words, and generate reasonable new usages and definitions for new words, based on one or a few in-context examples. These findings highlight the data efficiency of Minnow and its potential to improve language model performance in word learning tasks.
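The core of the method as described in the abstract (mask a new word with a special placeholder token, condition on a few in-context usage examples, and generate another usage) can be made concrete with a small sketch. The snippet below is a hedged illustration only: the placeholder token, separator, and episode layout are assumptions for clarity, not the paper's released implementation.

```python
# Hedged sketch (not the paper's code): one Minnow-style meta-training episode.
# A new word is masked with a special placeholder token in a few usage examples;
# the model sees the masked examples in context and is trained to generate one
# more usage. Token names and episode layout here are illustrative assumptions.

PLACEHOLDER = "<new-word>"  # assumed special token standing in for the new word
SEP = " <sep> "             # assumed separator between usage examples


def make_episode(word: str, usages: list[str], num_context: int = 3) -> dict:
    """Build one word-learning episode from sentences that all use `word`."""
    masked = [u.replace(word, PLACEHOLDER) for u in usages]
    context = SEP.join(masked[:num_context])   # few-shot in-context examples
    target = masked[num_context]               # usage the model should generate
    return {"prompt": context + SEP, "target": target}


# Usage: a made-up word stands in for a genuinely novel word. Repeating this
# construction over many different words is what builds the general
# word-learning ability described in the abstract.
usages = [
    "the dog began to blick at the mailman",
    "please do not blick so loudly at night",
    "i heard someone blick from across the street",
    "she told the puppy to stop trying to blick",
]
episode = make_episode("blick", usages)
print(episode["prompt"])
print(episode["target"])
```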
Related papers
- A Distributional Perspective on Word Learning in Neural Language Models [57.41607944290822]
There are no widely agreed-upon metrics for word learning in language models.
We argue that distributional signatures studied in prior work fail to capture key distributional information.
We obtain learning trajectories for a selection of small language models we train from scratch.
arXiv Detail & Related papers (2025-02-09T13:15:59Z) - Exploring Automated Keyword Mnemonics Generation with Large Language Models via Overgenerate-and-Rank [4.383205675898942]
Keyword mnemonics are a technique for memorizing vocabulary through memorable associations with a target word via a verbal cue.
We propose a novel overgenerate-and-rank method via prompting large language models to generate verbal cues.
Results show that LLM-generated mnemonics are comparable to human-generated ones in terms of imageability, coherence, and perceived usefulness.
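The overgenerate-and-rank idea above can be sketched generically: sample many candidate verbal cues from a language model, then keep the best-scoring few. In the sketch below, `sample_cue` and `score_cue` are hypothetical stand-ins (e.g., a prompted LLM call and an imageability/coherence scorer); they are not interfaces from the paper.

```python
# Hedged sketch of an overgenerate-and-rank loop for keyword mnemonics:
# sample many candidate verbal cues, then keep the highest-scoring ones.
# `sample_cue` and `score_cue` are hypothetical callables, not the paper's API.
from typing import Callable


def overgenerate_and_rank(
    target_word: str,
    sample_cue: Callable[[str], str],        # e.g. one prompted LLM completion
    score_cue: Callable[[str, str], float],  # e.g. imageability/coherence score
    n_candidates: int = 20,
    top_k: int = 3,
) -> list[str]:
    """Generate many candidate mnemonics, then return the top-ranked ones."""
    candidates = [sample_cue(target_word) for _ in range(n_candidates)]
    ranked = sorted(candidates, key=lambda cue: score_cue(target_word, cue), reverse=True)
    return ranked[:top_k]
```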
arXiv Detail & Related papers (2024-09-21T00:00:18Z) - Large Vocabulary Size Improves Large Language Models [28.83786065307658]
We investigate the relationship between subword vocabulary size and the performance of large language models (LLMs).
Experimental results show that larger vocabulary sizes lead to better performance in LLMs.
We introduce a simple method to use a new vocabulary instead of the pre-defined one.
arXiv Detail & Related papers (2024-06-24T10:27:07Z) - CoLLEGe: Concept Embedding Generation for Large Language Models [12.812113254812028]
CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts.
We design a series of tasks to test new concept learning in challenging real-world scenarios.
arXiv Detail & Related papers (2024-03-22T17:26:05Z) - Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition.
We use a memory-enhanced ASR model from the literature to decode new words from accompanying lecture slides.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - Evolving Dictionary Representation for Few-shot Class-incremental Learning [34.887690018011675]
We tackle a challenging and practical continual learning scenario named few-shot class-incremental learning (FSCIL).
In FSCIL, labeled data are given for the classes in a base session, but only very limited labeled instances are available for the new incremental classes.
We propose deep dictionary learning, a hybrid architecture that combines dictionary learning and visual representation learning.
arXiv Detail & Related papers (2023-05-03T04:30:34Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Always Keep your Target in Mind: Studying Semantics and Improving
Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z) - MICE: Mining Idioms with Contextual Embeddings [0.0]
Idiomatic expressions can be problematic for natural language processing applications.
We present an approach that uses contextual embeddings for that purpose.
We show that deep neural networks using both embeddings perform much better than existing approaches.
arXiv Detail & Related papers (2020-08-13T08:56:40Z) - Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
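The last entry above (inducing sense inventories directly from pre-trained fastText vectors) can be illustrated with a minimal, hedged sketch of the underlying idea: cluster a word's nearest neighbours in embedding space and treat each cluster as a sense. This is a simplification rather than the paper's exact induction procedure, and the vector file path, neighbour count, and cluster count are assumptions.

```python
# Hedged sketch: approximate a word's senses by clustering its nearest
# neighbours in a pre-trained embedding space. Illustrates the general idea
# behind embedding-only sense induction; the referenced paper's actual
# procedure differs. File path and hyperparameters are assumptions.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.cluster import AgglomerativeClustering

# Assumed local copy of pre-trained fastText vectors in word2vec text format.
vectors = KeyedVectors.load_word2vec_format("cc.en.300.vec", binary=False, limit=200_000)


def induce_senses(word: str, topn: int = 50, n_senses: int = 2) -> dict:
    """Cluster the word's nearest neighbours; each cluster acts as one sense."""
    neighbours = [w for w, _ in vectors.most_similar(word, topn=topn)]
    X = np.stack([vectors[w] for w in neighbours])
    labels = AgglomerativeClustering(n_clusters=n_senses).fit_predict(X)
    senses: dict[int, list[str]] = {}
    for w, label in zip(neighbours, labels):
        senses.setdefault(int(label), []).append(w)
    return senses


# Usage: neighbours of an ambiguous word often split into sense-like groups.
for sense_id, words in induce_senses("bank").items():
    print(sense_id, words[:8])
```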
This list is automatically generated from the titles and abstracts of the papers in this site.