Related papers: CoLLEGe: Concept Embedding Generation for Large Language Models

URL: http://arxiv.org/abs/2403.15362v2
Date: Wed, 16 Oct 2024 19:57:08 GMT
Title: CoLLEGe: Concept Embedding Generation for Large Language Models
Authors: Ryan Teehan, Brenden Lake, Mengye Ren
Abstract summary: CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts. We design a series of tasks to test new concept learning in challenging real-world scenarios.
Score: 12.812113254812028
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions, and often fails to confer much information about the new concepts. Classic methods for few-shot word learning in NLP, relying on global word vectors, are less applicable to large language models. In this paper, we introduce a novel approach named CoLLEGe (Concept Learning with Language Embedding Generation) to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts using a small number of example sentences or definitions. Our primary meta-learning objective is simply to facilitate a language model to make next word predictions in forthcoming sentences, making it compatible with language model pretraining. We design a series of tasks to test new concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and demonstrate that our method succeeds in each setting without task-specific training. Code and data for our project can be found at https://college-concept-learning.github.io/

Related papers

Neologism Learning for Controllability and Self-Verbalization [23.932433693726182]
We explore the idea of introducing new words to better understand and control models. This method introduces a new word by adding a new word embedding and training with examples that exhibit the concept. We show that adding a new word allows for control of concepts such as flattery, incorrect answers, text length, as well as more complex concepts in AxBench.
arXiv Detail & Related papers (2025-10-09T17:41:57Z)
Rapid Word Learning Through Meta In-Context Learning [29.29775111160227]
We introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word's usage given a few in-context examples. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning.
arXiv Detail & Related papers (2025-02-20T18:11:38Z)
Large Concept Models: Language Modeling in a Sentence Representation Space [62.73366944266477]
We present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher level idea or action in a flow. We show that our model exhibits impressive zero-shot generalization performance to many languages.
arXiv Detail & Related papers (2024-12-11T23:36:20Z)
Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings. We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? [14.582209994281374]
Few-shot learning aims to train models that can be generalized to novel classes with only a few samples. We propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning.
arXiv Detail & Related papers (2023-07-09T08:07:43Z)
Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context. We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability. Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning [38.37682598345653]
We introduce a multimodal meta-learning approach to bridge the gap between vision and language models. We define a meta-mapper network, acting as a meta-learner, to efficiently bridge frozen large-scale vision and language models. We evaluate our approach on recently proposed multimodal few-shot benchmarks, measuring how rapidly the model can bind novel visual concepts to words.
arXiv Detail & Related papers (2023-02-28T17:46:18Z)
Efficient Induction of Language Models Via Probabilistic Concept Formation [13.632454840363916]
We present a novel approach to the acquisition of language models from corpora. The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of probabilistic concepts. We explore three new extensions to Cobweb -- the Word, Leaf, and Path variants.
arXiv Detail & Related papers (2022-12-22T18:16:58Z)
ConceptX: A Framework for Latent Concept Analysis [21.760620298330235]
We present ConceptX, a human-in-the-loop framework for interpreting and annotating latent representational space in Language Models (pLMs). We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts.
arXiv Detail & Related papers (2022-11-12T11:31:09Z)
Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings [2.8478710949588284]
CoLLIE is a model for continual learning of how language is grounded in vision. It learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. We show that CoLLIE can efficiently learn and generalize from only a few examples.
arXiv Detail & Related papers (2021-11-15T18:54:58Z)
Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size. We propose a fully compositional output embedding layer for language models. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training. We propose a new pre-training task based on contrastive learning. By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.