CoLLEGe: Concept Embedding Generation for Large Language Models
- URL: http://arxiv.org/abs/2403.15362v1
- Date: Fri, 22 Mar 2024 17:26:05 GMT
- Title: CoLLEGe: Concept Embedding Generation for Large Language Models
- Authors: Ryan Teehan, Brenden Lake, Mengye Ren
- Abstract summary: CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts.
We design a series of tasks to test new concept learning in challenging real-world scenarios.
- Score: 12.812113254812028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions, and often fails to confer much information about the new concepts. Classic methods for few-shot word learning in NLP, relying on global word vectors, are less applicable to large language models. In this paper, we introduce a novel approach named CoLLEGe (Concept Learning with Language Embedding Generation) to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts using a small number of example sentences or definitions. Our primary meta-learning objective is simply to facilitate a language model to make next word predictions in forthcoming sentences, making it compatible with language model pretraining. We design a series of tasks to test new concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and demonstrate that our method succeeds in each setting without task-specific training.
Related papers
- SLANG: New Concept Comprehension of Large Language Models [46.65436204783482]
Large language models (LLMs) often struggle to keep up with the rapid linguistic evolution characteristic of online communities.
Our benchmark and approach involve understanding real-world instances of linguistic shifts, serving as contextual beacons.
Our causal inference-based approach outperforms the baseline methods in terms of precision and relevance in the comprehension of Internet slang and memes.
arXiv Detail & Related papers (2024-01-23T09:33:31Z) - Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z) - Evolving Dictionary Representation for Few-shot Class-incremental Learning [34.887690018011675]
We tackle a challenging and practical continual learning scenario named few-shot class-incremental learning (FSCIL).
In FSCIL, labeled data are given for classes in a base session, but very limited labeled instances are available for new incremental classes.
We propose deep dictionary learning which is a hybrid learning architecture that combines dictionary learning and visual representation learning.
arXiv Detail & Related papers (2023-05-03T04:30:34Z) - ConceptX: A Framework for Latent Concept Analysis [21.760620298330235]
We present ConceptX, a human-in-the-loop framework for interpreting and annotating latent representational space in pre-trained Language Models (pLMs).
We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts.
arXiv Detail & Related papers (2022-11-12T11:31:09Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - CoLLIE: Continual Learning of Language Grounding from Language-Image
Embeddings [2.8478710949588284]
CoLLIE is a model for continual learning of how language is grounded in vision.
It learns a transformation function that adjusts the language embeddings when needed to accommodate new language use.
We show that CoLLIE can efficiently learn and generalize from only a few examples.
arXiv Detail & Related papers (2021-11-15T18:54:58Z) - Distilling Linguistic Context for Language Model Compression [27.538080564616703]
A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning.
We present a new knowledge distillation objective for language representation learning that transfers the contextual knowledge via two types of relationships.
We validate the effectiveness of our method on challenging benchmarks of language understanding tasks.
arXiv Detail & Related papers (2021-09-17T05:51:45Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)