Related papers: Neologism Learning for Controllability and Self-Verbalization

Neologism Learning for Controllability and Self-Verbalization

URL: http://arxiv.org/abs/2510.08506v1
Date: Thu, 09 Oct 2025 17:41:57 GMT
Title: Neologism Learning for Controllability and Self-Verbalization
Authors: John Hewitt, Oyvind Tafjord, Robert Geirhos, Been Kim
Abstract summary: We explore the idea of introducing new words to better understand and control models. This method introduces a new word by adding a new word embedding and training with examples that exhibit the concept. We show that adding a new word allows for control of concepts such as flattery, incorrect answers, text length, as well as more complex concepts in AxBench.
Score: 23.932433693726182
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humans invent new words when there is a rising demand for a new useful concept (e.g., doomscrolling). We explore and validate a similar idea in our communication with LLMs: introducing new words to better understand and control the models, expanding on the recently introduced neologism learning. This method introduces a new word by adding a new word embedding and training with examples that exhibit the concept with no other changes in model parameters. We show that adding a new word allows for control of concepts such as flattery, incorrect answers, text length, as well as more complex concepts in AxBench. We discover that neologisms can also further our understanding of the model via self-verbalization: models can describe what each new word means to them in natural language, like explaining that a word that represents a concept of incorrect answers means "a lack of complete, coherent, or meaningful answers..." To validate self-verbalizations, we introduce plug-in evaluation: we insert the verbalization into the context of a model and measure whether it controls the target concept. In some self-verbalizations, we find machine-only synonyms: words that seem unrelated to humans but cause similar behavior in machines. Finally, we show how neologism learning can jointly learn multiple concepts in multiple words.
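
As a rough illustration of the idea in the abstract (one new word embedding is added and trained, with no other changes in model parameters), the toy sketch below may help. It is not the authors' implementation: a frozen logistic scorer stands in for the language model, and the names `concept_score`, `train_neologism`, and the example vectors are all hypothetical.

```python
import math
import random


def concept_score(readout, embedding):
    """Frozen toy 'model': probability that an output exhibits the concept."""
    z = sum(w * e for w, e in zip(readout, embedding))
    return 1.0 / (1.0 + math.exp(-z))


def train_neologism(embeddings, readout, steps=50, lr=0.5, seed=0):
    """Append one new embedding row and fit only that row.

    Assumption: every training example exhibits the concept (label 1), and
    the loss is binary cross-entropy under the frozen scorer above.
    """
    rng = random.Random(seed)
    new_row = [rng.uniform(-0.1, 0.1) for _ in readout]
    for _ in range(steps):
        p = concept_score(readout, new_row)
        # Gradient of the loss w.r.t. the new row for label 1: (p - 1) * readout.
        new_row = [e - lr * (p - 1.0) * w for e, w in zip(new_row, readout)]
    # Existing rows are copied unchanged; the readout is never updated.
    return [list(row) for row in embeddings] + [new_row]


vocab = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]]  # pre-existing embeddings (frozen)
readout = [1.0, -2.0, 0.5]                    # stand-in for frozen model weights
extended = train_neologism(vocab, readout)
print(len(extended), concept_score(readout, extended[-1]))
```

The only trainable quantity here is the appended row, which is the property the abstract emphasizes; in a real language model the same effect would presumably be obtained by masking gradients for every parameter except the new token's embedding.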

Related papers

Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering [1.4066253648292315]
Neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning. We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.
arXiv Detail & Related papers (2025-12-21T00:45:23Z)
Rapid Word Learning Through Meta In-Context Learning [20.370397855670124]
We introduce a novel method, Meta-training for IN-context learNing Of Words (Minnow). This method trains language models to generate new examples of a word's usage given a few in-context examples. We find that training models from scratch with Minnow on human-scale child-directed language enables strong few-shot word learning.
arXiv Detail & Related papers (2025-02-20T18:11:38Z)
CoLLEGe: Concept Embedding Generation for Large Language Models [12.812113254812028]
CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts. We design a series of tasks to test new concept learning in challenging real-world scenarios.
arXiv Detail & Related papers (2024-03-22T17:26:05Z)
Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition. We use a memory-enhanced ASR model from the literature to decode new words from the slides. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
Word sense extension [8.939269057094661]
We present a paradigm of word sense extension (WSE) that enables words to spawn new senses toward novel context. We develop a framework that simulates novel word sense extension by partitioning a polysemous word type into two pseudo-tokens that mark its different senses. Our framework combines cognitive models of chaining with a learning scheme that transforms a language model embedding space to support various types of word sense extension.
arXiv Detail & Related papers (2023-06-09T00:54:21Z)
DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter [63.622879199281705]
Some example-based image generation approaches have been proposed, i.e., generating new concepts based on absorbing the salient features of a few input references. We propose a simple yet effective framework, namely DreamArtist, which adopts a novel positive-negative prompt-tuning learning strategy on the pre-trained diffusion model. We have conducted extensive experiments and evaluated the proposed method from image similarity (fidelity) and diversity, generation controllability, and style cloning.
arXiv Detail & Related papers (2022-11-21T10:37:56Z)
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z)
Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly. We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples. We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms. We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning [78.13740873213223]
Bongard problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems. We propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning.
arXiv Detail & Related papers (2020-10-02T03:19:46Z)
Word meaning in minds and machines [18.528929583956725]
We argue that contemporary NLP systems are fairly successful models of human word similarity, but they fall short in many other respects. Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people express through words. We discuss more promising approaches to grounding NLP systems and argue that they will be more successful with a more human-like, conceptual basis for word meaning.
arXiv Detail & Related papers (2020-08-04T18:45:49Z)
Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches. We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.