Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering
- URL: http://arxiv.org/abs/2512.18551v1
- Date: Sun, 21 Dec 2025 00:45:23 GMT
- Title: Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering
- Authors: Sungjoon Park, Varun Ramamurthi, Owen Terry
- Abstract summary: Neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning. We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.
- Score: 1.4066253648292315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In language modeling, neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. Neologisms can be used to encourage specific behavior in models, for example by appending "Give me a neologism answer." to a prompt. Behavioral steering can also be achieved through fine-tuning, albeit with more compute and less flexibility: learning a neologism trains only d parameters (a single embedding vector) and still allows the user to access the model's default behavior. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning, finding that neologisms outperform fine-tuned models under a matched training setup (same data and hyperparameters). We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.
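
The setup the abstract describes can be sketched in a few lines: add one token, freeze the model, and train only that token's d-dimensional embedding row. The sketch below uses Hugging Face Transformers; the model name, token string, training example, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of neologism learning: add one new token and train only
# its d-dimensional embedding row, leaving every other weight frozen.
# Model name, token string, example data, and learning rate are
# illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Add the neologism to the vocabulary and grow the embedding matrix.
tok.add_tokens(["<neo>"])
model.resize_token_embeddings(len(tok))
neo_id = tok.convert_tokens_to_ids("<neo>")

# Freeze the whole model, then re-enable gradients for the input
# embedding matrix; masking below restricts updates to the new row.
for p in model.parameters():
    p.requires_grad = False
emb = model.get_input_embeddings().weight  # shape: (vocab_size, d)
emb.requires_grad = True
opt = torch.optim.Adam([emb], lr=1e-3)

# Train on text that pairs the neologism with the target behavior.
examples = ["Give me a <neo> answer. The answer is definitely wrong."]
for text in examples:
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    opt.zero_grad()
    loss.backward()
    mask = torch.zeros_like(emb.grad)
    mask[neo_id] = 1.0
    emb.grad *= mask  # keep gradients only for the new token's d parameters
    opt.step()
```

Because every other weight stays frozen, omitting the token from a prompt recovers the model's default behavior, which is the flexibility the abstract contrasts with LoRA fine-tuning.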
Related papers
- Neologism Learning for Controllability and Self-Verbalization [23.932433693726182]
We explore the idea of introducing new words to better understand and control models. This method introduces a new word by adding a new word embedding and training with examples that exhibit the concept. We show that adding a new word allows for control of concepts such as flattery, incorrect answers, and text length, as well as more complex concepts in AxBench.
arXiv Detail & Related papers (2025-10-09T17:41:57Z)
- NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms [19.863120275409393]
We create a diverse resource of recent English neologisms by using several popular collection methods.
We analyze temporal drift using neologisms by comparing sentences containing new words with near-identical sentences that replace neologisms with existing substitute words.
Model performance is nearly halved in machine translation when a single neologism is introduced in a sentence.
arXiv Detail & Related papers (2024-02-19T16:19:15Z)
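
The minimal-pair comparison NEO-BENCH's summary describes can be illustrated as a loss contrast between a sentence containing a neologism and a near-identical sentence with an established substitute. The sentence pair, model, and metric below are illustrative assumptions rather than items from the benchmark.

```python
# Sketch of the minimal-pair idea behind NEO-BENCH: score a sentence that
# contains a neologism against a near-identical sentence that substitutes
# a pre-existing word, and compare model losses. The pair below is an
# illustrative assumption, not an item from the benchmark.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def nll(text: str) -> float:
    """Average per-token negative log-likelihood under the model."""
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        return model(**batch, labels=batch["input_ids"]).loss.item()

neologism = "She spent the weekend doomscrolling on her phone."
substitute = "She spent the weekend browsing on her phone."
drift = nll(neologism) - nll(substitute)
print(f"loss gap (neologism - substitute): {drift:.3f}")
```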
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing out desired behaviors from pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- ALERT: Adapting Language Models to Reasoning Tasks [43.8679673685468]
ALERT is a benchmark and suite of analyses for assessing language models' reasoning ability.
ALERT provides a test bed to assess any language model on fine-grained reasoning skills.
We find that language models learn more reasoning skills during the fine-tuning stage than during pretraining.
arXiv Detail & Related papers (2022-12-16T05:15:41Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
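
A minimal sketch of the parameter-free steering idea in the summary above: keep the dialogue model frozen and use REINFORCE to learn which steering prompt to prepend. The candidate prompts and the stand-in reward are hypothetical; the paper's actual method, reward signal, and models differ.

```python
# Sketch of steering a frozen dialogue model without touching its
# parameters: learn a distribution over steering prompts with REINFORCE.
# Candidate prompts and the reward are stand-ins, not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stays frozen

prompts = ["Reply politely:", "Reply briefly:", "Reply enthusiastically:"]
logits = torch.zeros(len(prompts), requires_grad=True)  # prompt policy
opt = torch.optim.Adam([logits], lr=0.1)

def reward(reply: str) -> float:
    # Stand-in reward preferring short replies; a real setup would score
    # the reply with a trained attribute classifier.
    return 1.0 if len(reply.split()) < 15 else 0.0

user_turn = "How do I reset my password?"
for _ in range(50):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    inp = tok(prompts[action] + " " + user_turn, return_tensors="pt")
    out = model.generate(**inp, max_new_tokens=20, do_sample=True,
                         pad_token_id=tok.eos_token_id)
    reply = tok.decode(out[0, inp["input_ids"].shape[1]:])
    loss = -dist.log_prob(action) * reward(reply)  # REINFORCE
    opt.zero_grad()
    loss.backward()
    opt.step()
```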
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision [44.32874972577682]
We investigate the extent to which neural models can reason about natural language rationales that explain model predictions.
We use pre-trained language models, neural knowledge models, and distant supervision from related tasks.
Our model shows promise at generating post-hoc rationales explaining why an inference is more or less likely given the additional information.
arXiv Detail & Related papers (2020-12-14T23:50:20Z)
- ABNIRML: Analyzing the Behavior of Neural IR Models [45.74073795558624]
Pretrained language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML).
We conduct an empirical study that yields insights into the factors that contribute to the neural model's gains.
arXiv Detail & Related papers (2020-11-02T03:07:38Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
- What is Learned in Visually Grounded Neural Syntax Acquisition [118.6461386981381]
We consider the case study of the Visually Grounded Neural Syntax Learner.
By constructing simplified versions of the model, we isolate the core factors that yield the model's strong performance.
We find that a simple lexical signal of noun concreteness plays the main role in the model's predictions.
arXiv Detail & Related papers (2020-05-04T17:32:20Z)