Related papers: Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning

Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning

URL: http://arxiv.org/abs/2512.03343v1
Date: Wed, 03 Dec 2025 01:17:07 GMT
Title: Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning
Authors: Darshan Fofadiya,
Abstract summary: We introduce the Idea-Gated Transformer, a novel architecture that separates semantic planning from syntactic generation.<n>We propose a differentiable gating mechanism that suppresses semantically irrelevant tokens, effectively pruning the search space in real-time.
Score: 0.40611352512781856
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive Language Models (LLMs) trained on Next-Token Prediction (NTP) often suffer from ``Topic Drift'' where the generation wanders away from the initial prompt due to a reliance on local associations rather than global planning \citep{holtzman2019curious}. While scaling model size mitigates this \citep{brown2020language}, the fundamental myopia of the NTP objective remains. In this work, we introduce the Idea-Gated Transformer, a novel architecture that separates semantic planning from syntactic generation. We introduce an auxiliary ``Idea Head'' trained to predict the bag-of-words distribution for a future context window, creating a latent ``Concept Vector'' that actively gates the main vocabulary during generation. We propose a differentiable gating mechanism that suppresses semantically irrelevant tokens, effectively pruning the search space in real-time. Experiments on WikiText-103 demonstrate that while the Idea-Gated model achieves comparable validation perplexity to a standard GPT-2 baseline, it exhibits significantly superior Domain Retention. Qualitative and quantitative analysis reveals that the gating mechanism successfully locks generation into specific semantic clusters (e.g., Finance, Science) and resists associative drift, offering a parameter-efficient path toward more controllable language modeling.

Related papers

Modeling Language as a Sequence of Thoughts [0.0]
Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens.<n>Yet, they fail to form globally consistent latent representations of entities and events, lack of which contributes to brittleness in relational direction (e.g., reversal curse), contextualization errors, and data inefficiency.<n>We introduce Thought Gestalt (TG) model, a recurrent Transformer that models language at two levels of abstraction - tokens and sentence-level "thought" states.
arXiv Detail & Related papers (2025-12-31T18:24:57Z)
Next Interest Flow: A Generative Pre-training Paradigm for Recommender Systems by Modeling All-domain Movelines [8.895768051554162]
We propose a novel generative pre-training paradigm for e-commerce recommender systems.<n>Our model learns to predict the Next Interest Flow, a dense vector sequence representing a user's future intent.<n>We present the All-domain Moveline Evolution Network (AMEN), a unified framework implementing our entire pipeline.
arXiv Detail & Related papers (2025-10-13T12:13:17Z)
Constrained Auto-Regressive Decoding Constrains Generative Retrieval [71.71161220261655]
Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network.<n>In this paper, we examine the inherent limitations of constrained auto-regressive generation from two essential perspectives: constraints and beam search.
arXiv Detail & Related papers (2025-04-14T06:54:49Z)
Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning [29.745218855471787]
Tokenization is a necessary component within the current architecture of many language mod-els.<n>We argue that tokenization is necessary for reasonably human-like language performance.<n>We discuss implications for architectural choices, meaning construction, the primacy of language for thought.
arXiv Detail & Related papers (2024-12-14T18:18:52Z)
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations [24.211603400355756]
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models.<n>We look at how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations.<n>We validate our findings on synthetic and small-scale real language datasets.
arXiv Detail & Related papers (2024-08-27T21:46:47Z)
Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks [10.880057430629126]
Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation. In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. We introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties.
arXiv Detail & Related papers (2023-05-02T18:27:13Z)
Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have been long devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues. We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
Improve Variational Autoencoder for Text Generationwith Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. VAEs tend to ignore latent variables with a strong auto-regressive decoder. We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.