Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
- URL: http://arxiv.org/abs/2512.24617v2
- Date: Mon, 05 Jan 2026 05:44:29 GMT
- Title: Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
- Authors: Xingwei Qu, Shaowen Wang, Zihao Huang, Kai Hua, Fan Yin, Rui-Jie Zhu, Jundong Zhou, Qiyang Min, Zihao Wang, Yizhi Li, Tianyu Zhang, He Xing, Zheng Zhang, Yuxuan Song, Tianyu Zheng, Zhiyuan Zeng, Chenghua Lin, Ge Zhang, Wenhao Huang
- Abstract summary: Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. We propose $\textbf{Dynamic Large Concept Models (DLCM)}$, a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient.
- Score: 56.37266873329401
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose $\textbf{Dynamic Large Concept Models (DLCM)}$, a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first $\textbf{compression-aware scaling law}$, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a $\textbf{decoupled $\mu$P parametrization}$ that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting ($R=4$, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a $\textbf{+2.69\% average improvement}$ across 12 zero-shot benchmarks under matched inference FLOPs.
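To make the token-to-concept compression concrete, the following is a minimal PyTorch sketch of the mechanism the abstract describes: a learned head scores each token's hidden state as a possible concept boundary, and each resulting variable-length segment is pooled into a single concept vector. The module name, the hard top-k boundary selection, and the mean-pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of dynamic token-to-concept
# compression: a linear head scores each token's hidden state as a boundary,
# roughly seq_len / R boundaries are kept, and each resulting segment is
# mean-pooled into one "concept" vector for a higher-capacity backbone.
import torch
import torch.nn as nn


class ConceptCompressor(nn.Module):
    def __init__(self, d_model: int, target_ratio: float = 4.0):
        super().__init__()
        self.boundary_head = nn.Linear(d_model, 1)  # scores a boundary at each token
        self.target_ratio = target_ratio            # R: average tokens per concept

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) token-level representations
        batch, seq_len, d_model = hidden.shape
        scores = self.boundary_head(hidden).squeeze(-1)              # (batch, seq_len)
        k = max(1, int(seq_len / self.target_ratio))                 # ~seq_len / R boundaries
        topk = scores.topk(k, dim=-1).indices
        # Hard top-k selection (inference-style); training would need a
        # differentiable relaxation, which is omitted here.
        is_boundary = torch.zeros_like(scores).scatter_(1, topk, 1.0).long()
        # A boundary token closes its segment; segment ids grow after each boundary.
        segment_id = torch.cumsum(is_boundary, dim=-1) - is_boundary
        num_concepts = int(segment_id.max().item()) + 1
        # Mean-pool every token sharing a segment id into a single concept vector.
        concepts = hidden.new_zeros(batch, num_concepts, d_model)
        counts = hidden.new_zeros(batch, num_concepts, 1)
        concepts.scatter_add_(1, segment_id.unsqueeze(-1).expand(-1, -1, d_model), hidden)
        counts.scatter_add_(1, segment_id.unsqueeze(-1), hidden.new_ones(batch, seq_len, 1))
        return concepts / counts.clamp(min=1)                        # (batch, num_concepts, d_model)


# Usage: a 32-token sequence compresses to roughly 8 concepts at R = 4.
x = torch.randn(2, 32, 64)
print(ConceptCompressor(64)(x).shape)
```

At $R=4$ the concept sequence has roughly a quarter of the token positions, which is what lets DLCM move the saved compute into a wider reasoning backbone while keeping inference FLOPs matched.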
Related papers
- ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation [12.503747711792679]
ConceptMoE dynamically merges semantically similar tokens into concept representations.
A learnable chunk module identifies optimal boundaries by measuring inter-token similarity.
ConceptMoE consistently outperforms standard MoE across language and vision-language tasks.
arXiv Detail & Related papers (2026-01-29T08:58:22Z)
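A minimal sketch of the similarity-driven chunking summarized in the ConceptMoE entry above, assuming boundaries are placed wherever the cosine similarity between adjacent token representations drops below a threshold; the threshold and the mean-merge are illustrative choices, not the paper's chunk module.

```python
# Illustrative sketch (not ConceptMoE's actual chunk module): place a chunk
# boundary wherever cosine similarity between adjacent token representations
# drops below a threshold, then merge each chunk into one concept vector.
import torch
import torch.nn.functional as F


def chunk_by_similarity(hidden: torch.Tensor, threshold: float = 0.5) -> list[torch.Tensor]:
    # hidden: (seq_len, d_model) representations for a single sequence
    sims = F.cosine_similarity(hidden[:-1], hidden[1:], dim=-1)  # (seq_len - 1,)
    concepts, start = [], 0
    for i, s in enumerate(sims.tolist()):
        if s < threshold:                              # low similarity -> close the current chunk
            concepts.append(hidden[start:i + 1].mean(dim=0))
            start = i + 1
    concepts.append(hidden[start:].mean(dim=0))        # final chunk
    return concepts                                    # list of (d_model,) concept vectors


# Usage: 10 tokens compress to however many chunks the similarity pattern yields.
print(len(chunk_by_similarity(torch.randn(10, 16))))
```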
- Unified Scaling Laws for Compressed Representations [69.72517034565467]
We investigate whether a unified scaling framework can accurately predict model performance when training occurs over various compressed representations.
Our main finding is demonstrating both theoretically and empirically that there exists a simple "capacity" metric.
We extend our formulation to directly compare the accuracy potential of different compressed formats, and to derive better algorithms for training over sparse-quantized formats.
arXiv Detail & Related papers (2025-06-02T16:52:51Z)
- Bound by semanticity: universal laws governing the generalization-identification tradeoff [8.437463955457423]
We show that finite-resolution similarity is a fundamental emergent informational constraint, not merely a toy-model artifact.
These results provide an exact theory of the generalization-identification trade-off and clarify how semantic resolution shapes the representational capacity of deep networks and brains alike.
arXiv Detail & Related papers (2025-06-01T15:56:26Z)
- Compression Hacking: A Supplementary Perspective on Informatics Properties of Language Models from Geometric Distortion [56.12939353271623]
From a geometric standpoint, the word representation space of highly compressed LMs tends to degenerate into a highly anisotropic state.
We find this synchronicity is essentially the "Compression Hacking" in LM representations.
We propose three refined compression metrics by incorporating geometric distortion analysis and integrate them into a self-evaluation pipeline.
arXiv Detail & Related papers (2025-05-23T12:11:03Z)
- Saliency-driven Dynamic Token Pruning for Large Language Models [32.903622070917194]
We propose Saliency-driven Dynamic Token Pruning (SDTP).
A lightweight saliency-driven prediction module is designed to estimate the importance score of each token with its hidden state.
A ranking-based optimization strategy is proposed to minimize the ranking divergence of the saliency score and the predicted importance score.
arXiv Detail & Related papers (2025-04-06T15:15:07Z)
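The SDTP entry above describes a lightweight module that predicts per-token importance and prunes low-scoring tokens; the sketch below shows that idea with an assumed small MLP scorer and keep ratio (the ranking-divergence training objective mentioned in the summary is omitted).

```python
# Illustrative sketch of saliency-driven dynamic token pruning (not SDTP's
# released code): a small MLP predicts an importance score from each token's
# hidden state, and only the top-scoring tokens are kept for later layers.
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    def __init__(self, d_model: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(d_model, d_model // 4), nn.GELU(),
                                    nn.Linear(d_model // 4, 1))
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq_len, d_model)
        scores = self.scorer(hidden).squeeze(-1)                     # (batch, seq_len)
        k = max(1, int(hidden.size(1) * self.keep_ratio))
        keep = scores.topk(k, dim=-1).indices.sort(dim=-1).values    # preserve token order
        pruned = torch.gather(hidden, 1, keep.unsqueeze(-1).expand(-1, -1, hidden.size(-1)))
        return pruned, scores                                        # pruned: (batch, k, d_model)


# Usage: halve a 16-token sequence.
pruned, _ = TokenPruner(32)(torch.randn(2, 16, 32))
print(pruned.shape)  # torch.Size([2, 8, 32])
```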
- UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model [62.66515621965686]
We introduce a novel theoretical framework with a Dual Discrete Diffusion (D3Diff) loss, unifying masked generative models with discrete score matching diffusion.
This D3Diff significantly enhances the model's ability to synthesize high-fidelity facial details aligned with text input.
We construct UniF$^2$aceD-1M, a large-scale dataset comprising 130K fine-grained image-caption pairs and 1M visual question-answering pairs.
arXiv Detail & Related papers (2025-03-11T07:34:59Z)
- Scaling Embedding Layers in Language Models [61.939921364422936]
$SCONE$ is a new method for extending input embedding layers to enhance language model performance.
$SCONE$ retains the original vocabulary while introducing embeddings for a set of frequent n-grams.
These embeddings provide contextualized representation for each input token and are learned with a separate model during training.
$SCONE$ enables two new scaling strategies: increasing the number of n-gram embeddings and scaling the model used to learn them, both while maintaining fixed accelerator usage during inference.
arXiv Detail & Related papers (2025-02-03T18:59:32Z)
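The SCONE entry above describes augmenting the input embedding layer with embeddings for frequent n-grams while keeping the original vocabulary; below is a toy sketch under that reading, with a hand-built n-gram table standing in for the separately learned embedding model.

```python
# Illustrative sketch of the n-gram embedding idea (not the SCONE
# implementation): each token keeps its ordinary embedding, and if the n-gram
# ending at that token is in a frequent-n-gram table, a cached n-gram
# embedding is added on top. Table construction and the separate model that
# learns n-gram embeddings are omitted; all names here are assumptions.
import torch
import torch.nn as nn


class NGramAugmentedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_ngrams: int, d_model: int, n: int = 2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.ngram = nn.Embedding(num_ngrams + 1, d_model, padding_idx=0)  # row 0 = "no match"
        self.table = {}  # maps an n-gram tuple of token ids -> row in self.ngram
        self.n = n

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq_len) token ids
        out = self.tok(ids)
        ngram_ids = torch.zeros_like(ids)
        for b in range(ids.size(0)):
            for t in range(self.n - 1, ids.size(1)):
                key = tuple(ids[b, t - self.n + 1 : t + 1].tolist())
                ngram_ids[b, t] = self.table.get(key, 0)
        return out + self.ngram(ngram_ids)


# Usage: register one frequent bigram and embed a toy batch.
emb = NGramAugmentedEmbedding(vocab_size=100, num_ngrams=10, d_model=8)
emb.table[(5, 7)] = 1
print(emb(torch.tensor([[5, 7, 3, 5, 7]])).shape)  # torch.Size([1, 5, 8])
```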
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models [8.774705201394916]
Transformer-based language models spread FLOPs uniformly across input sequences.
We show that transformers can learn to dynamically allocate FLOPs to specific positions in a sequence.
arXiv Detail & Related papers (2024-04-02T19:28:11Z)
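For the Mixture-of-Depths entry above, a minimal sketch of per-block token routing: a router scores tokens, only a fixed fraction passes through the block's computation, and the rest take the residual skip path. The capacity fraction and the MLP block are assumptions for illustration, not the paper's architecture.

```python
# Illustrative sketch of per-block dynamic compute allocation: route only the
# top-scoring fraction of tokens through the block; the rest pass unchanged.
import torch
import torch.nn as nn


class RoutedBlock(nn.Module):
    def __init__(self, d_model: int, capacity: float = 0.25):
        super().__init__()
        self.router = nn.Linear(d_model, 1)
        self.block = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                   nn.Linear(4 * d_model, d_model))
        self.capacity = capacity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x).squeeze(-1)                          # (batch, seq_len)
        k = max(1, int(x.size(1) * self.capacity))
        idx = scores.topk(k, dim=-1).indices                         # tokens that get compute
        gathered = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        updated = gathered + self.block(gathered)                    # full compute for routed tokens
        out = x.clone()                                              # all other tokens: identity skip
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)), updated)
        return out


# Usage: only ~25% of the 16 positions pass through the MLP.
print(RoutedBlock(32)(torch.randn(2, 16, 32)).shape)  # torch.Size([2, 16, 32])
```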
- Densely Connected $G$-invariant Deep Neural Networks with Signed Permutation Representations [6.200483285433661]
We introduce and investigate, for finite groups $G$, $G$-invariant deep neural network ($G$-DNN) architectures with ReLU activation.
The preactivations of the $G$-DNNs are able to transform by signed permutation representations (signed perm-reps) of $G$.
We show that there are far more admissible $G$-DNN architectures than those accessible with the "concatenated ReLU" activation function from the literature.
arXiv Detail & Related papers (2023-03-08T14:35:03Z)
- Unsupervised Semantic Segmentation by Distilling Feature Correspondences [94.73675308961944]
Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation.
We present STEGO, a novel framework that distills unsupervised features into high-quality discrete semantic labels.
STEGO yields a significant improvement over the prior state of the art, on both the CocoStuff and Cityscapes challenges.
arXiv Detail & Related papers (2022-03-16T06:08:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.