LLM Pretraining with Continuous Concepts
- URL: http://arxiv.org/abs/2502.08524v1
- Date: Wed, 12 Feb 2025 16:00:11 GMT
- Title: LLM Pretraining with Continuous Concepts
- Authors: Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li
- Abstract summary: Next token prediction has been the standard training objective used in large language model pretraining.
We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts.
- Score: 71.98047075145249
- License:
- Abstract: Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse autoencoder and mixes them into the model's hidden state by interleaving with token hidden representations. Through experiments on multiple benchmarks, including language modeling and downstream reasoning tasks, we show that CoCoMix is more sample efficient and consistently outperforms standard next token prediction, knowledge distillation and inserting pause tokens. We find that combining both concept learning and interleaving in an end-to-end framework is critical to performance gains. Furthermore, CoCoMix enhances interpretability and steerability by allowing direct inspection and modification of the predicted concept, offering a transparent way to guide the model's internal reasoning process.
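The abstract describes concepts extracted by a pretrained sparse autoencoder (SAE) being predicted by the model and mixed back into its hidden states by interleaving. Below is a minimal sketch of that idea in PyTorch, not the authors' implementation: it assumes the supervision targets are the top-k active SAE latents of a reference hidden state, and all module and variable names (ConceptMixer, concept_head, concept_proj) are illustrative.

```python
# Minimal sketch of the CoCoMix idea from the abstract (illustrative, not the paper's code).
# Assumption: concepts = top-k active latents of a pretrained SAE applied to a reference
# model's hidden state; the model predicts those latents, and the predicted concept vector
# is projected back to model width and interleaved with the token hidden states.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptMixer(nn.Module):
    def __init__(self, d_model: int, n_concepts: int, k: int = 8):
        super().__init__()
        self.k = k
        self.concept_head = nn.Linear(d_model, n_concepts)  # predicts SAE latents
        self.concept_proj = nn.Linear(n_concepts, d_model)  # maps concepts back to model width

    def forward(self, h: torch.Tensor, sae_latents: torch.Tensor):
        """h: (B, T, d_model) hidden states; sae_latents: (B, T, n_concepts) SAE activations
        extracted from a pretrained model, used only as supervision targets."""
        logits = self.concept_head(h)  # (B, T, n_concepts)
        # Supervise prediction of the top-k active SAE latents (multi-label targets).
        targets = torch.zeros_like(sae_latents)
        topk = sae_latents.topk(self.k, dim=-1).indices
        targets.scatter_(-1, topk, 1.0)
        concept_loss = F.binary_cross_entropy_with_logits(logits, targets)
        # Compress the predicted concepts into a continuous vector ("concept token").
        concept_vec = self.concept_proj(torch.sigmoid(logits))  # (B, T, d_model)
        # Interleave token and concept representations: [h_1, c_1, h_2, c_2, ...],
        # which doubles the sequence length.
        mixed = torch.stack([h, concept_vec], dim=2).flatten(1, 2)  # (B, 2T, d_model)
        return mixed, concept_loss


if __name__ == "__main__":
    B, T, d_model, n_concepts = 2, 16, 64, 512
    mixer = ConceptMixer(d_model, n_concepts)
    h = torch.randn(B, T, d_model)
    sae_latents = torch.relu(torch.randn(B, T, n_concepts))  # stand-in for real SAE activations
    mixed, concept_loss = mixer(h, sae_latents)
    print(mixed.shape, concept_loss.item())  # torch.Size([2, 32, 64]) ...
```

In the full framework described in the abstract, such a concept loss would be trained end-to-end alongside the standard next-token cross-entropy, with the interleaved sequence consumed by the subsequent transformer layers.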
Related papers
- Bayesian Concept Bottleneck Models with LLM Priors [9.368695619127084]
Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy.
This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior.
arXiv Detail & Related papers (2024-10-21T01:00:33Z) - Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency [2.7719338074999547]
Concept bottleneck models (CBMs) have emerged as critical tools in domains where interpretability is paramount.
This study proposes Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency.
arXiv Detail & Related papers (2024-06-13T06:04:34Z) - Collaborative decoding of critical tokens for boosting factuality of large language models [57.504894664689]
Finetuned and aligned models show improved instruction-following and safe-generation abilities.
The common practice of using sampling during generation also increases chances of hallucination.
We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
arXiv Detail & Related papers (2024-02-28T01:53:37Z) - Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? [8.391254800873599]
We introduce a method to perform concept-based interventions on pretrained neural networks, which are not interpretable by design.
We formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes.
arXiv Detail & Related papers (2024-01-24T16:02:14Z) - ConcEPT: Concept-Enhanced Pre-Training for Language Models [57.778895980999124]
ConcEPT aims to infuse conceptual knowledge into pre-trained language models.
It exploits external entity concept prediction to predict the concepts of entities mentioned in the pre-training contexts.
Experiments show that ConcEPT acquires improved conceptual knowledge through concept-enhanced pre-training.
arXiv Detail & Related papers (2024-01-11T05:05:01Z) - Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z) - Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z) - Interactive Concept Bottleneck Models [14.240165842615674]
Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task.
We extend CBMs to interactive prediction settings where the model can query a human collaborator for the label to some concepts.
We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction.
arXiv Detail & Related papers (2022-12-14T11:39:18Z) - Efficient Self-Ensemble Framework for Semantic Segmentation [1.0819401241801994]
We propose to leverage the performance boost offered by ensemble methods to enhance semantic segmentation.
Our self-ensemble framework takes advantage of the multi-scale feature set produced by feature pyramid network methods.
Our model can be trained end-to-end, alleviating the traditional cumbersome multi-stage training of ensembles.
arXiv Detail & Related papers (2021-11-26T00:35:09Z) - Video Prediction via Example Guidance [156.08546987158616]
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
In this work, we propose a simple yet effective framework that can efficiently predict plausible future states.
arXiv Detail & Related papers (2020-07-03T14:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.