Related papers: Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

URL: http://arxiv.org/abs/2502.03275v1
Date: Wed, 05 Feb 2025 15:33:00 GMT
Title: Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Authors: DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng
Abstract summary: Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data. We propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens.
Score: 44.84219266082269
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words support textual coherence rather than core reasoning information, and processing these inputs consumes substantial computation resources. In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by VQ-VAE, significantly reducing the length of reasoning traces. We explore the use of latent trace abstractions in two scenarios: 1) training the model from scratch for the Keys-Finding Maze problem, 2) fine-tuning LLMs on this hybrid data with an extended vocabulary including unseen latent tokens, for both logical and mathematical reasoning problems. To facilitate effective learning, we introduce a simple training procedure that randomly mixes latent and text tokens, which enables fast adaptation to new latent tokens. Our approach consistently outperforms the baseline methods in various benchmarks.
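
Illustrative sketch (not from the paper): the abstract describes replacing the initial part of a reasoning trace with latent discrete tokens and randomly mixing latent and text tokens during training. The Python below shows one way such a hybrid trace could be assembled. Everything in it is an assumption made for illustration: the names toy_encode_chunk and mix_latent_and_text, the chunk size, the codebook size, and the choice to sample the number of replaced chunks uniformly. In particular, toy_encode_chunk is only a deterministic stand-in for a trained VQ-VAE encoder.

```python
import random
from typing import Callable, List, Optional


def toy_encode_chunk(chunk: List[str], codebook_size: int = 64) -> str:
    """Hypothetical stand-in for a VQ-VAE encoder.

    Maps a chunk of text tokens to a single latent token by summing
    character codes modulo the codebook size. A real system would use
    a trained encoder and a learned codebook instead.
    """
    code = sum(ord(ch) for tok in chunk for ch in tok) % codebook_size
    return f"<latent_{code}>"


def mix_latent_and_text(
    cot_tokens: List[str],
    chunk_size: int = 4,
    rng: Optional[random.Random] = None,
    encode: Callable[[List[str]], str] = toy_encode_chunk,
) -> List[str]:
    """Replace a random-length prefix of a reasoning trace with latent tokens.

    The first m complete chunks are each abstracted into one latent token,
    where m is sampled uniformly from 0..(number of complete chunks).
    The remaining tokens are kept as plain text.
    """
    rng = rng or random.Random()
    n_chunks = len(cot_tokens) // chunk_size
    m = rng.randint(0, n_chunks)  # inclusive on both ends
    latent_prefix = [
        encode(cot_tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(m)
    ]
    text_suffix = cot_tokens[m * chunk_size:]
    return latent_prefix + text_suffix


if __name__ == "__main__":
    trace = "first add 3 and 4 then multiply the sum by 2".split()
    for seed in range(3):
        print(mix_latent_and_text(trace, chunk_size=4, rng=random.Random(seed)))
```

Because the number of replaced chunks is resampled for each example, a model trained on such data would see the same trace at several levels of abstraction, which is one plausible reading of the "randomly mixes latent and text tokens" procedure mentioned in the abstract.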

Related papers

Multipole Attention for Efficient Long Context Reasoning [64.94673641704289]
Large Reasoning Models (LRMs) have shown promising accuracy improvements on complex problem-solving tasks. LRMs need to generate long chain-of-thought reasoning in order to think before answering. We introduce Multipole Attention, which accelerates autoregressive reasoning by only computing exact attention for the most important tokens.
arXiv Detail & Related papers (2025-06-16T03:00:40Z)
Let's Predict Sentence by Sentence [31.702157701240967]
We present a framework that adapts a pretrained token-level LM to operate in sentence space by autoregressively predicting continuous embeddings of next sentences. Our results indicate that pretrained LMs can effectively transition to abstract, structured reasoning within latent embedding spaces.
arXiv Detail & Related papers (2025-05-28T10:28:35Z)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
We introduce Sketch-of-Thought (SoT), a novel prompting framework. It combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage. SoT achieves token reductions of 76% with negligible accuracy impact.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
Self-Training Elicits Concise Reasoning in Large Language Models [23.475414693530965]
Chain-of-thought (CoT) reasoning has enabled large language models (LLMs) to utilize additional computation through intermediate tokens. We propose simple fine-tuning methods which leverage self-generated concise reasoning paths. Our method achieves a 30% reduction in output tokens, across five model families on GSM8K and MATH, while maintaining average accuracy.
arXiv Detail & Related papers (2025-02-27T14:14:50Z)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions. We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z)
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables [56.73919743039263]
This paper introduces a novel algorithm that integrates both symbolic and semantic (textual) approaches in a two-stage process to address limitations. Our experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three question-answering (QA) and fact-verification datasets.
arXiv Detail & Related papers (2024-06-29T21:24:19Z)
Tokenization Is More Than Compression [14.939912120571728]
Existing tokenization approaches like Byte-Pair Encoding (BPE) originate from the field of data compression. We introduce PathPiece, a new tokenizer that segments a document's text into the minimum number of tokens for a given vocabulary.
arXiv Detail & Related papers (2024-02-28T14:52:15Z)
Identifying and Analyzing Task-Encoding Tokens in Large Language Models [55.03191279766383]
In this paper, we identify and analyze task-encoding tokens on whose representations the task performance depends. We show that template and stopword tokens are the most prone to be task-encoding. Our work sheds light on how large language models (LLMs) learn to perform a task from demonstrations, deepens our understanding of the varied roles different types of tokens play in LLMs, and provides insights for avoiding instability from improperly utilizing task-encoding tokens.
arXiv Detail & Related papers (2024-01-20T20:55:21Z)
LabelPrompt: Effective Prompt-based Learning for Relation Classification [31.291466190218912]
This paper presents a novel prompt-based learning method, namely LabelPrompt, for the relation classification task. Motivated by the intuition to "GIVE MODEL CHOICES!", we first define additional tokens to represent relation labels and regard these tokens as the verbaliser with semantic initialisation. Then, to mitigate inconsistency between predicted relations and given entities, we implement an entity-aware module with contrastive learning.
arXiv Detail & Related papers (2023-02-16T04:06:25Z)
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning. We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions. RISE is able to pay attention to tokens that are related to conversational characteristics. Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.