MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts
- URL: http://arxiv.org/abs/2509.25020v1
- Date: Mon, 29 Sep 2025 16:44:22 GMT
- Title: MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts
- Authors: Jiayu Liu, Zhenya Huang, Anya Sims, Enhong Chen, Yee Whye Teh, Ning Miao
- Abstract summary: We present a new paradigm for reasoning in large language models (LLMs). Instead of autoregressively generating tokens, we model reasoning as a hidden Markov chain of continuous, high-dimensional "thoughts". For the first time, MARCOS achieves performance comparable to token-based CoT, even surpassing it by 4.7% on GSM8K with up to 15.7x speedup in inference.
- Score: 82.46857666702924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current paradigm for reasoning in large language models (LLMs) involves models "thinking out loud" via a sequence of tokens, known as chain-of-thought (CoT). This approach, while effective, has several significant drawbacks. Firstly, inference requires autoregressive generation of often thousands of CoT tokens, which is slow and computationally expensive. Secondly, it constrains reasoning to the discrete space of tokens, creating an information bottleneck across reasoning steps. Thirdly, it fundamentally entangles reasoning with token generation, forcing LLMs to "think while speaking," which causes potentially short-sighted reasoning. In light of these limitations, we re-imagine reasoning in LLMs and present a new paradigm: MARCOS. In our approach, rather than autoregressively generating tokens, we model reasoning as a hidden Markov chain of continuous, high-dimensional "thoughts". Each reasoning step involves a transition of the internal thoughts, where explicit reasoning steps (which may consist of hundreds of tokens) serve as observable variables, offering windows into the implicit thoughts. Since this latent process is incompatible with standard supervised learning, we further propose a two-phase variational training scheme. Our experiments on three benchmarks demonstrate that MARCOS outperforms existing continuous reasoning methods and, for the first time, achieves performance comparable to token-based CoT, even surpassing it by 4.7% on GSM8K with up to 15.7x speedup in inference. Beyond this, MARCOS offers additional advantages, such as step-level instead of token-level control over randomness, opening significant opportunities for reinforcement learning and reasoning in LLMs.
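As a rough illustration (not the paper's architecture), the core idea of a Markov chain over continuous thoughts can be sketched in NumPy: each reasoning step is a transition of a high-dimensional latent state that depends only on the current state, with explicit reasoning text playing the role of observations decoded from each state. The dimensionality `D`, the weight matrix `W`, and the `tanh` transition are placeholder choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # illustrative thought dimensionality (not from the paper)

# Stand-in for the learned transition model between latent thoughts.
W = rng.standard_normal((D, D)) / np.sqrt(D)

def transition(h):
    """One step of the chain: the next thought depends only on the current one."""
    return np.tanh(W @ h)

def rollout(h0, steps):
    """Run the hidden chain; explicit reasoning text would be decoded
    from each state as an observable variable."""
    thoughts = [h0]
    for _ in range(steps):
        thoughts.append(transition(thoughts[-1]))
    return thoughts

chain = rollout(rng.standard_normal(D), steps=4)
```

Because the chain is Markov at the step level, randomness can be injected per transition rather than per token, which is the step-level control over randomness the abstract highlights.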
Related papers
- Latent Reasoning with Supervised Thinking States [60.09942890192309]
Reasoning with a chain-of-thought (CoT) enables Large Language Models (LLMs) to solve complex tasks but incurs significant inference costs. We propose Thinking States, a method that performs reasoning while the input is being processed. We show Thinking States leads to stronger reasoning behavior than CoT, successfully extrapolating to longer sequences than seen during training.
arXiv Detail & Related papers (2026-02-09T07:12:41Z)
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge [87.51901436392427]
Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT). Humans, by contrast, often reason softly by maintaining a tractable probability distribution over plausible next steps. We propose Multiplex Thinking, a soft reasoning mechanism that samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT.
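A minimal sketch of the branch-and-merge step described above, assuming a toy vocabulary and embedding table (`V`, `D`, `K`, and `E` are illustrative, not from the paper): sample K candidate next tokens from the softmax distribution, then merge their embeddings into one continuous token.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, K = 10, 6, 4  # toy vocabulary size, embedding dim, branch count (assumptions)
E = rng.standard_normal((V, D))  # toy embedding table

def multiplex_token(logits, E, K, rng):
    # Softmax over the vocabulary (shifted for numerical stability).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Branch: sample K candidate next tokens; merge: average their embeddings.
    idx = rng.choice(len(p), size=K, p=p)
    return E[idx].mean(axis=0)

tok = multiplex_token(rng.standard_normal(V), E, K, rng)
```

The self-adaptive behavior falls out of the sampling: when the distribution is sharply peaked, all K draws coincide and the multiplex token reduces to an ordinary discrete embedding.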
arXiv Detail & Related papers (2026-01-13T18:48:00Z)
- Latent Reasoning in LLMs as a Vocabulary-Space Superposition [80.01651003144282]
Large language models (LLMs) demonstrate strong reasoning abilities with chain-of-thought prompting, but explicit reasoning introduces substantial computational overhead. Recent work on latent reasoning reduces this cost by reasoning in latent space without explicit supervision, but performance drops significantly. To address this, we restrict the latent space to the column space of the LLM vocabulary, treating latent reasoning as a superposition over vocabulary probabilities. Once latent reasoning concludes, it collapses into an eigenstate of explicit reasoning to yield the final answer. Latent-SFT sets a new state of the art on GSM8k, matching explicit reasoning.
arXiv Detail & Related papers (2025-10-17T10:51:20Z)
- One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z)
- LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking [25.468889616586363]
We investigate the Soft Thinking capabilities of large language models (LLMs). Contrary to the prevailing belief that Soft Thinking supports parallel exploration of diverse reasoning paths, our findings reveal that LLMs behave as single-threaded reasoners. Our experiments demonstrate that randomness, particularly with the Gumbel-max trick, can alleviate the limitations of vanilla approaches.
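The Gumbel-max trick mentioned above is standard: adding i.i.d. Gumbel(0, 1) noise to the logits and taking the argmax draws an index with exactly the softmax probabilities, turning a soft distribution into a hard, random choice. A small NumPy check of this identity (the specific logits are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])

def gumbel_max_sample(logits, rng):
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(U)).
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    # argmax of (logits + noise) is a sample from softmax(logits).
    return int(np.argmax(logits + g))

counts = np.bincount([gumbel_max_sample(logits, rng) for _ in range(20000)],
                     minlength=logits.size)
freq = counts / counts.sum()
softmax = np.exp(logits) / np.exp(logits).sum()
```

Over many draws, the empirical frequencies `freq` match `softmax` up to sampling noise.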
arXiv Detail & Related papers (2025-08-05T13:38:33Z)
- Improving Large Language Models with Concept-Aware Fine-Tuning [55.59287380665864]
Concept-Aware Fine-Tuning (CAFT) is a novel multi-token training method for large language models (LLMs). CAFT enables the learning of sequences that span multiple tokens, fostering stronger concept-aware learning. Experiments demonstrate significant improvements compared to conventional next-token fine-tuning methods.
arXiv Detail & Related papers (2025-06-09T14:55:00Z)
- Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space [62.54887038032942]
We introduce Soft Thinking, a training-free method that emulates human-like "soft" reasoning by generating soft, abstract concept tokens. These concept tokens are created by a probability-weighted mixture of token embeddings, which forms the continuous concept space. In essence, each generated concept token encapsulates multiple meanings from related discrete tokens, implicitly exploring various reasoning paths before converging.
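The probability-weighted mixture described above can be sketched in a few lines, assuming a toy vocabulary and embedding table (`V`, `D`, and `E` are illustrative, not from the paper): the concept token is the expectation of the embedding under the next-token distribution, a point in continuous concept space rather than a single discrete token.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 8, 4  # toy vocabulary size and embedding dim (assumptions)
E = rng.standard_normal((V, D))  # toy embedding table

def concept_token(logits, E):
    # Softmax over the vocabulary (shifted for numerical stability).
    p = np.exp(logits - logits.max())
    p /= p.sum()
    # Probability-weighted mixture of token embeddings: one continuous
    # "concept" vector encapsulating all plausible next tokens.
    return p @ E

concept = concept_token(rng.standard_normal(V), E)
```

When the distribution is sharply peaked, the mixture degenerates to a single row of `E`, so the concept token behaves like an ordinary discrete token embedding.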
arXiv Detail & Related papers (2025-05-21T17:29:15Z)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [64.74765550805024]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 84% with minimal accuracy loss across 18 reasoning datasets.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
- Training Large Language Models to Reason in a Continuous Latent Space [84.5618790930725]
We introduce a new paradigm, Coconut (Chain of Continuous Thought), to explore the potential of large language models (LLMs) reasoning in an unrestricted latent space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
arXiv Detail & Related papers (2024-12-09T18:55:56Z)
- Markov Chain of Thought for Efficient Mathematical Reasoning [10.678633785012691]
Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and task-specific actions. We conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT). Our MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference.
arXiv Detail & Related papers (2024-10-23T07:53:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.