Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
- URL: http://arxiv.org/abs/2512.10931v1
- Date: Thu, 11 Dec 2025 18:57:02 GMT
- Title: Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
- Authors: George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi, Vyacheslav Zhdanovskiy, Alina Shutova, Denis Kuznedelev,
- Abstract summary: Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive.<n>We use the properties of rotary embeddings to enable LLMs built for sequential interactions to simultaneously think, listen, and generate outputs.<n>We evaluate our approach on math, commonsense, and safety reasoning and find that it can generate accurate thinking-augmented answers in real time.
- Score: 5.751951973255713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embedded assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of rotary embeddings to enable LLMs built for sequential interactions to simultaneously think, listen, and generate outputs. We evaluate our approach on math, commonsense, and safety reasoning and find that it can generate accurate thinking-augmented answers in real time, reducing time to first non-thinking token from minutes to <= 5s. and the overall real-time delays by 6-11x.
Related papers
- Large Language Models as Students Who Think Aloud: Overly Coherent, Verbose, and Confident [0.8564319625930894]
Large language models (LLMs) are increasingly embedded in AI-based tutoring systems. Can they faithfully model novice reasoning and metacognitive judgments?<n>We evaluate LLMs as novices using 630 think-aloud utterances from chemistry tutoring problems with problem-solving logs of student hint use, attempts, and problem context.<n>We compare LLM-generated reasoning to human learner utterances under minimal and extended contextual prompting, and assess the models' ability to predict step-level learner success.
arXiv Detail & Related papers (2026-02-01T04:46:38Z) - Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech [41.625380059502675]
Think-Verbalize-Speak is a framework that decouples reasoning from spoken delivery.<n>We also introduce ReVerT, a latency-efficient verbalizer based on incremental and asynchronous summarization.<n> Experiments across multiple benchmarks show that our method enhances speech naturalness and conciseness with minimal impact on reasoning.
arXiv Detail & Related papers (2025-09-19T14:34:22Z) - STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models [131.90117151306993]
Spoken Language Models (SLMs) are designed to take speech inputs and produce spoken responses.<n>Current SLMs lack the ability to perform an internal, unspoken thinking process before responding.<n>We propose Stitch, a novel generation method that alternates between the generation of unspoken reasoning chunks and spoken response chunks.
arXiv Detail & Related papers (2025-07-21T08:30:03Z) - Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs [52.663816303997194]
A key factor influencing answer quality is the length of the thinking stage.<n>This paper explores and exploits the mechanisms by which LLMs understand and regulate the length of their reasoning.<n>Our results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency.
arXiv Detail & Related papers (2025-06-08T17:54:33Z) - Discrete Minds in a Continuous World: Do Language Models Know Time Passes? [44.46759661130471]
Large Language Models (LLMs) excel at temporal reasoning tasks like event ordering and duration estimation.<n>We investigate whether LLMs perceive the passage of time and adapt their decision-making accordingly.
arXiv Detail & Related papers (2025-06-06T06:37:01Z) - Thinkless: LLM Learns When to Think [57.857534644932194]
Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference.<n>We propose Thinkless, a learnable framework that empowers an LLM to adaptively select between short-form and long-form reasoning.<n>On several benchmarks such as Minerva Algebra, MATH-500, and GSM8K, Thinkless is able to reduce the usage of long-chain thinking by 50% - 90%.
arXiv Detail & Related papers (2025-05-19T17:24:16Z) - Retrospective Learning from Interactions [18.5871047885934]
We introduce ReSpect, a method to learn from such signals in past interactions via retrospection without additional annotations.<n>We show how ReSpect gradually improves task completion rate from 31% to 82%, all without any external annotation.
arXiv Detail & Related papers (2024-10-17T17:59:03Z) - Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? [70.19200858203388]
Temporal reasoning is fundamental for large language models to comprehend the world.
CoTempQA is a benchmark containing four co-temporal scenarios.
Our experiments reveal a significant gap between the performance of current LLMs and human-level reasoning.
arXiv Detail & Related papers (2024-06-13T12:56:21Z) - Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z) - Rethinking with Retrieval: Faithful Large Language Model Inference [91.66406351103484]
We propose a novel post-processing approach, rethinking with retrieval (RR)
RR retrieves relevant external knowledge based on the reasoning steps obtained from the chain-of-thought prompting.
We evaluate the effectiveness of RR through extensive experiments with GPT-3 on three complex reasoning tasks.
arXiv Detail & Related papers (2022-12-31T22:35:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.