Plantain: Plan-Answer Interleaved Reasoning
- URL: http://arxiv.org/abs/2512.03176v1
- Date: Tue, 02 Dec 2025 19:22:12 GMT
- Title: Plantain: Plan-Answer Interleaved Reasoning
- Authors: Anthony Liang, Jonathan Berant, Adam Fisch, Abhimanyu Goyal, Kalpesh Krishna, Jacob Eisenstein,
- Abstract summary: Reasoning models often spend a significant amount of time thinking before they generate a visible response. We propose interleaved reasoning, in which the model alternates between thinking and surfacing intermediate responses. Plantain is a specialization where the first intermediate response is an explicit, step-by-step plan for executing the task.
- Score: 38.046123106961176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning is on the right track, and do not give the user any recourse to stop and correct them if their reasoning is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted while the model reasons from a false premise that could have easily been corrected. In contrast, human speakers typically perform lightweight, incremental grounding acts to ensure that participants in the conversation are on the same page; here we ask whether language models can learn to leverage a similar type of behavior. With this motivation, we propose interleaved reasoning (IR), in which the model alternates between thinking and surfacing intermediate responses, as an alternative to the standard "think-then-answer" approach. By providing useful information to the user earlier, IR reduces perceived latency, the time a user waits for an initial output, without compromising the quality of the final response. We further introduce a specialization of interleaved reasoning, Plantain (Plan-Thought-Answer Interleaving), where the first intermediate response is an explicit, step-by-step plan for executing the task. This plan-first strategy allows for user intervention and early feedback for subsequent reasoning steps. We demonstrate that Plantain yields a ~6% improvement in pass@1 across several challenging math reasoning and coding benchmarks, while reducing time-to-first-response by over 60% relative to think-then-answer baselines.
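The plan-first interleaving described in the abstract can be sketched as a decoding loop that surfaces the plan before reasoning continues, lets the user intervene, and streams intermediate answers while hidden thoughts stay internal. This is a minimal illustrative sketch, not the paper's actual implementation: the `model` callable, the segment tags, and the `on_plan` intervention hook are all assumptions.

```python
# Hypothetical sketch of plan-answer interleaved decoding. The tagged-segment
# interface and intervention hook are illustrative assumptions, not Plantain's
# real API.
from typing import Callable, Iterator, List, Tuple

Segment = Tuple[str, str]  # (kind, text), kind in {"plan", "think", "answer"}

def interleaved_generate(
    model: Callable[[str], Iterator[Segment]],
    prompt: str,
    on_plan: Callable[[str], str] = lambda p: p,  # user may edit/approve the plan
) -> Tuple[List[str], str]:
    """Surface the plan first, allow early intervention, then stream
    intermediate answers; hidden 'think' segments are never shown."""
    visible: List[str] = []
    working_prompt = prompt
    for kind, text in model(working_prompt):
        if kind == "plan":
            text = on_plan(text)  # early feedback before subsequent reasoning
            visible.append(text)
            working_prompt += "\nPLAN:\n" + text
        elif kind == "answer":
            visible.append(text)  # intermediate response reduces perceived latency
        # 'think' segments stay internal
    return visible, visible[-1] if visible else ""

# Stub model emitting a plan, a hidden thought, then two answers.
def stub_model(prompt: str) -> Iterator[Segment]:
    yield ("plan", "1. Parse input 2. Compute 3. Verify")
    yield ("think", "private chain of thought...")
    yield ("answer", "intermediate result: 42")
    yield ("answer", "final answer: 42")

visible, final = interleaved_generate(stub_model, "What is 6*7?")
```

Because the plan is the first visible output, time-to-first-response is bounded by plan generation rather than the full reasoning trace, which is the mechanism behind the latency reduction the abstract reports.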
Related papers
- Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning [37.40951956513094]
Reasoning in Large Language Models (LLMs) often suffers from inefficient long chain-of-thought traces with redundant self-exploration and validation. Inspired by human reasoning patterns, where people solve new problems by leveraging past related cases to constrain search spaces and reduce trial-and-error, we propose Precedent Informed Reasoning (PIR). PIR transforms LRMs' reasoning paradigm from exhaustive self-exploration to guided learning from precedents.
arXiv Detail & Related papers (2026-02-16T04:17:46Z) - Beyond Model Scaling: Test-Time Intervention for Efficient Deep Reasoning [34.912727372324625]
Think-with-Me is a test-time interactive reasoning paradigm that introduces external feedback intervention into the reasoning process. Think-with-Me pauses reasoning at points for external feedback, adaptively extending or terminating reasoning to reduce redundancy while preserving accuracy. Experiments show that Think-with-Me achieves a superior balance between accuracy and reasoning length under limited context windows.
arXiv Detail & Related papers (2026-01-16T13:00:42Z) - VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice [88.93674345138054]
Chain-of-thought (CoT) reasoning has emerged as a powerful tool for multimodal large language models on video understanding tasks. We propose VideoAuto-R1, a video understanding framework that adopts a reason-when-necessary strategy.
arXiv Detail & Related papers (2026-01-08T18:00:59Z) - Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models [33.398631680508814]
We propose Answer-Consistent Reinforcement Learning that modifies the GRPO algorithm with an auxiliary consistency check. We design a consistency-verification reward that grants a high reward only if both the original and the post-shuffle answers agree and are correct. We evaluate ACRE on challenging video reasoning benchmarks and multimodal math reasoning benchmarks, achieving an average 2.2% and 1.5% improvement.
arXiv Detail & Related papers (2025-10-11T08:32:52Z) - Thinking Before You Speak: A Proactive Test-time Scaling Approach [54.8205006555199]
We implement our idea as a reasoning framework, named *Thinking Before You Speak* (TBYS). We design a pipeline for automatically collecting and filtering in-context examples for the generation of *insights*. Experiments on challenging mathematical datasets verify the effectiveness of TBYS.
arXiv Detail & Related papers (2025-08-26T03:43:32Z) - Stop Spinning Wheels: Mitigating LLM Overthinking via Mining Patterns for Early Reasoning Exit [114.83867400179354]
Overthinking can degrade the overall performance of large language models. We categorize reasoning into three stages: an insufficient exploration stage, a compensatory reasoning stage, and a reasoning convergence stage. We develop a lightweight, rule-based thresholding strategy to improve reasoning accuracy.
arXiv Detail & Related papers (2025-08-25T03:17:17Z) - Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models [54.85405423240165]
We introduce Interactive Reasoning, an interaction design that visualizes chain-of-thought outputs as a hierarchy of topics. We implement interactive reasoning in Hippo, a prototype for AI-assisted decision making in the face of uncertain trade-offs.
arXiv Detail & Related papers (2025-06-30T10:00:43Z) - Real-Time Progress Prediction in Reasoning Language Models [41.08450684104994]
In this work, we investigate whether real-time progress prediction is feasible. We discretize progress and train a linear probe to classify reasoning states. We then introduce a two-stage fine-tuning approach that enables reasoning models to generate progress estimates.
arXiv Detail & Related papers (2025-06-29T15:01:01Z) - Answer Convergence as a Signal for Early Stopping in Reasoning [7.51755942515969]
Chain-of-thought (CoT) prompting enhances reasoning in large language models (LLMs). We propose three inference-time strategies to improve efficiency: (1) early stopping via answer consistency, (2) boosting the probability of generating end-of-reasoning signals, and (3) a supervised method that learns when to stop based on internal activations.
arXiv Detail & Related papers (2025-06-03T07:20:54Z) - What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception [53.4840989321394]
We analyze the effect of rationales generated by QA models to support their answers.
We present users with incorrect answers and corresponding rationales in various formats.
We measure the effectiveness of this feedback in patching these rationales through in-context learning.
arXiv Detail & Related papers (2023-11-16T04:26:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.