Fugu-MT Paper Translation (Abstract): StreamingThinker: Large Language Models Can Think While Reading

Paper overview: StreamingThinker: Large Language Models Can Think While Reading

arxiv url: http://arxiv.org/abs/2510.17238v1
Date: Mon, 20 Oct 2025 07:27:37 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.022861
Title: StreamingThinker: Large Language Models Can Think While Reading
Title (reference translation): StreamingThinker: Large Language Models That Can Think While Reading
Authors: Junlong Tong, Yingqi Fan, Anhao Zhao, Yunpu Ma, Xiaoyu Shen
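Abstract summary: Large language models (LLMs) have demonstrated remarkable capabilities in chain-of-thought (CoT) reasoning. Inspired by the human cognition of thinking while reading, we first design a streaming thinking paradigm for LLMs. We instantiate this paradigm with StreamingThinker.
Reference score (independently computed attention score): 14.54868327561777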
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in chain-of-thought (CoT) reasoning. However, the current LLM reasoning paradigm initiates thinking only after the entire input is available, which introduces unnecessary latency and weakens attention to earlier information in dynamic scenarios. Inspired by the human cognition of thinking while reading, we first design a streaming thinking paradigm for LLMs, where reasoning unfolds in the order of input and further adjusts its depth once reading is complete. We instantiate this paradigm with StreamingThinker, a framework that enables LLMs to think while reading through the integration of streaming CoT generation, streaming-constraint training, and streaming parallel inference. Specifically, StreamingThinker employs streaming reasoning units with quality control for CoT generation, enforces order-preserving reasoning through streaming attention masks and position encoding, and leverages parallel KV caches that decouple input encoding from reasoning generation, thereby ensuring alignment and enabling true concurrency. We evaluate StreamingThinker on the Qwen3 model family across math reasoning, logical reasoning, and context-based QA reasoning tasks. Experimental results show that StreamingThinker preserves performance comparable to batch thinking, while yielding an 80% reduction in token waiting before the onset of reasoning and a more than 60% reduction in time-level latency for producing the final answer, demonstrating the effectiveness of the streaming paradigm for LLM reasoning. Code will be released at https://github.com/EIT-NLP/StreamingLLM/tree/main/StreamingThinker.
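Abstract (reference translation): Large language models (LLMs) have shown remarkable capabilities in chain-of-thought (CoT) reasoning. However, the current LLM reasoning paradigm begins thinking only after the entire input is available, which causes unnecessary latency and weakens attention to earlier information in dynamic scenarios. Inspired by the human cognition of thinking while reading, we first design a streaming thinking paradigm for LLMs, in which reasoning unfolds in input order and adjusts its depth once reading is complete. We instantiate this paradigm with StreamingThinker, a framework that lets LLMs think while reading by integrating streaming CoT generation, streaming-constraint training, and streaming parallel inference. Specifically, StreamingThinker adopts streaming reasoning units with quality control for CoT generation, enforces order-preserving reasoning through streaming attention masks and position encoding, and uses parallel KV caches to decouple input encoding from reasoning generation, ensuring alignment and enabling true concurrency. We evaluate StreamingThinker on the Qwen3 model family across math reasoning, logical reasoning, and context-based QA reasoning tasks. Experimental results show that StreamingThinker maintains performance comparable to batch thinking while reducing token waiting before the onset of reasoning by 80% and reducing time-level latency for producing the final answer by more than 60%, demonstrating the effectiveness of the streaming paradigm for LLM reasoning. The code will be released at https://github.com/EIT-NLP/StreamingLLM/tree/main/StreamingThinker.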
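
The abstract mentions streaming attention masks that enforce order-preserving reasoning but gives no construction. The following is a minimal illustrative sketch of one way such a mask could look, not the authors' implementation. It assumes (hypothetically) that the input arrives in segments, that exactly one reasoning unit is paired with each segment, and that all input tokens are laid out before all reasoning tokens. The function name `streaming_attention_mask` and its parameters are invented for this example.

```python
from typing import List


def streaming_attention_mask(
    input_lens: List[int], reason_lens: List[int]
) -> List[List[bool]]:
    """Return mask[q][k], True if query token q may attend to key token k.

    Assumed layout (hypothetical, for illustration only): all input tokens
    first, then all reasoning tokens. Reasoning unit i is paired with input
    segment i and may only see input segments 0..i.
    """
    if len(input_lens) != len(reason_lens):
        raise ValueError("expected one reasoning unit per input segment")

    n_in = sum(input_lens)
    n_re = sum(reason_lens)
    total = n_in + n_re

    # Segment index that each reasoning token belongs to.
    reason_seg = [seg for seg, n in enumerate(reason_lens) for _ in range(n)]

    # arrived[i] = number of input tokens read once segment i is complete.
    arrived = []
    running = 0
    for n in input_lens:
        running += n
        arrived.append(running)

    mask = [[False] * total for _ in range(total)]

    # Input tokens: ordinary causal attention within the input stream only,
    # so input encoding never depends on reasoning tokens.
    for q in range(n_in):
        for k in range(q + 1):
            mask[q][k] = True

    # Reasoning tokens: attend to the input read so far and, causally,
    # to reasoning tokens generated up to and including themselves.
    for r in range(n_re):
        q = n_in + r
        for k in range(arrived[reason_seg[r]]):
            mask[q][k] = True
        for k in range(n_in, q + 1):
            mask[q][k] = True

    return mask


if __name__ == "__main__":
    # Two input segments (2 and 3 tokens), two reasoning units (1 and 2 tokens).
    for row in streaming_attention_mask([2, 3], [1, 2]):
        print("".join("x" if allowed else "." for allowed in row))
```

In this toy layout, the first reasoning token can see only the two tokens of the first input segment, while later reasoning tokens see the whole input plus earlier reasoning. Keeping input rows independent of reasoning columns is what would allow the two streams to be cached separately, which is the property the abstract attributes to parallel KV caches.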

Related papers