Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
- URL: http://arxiv.org/abs/2505.14530v1
- Date: Tue, 20 May 2025 15:49:15 GMT
- Title: Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
- Authors: Zhipeng Yang, Junzhuo Li, Siyu Xia, Xuming Hu
- Abstract summary: Large language models (LLMs) sequentially decompose and execute composite tasks layer-by-layer. Two claims ground our study: (i) distinct subtasks are learned at different network depths, and (ii) these subtasks are executed sequentially across layers.
- Score: 20.139581575671436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that large language models (LLMs) exhibit an $\textit{internal chain-of-thought}$: they sequentially decompose and execute composite tasks layer-by-layer. Two claims ground our study: (i) distinct subtasks are learned at different network depths, and (ii) these subtasks are executed sequentially across layers. On a benchmark of 15 two-step composite tasks, we employ layer-from context-masking and propose a novel cross-task patching method, confirming (i). To examine claim (ii), we apply LogitLens to decode hidden states, revealing a consistent layerwise execution pattern. We further replicate our analysis on the real-world $\text{TRACE}$ benchmark, observing the same stepwise dynamics. Together, our results enhance the transparency of LLMs by showing their capacity to internally plan and execute subtasks (or instructions), opening avenues for fine-grained, instruction-level activation steering.
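The abstract's claim (ii) rests on decoding intermediate hidden states with LogitLens. As a rough illustration (not the paper's released code), a minimal LogitLens pass over a Hugging Face causal LM might look like the sketch below; the model name, the example prompt, and the GPT-2-specific attribute paths (`transformer.ln_f`, `lm_head`) are assumptions made only for this sketch.

```python
# A minimal LogitLens sketch (illustrative only): decode every layer's hidden
# state at the final token position through the final norm and unembedding matrix.
# Assumptions: a GPT-2-style Hugging Face model; the paper's actual models,
# prompts, and composite tasks are not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; not necessarily a model used in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A hypothetical two-step composite prompt (subtask 1: translate, subtask 2: uppercase).
prompt = "Translate 'good morning' to French, then write it in uppercase:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the embedding output plus one tensor per transformer layer.
for layer_idx, h in enumerate(out.hidden_states):
    h_last = model.transformer.ln_f(h[:, -1, :])  # final LayerNorm (GPT-2 naming)
    logits = model.lm_head(h_last)                # project into vocabulary space
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer_idx:2d} -> {tok.decode([top_id])!r}")
```

On a two-step task, the pattern the paper reports would show up as the intermediate subtask's answer surfacing at earlier layers before the final answer emerges in later ones.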
Related papers
- Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications [42.945657927971]
This paper presents a novel paradigm that accomplishes various IMAs using a single compositional LLM over wireless networks. To tackle the first challenge, we propose ContextLoRA, a novel method that guides an LLM to learn the rich structured context among IMAs. Experiments on three benchmarks show the superiority of the proposed ContextLoRA and ContextGear.
arXiv Detail & Related papers (2025-07-28T09:33:12Z) - PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries [16.40921376558516]
We introduce PARALLELPROMPT, the first benchmark for measuring intra-query parallelism in natural user prompts. Our dataset comprises over 37,000 real-world prompts from public LLM chat logs. We provide an execution suite that benchmarks serial vs. parallel strategies, measuring latency, structural adherence, and semantic fidelity.
arXiv Detail & Related papers (2025-06-23T15:05:54Z) - TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration [11.724886737930671]
Multimodal in-context learning (ICL) has emerged as a key mechanism for harnessing the capabilities of large vision-language models (LVLMs). We present TACO, a lightweight transformer-based model equipped with task-aware attention that dynamically configures in-context sequences. Experiments on five LVLMs and nine datasets demonstrate that TACO consistently surpasses baselines across diverse ICL tasks.
arXiv Detail & Related papers (2025-05-21T05:22:21Z) - Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models [93.5327725085853]
Continual LLaVA is a rehearsal-free method tailored for continual instruction tuning in LVLMs.
Experiments indicate that the proposed Continual LLaVA outperforms previous methods by significantly reducing forgetting during continual instruction tuning.
arXiv Detail & Related papers (2024-11-04T19:55:32Z) - Fine-tuning Large Language Models with Sequential Instructions [2.546845645875049]
We find that existing instruction-tuned models struggle to respond to queries with multiple instructions.
We contend that part of the fine-tuning data mixture should be sequential--containing a chain of interrelated tasks.
We automate this process by turning instructions in existing datasets into diverse and complex sequential instructions.
Models that underwent our sequential instruction tuning show improved results in coding, maths, and open-ended generation.
arXiv Detail & Related papers (2024-03-12T16:33:30Z) - SMAUG: A Sliding Multidimensional Task Window-Based MARL Framework for Adaptive Real-Time Subtask Recognition [11.236363226878975]
Subtask-based multi-agent reinforcement learning (MARL) methods enable agents to learn how to tackle different subtasks.
The Sliding Multidimensional Task window based Multi-agent reinforcement learning framework (SMAUG) is proposed for adaptive real-time subtask recognition.
Experiments on StarCraft II show that SMAUG not only outperforms all baselines but also exhibits a more pronounced and rapid rise in rewards.
arXiv Detail & Related papers (2024-03-04T08:04:41Z) - ADaPT: As-Needed Decomposition and Planning with Language Models [131.063805299796]
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT)
ADaPT explicitly plans and decomposes complex sub-tasks as needed, i.e., when the large language model is unable to execute them.
Our results demonstrate that ADaPT substantially outperforms established strong baselines.
arXiv Detail & Related papers (2023-11-08T17:59:15Z) - Robust Subtask Learning for Compositional Generalization [20.54144051436337]
We focus on the problem of training subtask policies so that they can be used to perform any task.
We aim to maximize the worst-case performance over all tasks as opposed to the average-case performance.
arXiv Detail & Related papers (2023-02-06T18:19:25Z) - Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z) - Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experimental results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.