Related papers: LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

URL: http://arxiv.org/abs/2601.06431v2
Date: Wed, 14 Jan 2026 02:51:23 GMT
Title: LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
Authors: Qingyu Ren, Qianyu He, Jingwen Chang, Jie Zeng, Jiaqing Liang, Yanghua Xiao, Han Xia, Zeye Sun, Fei Yu,
Abstract summary: We propose a logic-structured training framework LSRIF that explicitly models instruction logic.<n> Experiments show LSRIF brings significant improvements in instruction-following and general reasoning.
Score: 56.517329105764475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Instruction-following is critical for large language models, but real-world instructions often contain logical structures such as sequential dependencies and conditional branching. Existing methods typically construct datasets with parallel constraints and optimize average rewards, ignoring logical dependencies and yielding noisy signals. We propose a logic-structured training framework LSRIF that explicitly models instruction logic. We first construct a dataset LSRInstruct with constraint structures such as parallel, sequential, and conditional types, and then design structure-aware rewarding method LSRIF including average aggregation for parallel structures, failure-penalty propagation for sequential structures, and selective rewards for conditional branches. Experiments show LSRIF brings significant improvements in instruction-following (in-domain and out-of-domain) and general reasoning. Analysis reveals that learning with explicit logic structures brings parameter updates in attention layers and sharpens token-level attention to constraints and logical operators.

Related papers

Zero-Shot Instruction Following in RL via Structured LTL Representations [50.41415009303967]
We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training.<n>In this setting, linear temporal logic has recently been adopted as a powerful framework for specifying structured, temporally extended tasks.<n>While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal structure inherent in specifications.
arXiv Detail & Related papers (2026-02-15T23:22:50Z)
Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version) [49.462399222747024]
We propose a novel framework for the logical specification of non-Markovian rewards in Decision Processes (MDPs) with large state spaces.<n>Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT)<n>We introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity.
arXiv Detail & Related papers (2026-02-05T22:11:28Z)
Last Layer Logits to Logic: Empowering LLMs with Logic-Consistent Structured Knowledge Reasoning [55.55968342644846]
Large Language Models (LLMs) achieve excellent performance in natural language reasoning tasks through pre-training on vast unstructured text.<n>We propose the textitLogits-to-Logic framework, which incorporates logits strengthening and logits filtering as core modules to correct logical defects in LLM outputs.
arXiv Detail & Related papers (2025-11-11T07:08:27Z)
Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning [29.722512436773638]
We propose textscStructure-R1, a framework that transforms retrieved content into structured representations optimized for reasoning.<n>We show that textscStructure-R1 consistently achieves competitive performance with a 7B-scale backbone model.<n>Our theoretical analysis demonstrates how structured representations enhance reasoning by improving information density and contextual clarity.
arXiv Detail & Related papers (2025-10-16T23:19:28Z)
ReasoningFlow: Semantic Structure of Complex Reasoning Traces [9.328084104525834]
ReasoningFlow parses traces into directed acyclic graphs, enabling the characterization of distinct reasoning patterns as subgraph structures.<n>This human-interpretable representation offers promising applications in understanding, evaluating, and enhancing the reasoning processes of LRMs.
arXiv Detail & Related papers (2025-06-03T07:11:34Z)
TLINet: Differentiable Neural Network Temporal Logic Inference [10.36033062385604]
This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. In contrast to existing approaches, we introduce approximation methods for max operator designed specifically for temporal logic-based gradient techniques. Our framework not only learns the structure but also the parameters of STL formulas, allowing flexible combinations of operators and various logical structures.
arXiv Detail & Related papers (2024-05-03T16:38:14Z)
Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension [80.99865844249106]
We propose a holistic graph network (HGN) which deals with context at both discourse level and word level, as the basis for logical reasoning. Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism.
arXiv Detail & Related papers (2023-06-21T07:34:27Z)
Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs. It encodes linearized query structures and entities using pre-trained language models to find answers. We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
Discourse-Aware Graph Networks for Textual Logical Reasoning [142.0097357999134]
Passage-level logical relations represent entailment or contradiction between propositional units (e.g., a concluding sentence) We propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs) The networks first construct logic graphs leveraging in-line discourse connectives and generic logic theories, then learn logic representations by end-to-end evolving the logic relations with an edge-reasoning mechanism and updating the graph features.
arXiv Detail & Related papers (2022-07-04T14:38:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.