Related papers: LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models

LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models

URL: http://arxiv.org/abs/2510.11031v1
Date: Mon, 13 Oct 2025 06:01:02 GMT
Title: LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models
Authors: Yiwei Liu, Yucheng Li, Xiao Li, Gong Cheng,
Abstract summary: LogiNum Synth is a natural language problem synthesizer that synthesizes tasks requiring proficiency in joint logical reasoning.<n>It supports fine-grained control over reasoning world richness, logical reasoning depth, and the complexity of numerical computations.<n>It can serve as both a diagnostic tool and a source of targeted supervision for advancing integrated reasoning skills.
Score: 14.833385574931855
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Joint logical-numerical reasoning remains a major challenge for language models, yet existing datasets rely on fixed rule sets and offer limited control over task complexity, constraining their generalizability for evaluation and training. We present LogiNumSynth, a flexible natural language problem synthesizer that synthesizes tasks requiring proficiency in joint logical reasoning (e.g., rule-based reasoning) and numerical reasoning (e.g., arithmetic computation). LogiNumSynth supports fine-grained control over reasoning world richness, logical reasoning depth, and the complexity of numerical computations, enabling flexible data synthesis across difficulty levels. We demonstrate three key contributions: (1) Synthesizer -- synthesizing fully controllable joint reasoning tasks over natural language; (2) Evaluation & Process Analysis -- evaluating both process accuracy and answer accuracy; (3) Targeted Training -- using synthesized data to enhance LLMs' reasoning performance. Experiments with multiple LLMs highlight persistent weaknesses in logical-numerical reasoning, showing that LogiNumSynth can serve as both a diagnostic tool and a source of targeted supervision for advancing integrated reasoning skills.

Related papers

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond [35.80475408913363]
We present SynLogic, a data synthesis framework and dataset that generates diverse logical reasoning data at scale.<n>We validate the effectiveness of RL training on the SynLogic dataset based on 7B and 32B models.<n>Our mixed training model outperforms DeepSeek-R1-Zero-Qwen-32B across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T07:59:36Z)
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel mathematical Synthesis approach.<n>It generates graphs as solutions by combining Python-formatted functions from this library.<n>Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
arXiv Detail & Related papers (2025-04-29T04:42:02Z)
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment [21.12989936864145]
Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs)<n>We propose Reasoning-as-Logic-Units (RaLU), which constructs a more reliable reasoning path by aligning logical units between the generated program and their corresponding NL descriptions.
arXiv Detail & Related papers (2025-02-05T08:23:18Z)
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning [92.76959707441954]
We introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance.<n>ZebraLogic enables the generation of puzzles with controllable and quantifiable complexity.<n>Our results reveal a significant decline in accuracy as problem complexity grows.
arXiv Detail & Related papers (2025-02-03T06:44:49Z)
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data [53.433309883370974]
This work explores the potential and limitations of using graph-based synthetic reasoning data as training signals to enhance Large Language Models' reasoning capabilities.<n>Our experiments, conducted on two established natural language reasoning tasks, demonstrate that supervised fine-tuning with synthetic graph-based reasoning data effectively enhances LLMs' reasoning performance without compromising their effectiveness on other standard evaluation benchmarks.
arXiv Detail & Related papers (2024-09-19T03:39:09Z)
Reliable Reasoning Beyond Natural Language [0.047888359248129786]
Large Language models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. We propose a neurosymbolic approach that prompts LLMs to extract and encode all relevant information from a problem statement as logical code statements. We then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning.
arXiv Detail & Related papers (2024-07-16T04:34:18Z)
Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z)
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
Scallop: A Language for Neurosymbolic Programming [14.148819428748597]
Scallop is a language that combines the benefits of deep learning and logical reasoning. It is capable of expressing algorithmic reasoning in diverse and challenging AI tasks. It provides a succinct interface for machine learning programmers to integrate logical domain knowledge.
arXiv Detail & Related papers (2023-04-10T18:46:53Z)
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering. Our dataset poses great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.