Related papers: FOL-Pretrain: A complexity annotated corpus of first-order logic

FOL-Pretrain: A complexity annotated corpus of first-order logic

URL: http://arxiv.org/abs/2505.14932v1
Date: Tue, 20 May 2025 21:38:28 GMT
Title: FOL-Pretrain: A complexity annotated corpus of first-order logic
Authors: Isabelle Lee, Sarah Liaw, Dani Yogatama,
Abstract summary: Transformer-based large language models (LLMs) have demonstrated remarkable reasoning capabilities.<n>Despite recent efforts to reverse-engineer LLM behavior, our understanding of how these models internalize and execute complex algorithms remains limited.<n>We introduce a large-scale, fully open, complexity-annotated dataset of first-order logic reasoning traces.
Score: 16.061040115094592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based large language models (LLMs) have demonstrated remarkable reasoning capabilities such as coding and solving mathematical problems to commonsense inference. While these tasks vary in complexity, they all require models to integrate and compute over structured information. Despite recent efforts to reverse-engineer LLM behavior through controlled experiments, our understanding of how these models internalize and execute complex algorithms remains limited. Progress has largely been confined to small-scale studies or shallow tasks such as basic arithmetic and grammatical pattern matching. One barrier to deeper understanding is the nature of pretraining data -- vast, heterogeneous, and often poorly annotated, making it difficult to isolate mechanisms of reasoning. To bridge this gap, we introduce a large-scale, fully open, complexity-annotated dataset of first-order logic reasoning traces, designed to probe and analyze algorithmic reasoning in LLMs. The dataset consists of 3.5 billion tokens, including 8.8 million LLM-augmented, human-annotated examples and 7.5 million synthetically generated examples. Each synthetic example is verifiably correct, produced by a custom automated theorem solver, and accompanied by metadata tracing its algorithmic provenance. We aim to provide a scalable, interpretable artifact for studying how LLMs learn and generalize symbolic reasoning processes, paving the way for more transparent and targeted investigations into the algorithmic capabilities of modern models.

Related papers

Step-Level Sparse Autoencoder for Reasoning Process Interpretation [48.99201531966593]
Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning.<n>We propose step-level sparse autoencoder (SSAE), which serves as an analytical tool to disentangle different aspects of LLMs' reasoning steps into sparse features.<n> Experiments on multiple base models and reasoning tasks show the effectiveness of the extracted features.
arXiv Detail & Related papers (2026-03-03T14:25:02Z)
Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language Models [4.841487377596519]
This paper investigates whether Large Language Models' code-comprehension performance aligns with traditional human-centric software metrics.<n>We introduce a diagnostic framework that reframes code understanding as a binary input-output consistency task.
arXiv Detail & Related papers (2026-01-19T10:58:24Z)
Large Language Model for OWL Proofs [9.414596911029243]
Large Language Models (LLMs) can perform reasoning tasks such as deduction tasks.<n>We study proof generation in the context of construction, which are widely adopted for representing andreadable reasoning over complex knowledge.
arXiv Detail & Related papers (2026-01-18T14:57:57Z)
Do LLMs Dream of Discrete Algorithms? [0.7646713951724011]
Large Language Models (LLMs) have rapidly transformed the landscape of artificial intelligence.<n>Their reliance on probabilistic inference limits their effectiveness in domains requiring strict logical reasoning.<n>This paper proposes a neurosymbolic approach that augments LLMs with logic-based reasoning modules.
arXiv Detail & Related papers (2025-06-29T22:03:01Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
Code Simulation as a Proxy for High-order Tasks in Large Language Models [6.71786454125056]
We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLM)<n>We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks.<n>Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
arXiv Detail & Related papers (2025-02-05T19:30:28Z)
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning [92.76959707441954]
We introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance.<n>ZebraLogic enables the generation of puzzles with controllable and quantifiable complexity.<n>Our results reveal a significant decline in accuracy as problem complexity grows.
arXiv Detail & Related papers (2025-02-03T06:44:49Z)
Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures [2.8311048083168657]
Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting.<n>We propose that LLMs learn arithmetic by capturing algebraic structures, such as commutativity and identity properties.
arXiv Detail & Related papers (2024-11-25T10:23:11Z)
On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms. LLMs have achieved remarkable empirical success. Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z)
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based logical-aware instruction tuning framework, named LACT.<n>Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition.<n> Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods, achieving the new state-of-the-art.
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.