Related papers: Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning

URL: http://arxiv.org/abs/2506.10585v1
Date: Thu, 12 Jun 2025 11:21:58 GMT
Title: Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning
Authors: Mohd Anwar Jamal Faiz
Abstract summary: Primender sequence is a novel integer sequence that combines classical primality with modular digit-based conditions. We propose the sequence as a benchmark for evaluating the symbolic reasoning capabilities of Large Language Models.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces the Primender sequence, a novel integer sequence defined by a hybrid rule that combines classical primality with modular digit-based conditions. Specifically, a number n is included in the sequence if it is prime or if it ends with a prime number, whether that ending is a single digit or longer. In other words, the sequence consists of numbers that are prime or have at least one prime suffix. The resulting sequence exhibits a deterministic yet non-trivial structure, blending number-theoretic properties with symbolic patterning. We propose the Primender sequence as a benchmark for evaluating the symbolic reasoning capabilities of Large Language Models (LLMs). The study is motivated by the need for interpretable, rule-based testbeds that can assess an LLM's ability to infer hidden rules, validate mathematical hypotheses, and generalize symbolic logic at scale. A key hypothesis explored is: whenever a number in the Primender sequence is exactly one more than the largest prime less than or equal to it, the difference between it and the previous number in the sequence is also 1. We design a structured prompt and evaluation framework to test this hypothesis across multiple state-of-the-art LLMs, including ChatGPT, Copilot, DeepSeek, Gemini, Grok, and LLaMA. The models are tasked with identifying the underlying rule, validating the hypothesis, and generating the next 100,000 terms of the sequence. Comparative metrics such as rule inference accuracy, hypothesis evaluation, sequence validity, and symbolic explanation quality are used to assess model performance. This work contributes a novel mathematical construct and a reproducible methodology for benchmarking LLMs in symbolic reasoning, hypothesis testing, and scalable pattern generalization, bridging the domains of number theory, artificial intelligence, and software engineering.
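
The membership rule and the hypothesis stated in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names (`is_prime`, `is_primender`, `primender_terms`, `hypothesis_holds`) are hypothetical, trial division is used only for simplicity, and a "prime suffix" is assumed to mean any trailing run of decimal digits whose integer value is prime.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def is_primender(n: int) -> bool:
    """True if n is prime or some decimal suffix of n is prime (assumed reading of the rule)."""
    digits = str(n)
    # digits[0:] is n itself, so the "n is prime" case is covered by the same loop.
    return any(is_prime(int(digits[i:])) for i in range(len(digits)))


def primender_terms(limit: int) -> list[int]:
    """All terms of the sequence that are <= limit, in increasing order."""
    return [n for n in range(1, limit + 1) if is_primender(n)]


def hypothesis_holds(limit: int) -> bool:
    """Check the abstract's hypothesis for every term <= limit."""
    terms = primender_terms(limit)
    for prev, cur in zip(terms, terms[1:]):
        p = cur
        while not is_prime(p):  # largest prime <= cur
            p -= 1
        if cur == p + 1 and cur - prev != 1:
            return False
    return True
```

Under these assumptions, `primender_terms(30)` returns `[2, 3, 5, 7, 11, 12, 13, 15, 17, 19, 22, 23, 25, 27, 29]`. The hypothesis check passes for any limit, because a prime p is itself a term, so a term equal to p + 1 always has p as its immediate predecessor.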

Related papers

Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version) [49.462399222747024]
We propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT). We introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity.
arXiv Detail & Related papers (2026-02-05T22:11:28Z)
Testing Transformer Learnability on the Arithmetic Sequence of Rooted Trees [41.17969667763904]
We study whether a Large Language Model can learn the deterministic sequence of trees generated by the iterated prime factorization of the natural numbers. Our results show that the model partially learns the internal grammar of $\mathbb{N}\mathcal{T}$, capturing non-trivial regularities and correlations.
arXiv Detail & Related papers (2025-12-01T16:51:38Z)
Machine Learnability as a Measure of Order in Aperiodic Sequences [0.07026564887314536]
We show that it is possible to use an image-focused machine learning model to measure the regularity of prime number fields at specific regions of an Ulam spiral. We demonstrate that in pure accuracy terms, models trained on blocks extracted from regions of the spiral in the vicinity of 500m outperform models trained on blocks extracted from the region representing integers lower than 25m.
arXiv Detail & Related papers (2025-09-09T04:57:32Z)
Primality Testing via Circulant Matrix Eigenvalue Structure: A Novel Approach Using Cyclotomic Field Theory [2.0547410497538445]
This paper presents a novel primality test based on the eigenvalue structure of circulant matrices constructed from roots of unity. We prove that an integer $n > 2$ is prime if and only if the minimal polynomial of the matrix $C_n = W_n + W_n^2$ has exactly two irreducible factors.
arXiv Detail & Related papers (2025-04-28T17:46:57Z)
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences [38.76458756232632]
We study inductive reasoning in large language models. We use number sequences as the source of inductive reasoning data. We build a sequence synthetic data pipeline and form a training dataset CodeSeq.
arXiv Detail & Related papers (2025-03-17T12:33:26Z)
Prime Convolutional Model: Breaking the Ground for Theoretical Explainability [45.07003937279752]
We propose a new theoretical approach to Explainable AI. We apply the method to a case study created in a controlled environment. We show that the different behaviors of p-Conv can be modeled mathematically in terms of $m$ and $B$.
arXiv Detail & Related papers (2025-03-04T16:42:46Z)
Benchmarking Large Language Models with Integer Sequence Generation Tasks [1.3108652488669736]
This paper presents a novel benchmark where the large language model (LLM) must write code that computes integer sequences from the Online Encyclopedia of Integer Sequences (OEIS). Our benchmark reveals that the o1 series of models outperforms other frontier models from OpenAI, Anthropic, Meta, and Google in accuracy and cheating rates across both easy and hard integer sequences.
arXiv Detail & Related papers (2024-11-07T02:05:43Z)
Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models. We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses. We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
Premise Order Matters in Reasoning with Large Language Models [57.18850969634412]
We show that large language models (LLMs) are surprisingly brittle to the ordering of the premises. We observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps.
arXiv Detail & Related papers (2024-02-14T04:50:18Z)
Decidable Fragments of LTLf Modulo Theories (Extended Version) [66.25779635347122]
In general, LTLfMT was shown to be semi-decidable for any decidable first-order theory (e.g., linear arithmetic) with a tableau-based semi-decision procedure. We show that for any LTLfMT formula that satisfies an abstract, semantic condition, which we call finite memory, the tableau augmented with a new rule is also guaranteed to terminate.
arXiv Detail & Related papers (2023-07-31T17:02:23Z)
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems [70.91780996370326]
We propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols. We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases.
arXiv Detail & Related papers (2023-06-29T18:35:41Z)
Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models. We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples. We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions. We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z)
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought [10.524051272257614]
Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts. We present a new synthetic question-answering dataset called PrOntoQA, where each example is generated as a synthetic world model. This allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis.
arXiv Detail & Related papers (2022-10-03T21:34:32Z)