The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
- URL: http://arxiv.org/abs/2510.09709v1
- Date: Fri, 10 Oct 2025 02:51:15 GMT
- Title: The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
- Authors: Shin-nosuke Ishikawa, Masato Todo, Taiki Ogihara, Hirotsugu Ohba
- Abstract summary: We present a tendency of large language models (LLMs) to generate absurd patterns despite their clear inappropriateness. This tendency can be interpreted as the AI model equivalent of Idola Tribus.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a tendency of large language models (LLMs) to generate absurd patterns despite their clear inappropriateness in a simple task of identifying regularities in number series. Several approaches have been proposed to apply LLMs to complex real-world tasks, such as providing knowledge through retrieval-augmented generation and executing multi-step tasks using AI agent frameworks. However, these approaches rely on the logical consistency and self-coherence of LLMs, making it crucial to evaluate these aspects and consider potential countermeasures. To identify cases where LLMs fail to maintain logical consistency, we conducted an experiment in which LLMs were asked to explain the patterns in various integer sequences, ranging from arithmetic sequences to randomly generated integer series. While the models successfully identified correct patterns in arithmetic and geometric sequences, they frequently over-recognized patterns that were inconsistent with the given numbers when analyzing randomly generated series. This issue was observed even in multi-step reasoning models, including OpenAI o3, o4-mini, and Google Gemini 2.5 Flash Preview Thinking. This tendency to perceive non-existent patterns can be interpreted as the AI model equivalent of Idola Tribus and highlights potential limitations in their capability for applied tasks requiring logical reasoning, even when employing chain-of-thought reasoning mechanisms.
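The experiment described in the abstract (presenting arithmetic, geometric, and randomly generated integer series, then checking whether the model's proposed rule is actually consistent with the numbers) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and parameters are assumptions for illustration only.

```python
import random

def arithmetic_sequence(start, step, n):
    """Arithmetic sequence: constant difference between consecutive terms."""
    return [start + step * i for i in range(n)]

def geometric_sequence(start, ratio, n):
    """Geometric sequence: constant ratio between consecutive terms."""
    return [start * ratio ** i for i in range(n)]

def random_sequence(low, high, n, seed=None):
    """Randomly generated integers with no underlying rule."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(n)]

def is_consistent(sequence, rule):
    """Check whether a proposed rule (a function index -> value)
    reproduces every term of the given sequence."""
    return all(rule(i) == x for i, x in enumerate(sequence))

# A rule like "a_i = 3 + 4i" is consistent with an arithmetic series...
seq = arithmetic_sequence(3, 4, 6)   # [3, 7, 11, 15, 19, 23]
print(is_consistent(seq, lambda i: 3 + 4 * i))   # True

# ...but a simple rule claimed for a random series will generally fail
# this check, which is the over-recognition failure the paper reports.
rand = random_sequence(0, 100, 6, seed=0)
print(is_consistent(rand, lambda i: rand[0] + (rand[1] - rand[0]) * i))
```

A consistency check of this kind is mechanical, which is what makes the reported failure notable: the models' proposed patterns can be falsified by direct substitution, yet the models assert them anyway.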
Related papers
- Sequential Enumeration in Large Language Models [0.3823356975862005]
This paper investigates the sequential enumeration abilities of five state-of-the-art Large Language Models (LLMs). We find that some LLMs are indeed capable of deploying counting procedures when explicitly prompted to do so, but none of them spontaneously engage in counting when simply asked to enumerate the number of items in a sequence.
arXiv Detail & Related papers (2025-12-04T12:10:24Z) - Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models [44.17697803306198]
We introduce CodeSeq, a synthetic post-training dataset built from number sequences. Our pipeline generates supervised fine-tuning data by reflecting on failed test cases and incorporating iterative corrections. Experimental results show that the models trained with CodeSeq improve on various reasoning tasks and can preserve the models' OOD performance.
arXiv Detail & Related papers (2025-10-16T12:29:40Z) - seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs [1.0519693622157462]
We introduce seqBench, a benchmark for probing sequential reasoning limits in Large Language Models (LLMs). We find that even top-performing models systematically fail on seqBench's structured reasoning tasks despite minimal search complexity.
arXiv Detail & Related papers (2025-09-21T01:32:13Z) - Frontier LLMs Still Struggle with Simple Reasoning Tasks [53.497499123166804]
This work studies the performance of frontier language models on a broad set of "easy" reasoning problems. We create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning. We show that even state-of-the-art thinking models consistently fail on such problems and for similar reasons.
arXiv Detail & Related papers (2025-07-09T22:22:49Z) - Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs). Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing. CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z) - FOL-Pretrain: A complexity annotated corpus of first-order logic [16.061040115094592]
Transformer-based large language models (LLMs) have demonstrated remarkable reasoning capabilities. Despite recent efforts to reverse-engineer LLM behavior, our understanding of how these models internalize and execute complex algorithms remains limited. We introduce a large-scale, fully open, complexity-annotated dataset of first-order logic reasoning traces.
arXiv Detail & Related papers (2025-05-20T21:38:28Z) - Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models (LMs). DisCIPL generates a task-specific inference program that is executed by a population of Follower models. Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z) - Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability [18.54202114336492]
We investigate the approximation, learnability, and generalization properties of multi-round auto-regressive models. We show that Transformers with finite context windows are universal approximators for steps of Turing-computable functions. We extend PAC learning to sequence generation and demonstrate that multi-round generation is learnable even when the sequence length exceeds the model's context window.
arXiv Detail & Related papers (2025-03-05T02:50:55Z) - Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning [9.75748930802634]
We study the power of in-context learning to improve machine learning performance. We consider an extremely simple domain: next token prediction on simple language tasks. We find that LLMs uniformly underperform n-gram models on this task.
arXiv Detail & Related papers (2025-01-06T07:57:51Z) - Benchmarking Large Language Models with Integer Sequence Generation Tasks [2.204499020600093]
We present a benchmark designed to rigorously evaluate the capabilities of large language models (LLMs) in mathematical reasoning tasks. The benchmark comprises integer sequence generation tasks sourced from the Online Encyclopedia of Integer Sequences (OEIS). Our evaluation includes leading models from OpenAI (including the specialized reasoning-focused o-series), Anthropic, Meta, and Google.
arXiv Detail & Related papers (2024-11-07T02:05:43Z) - A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z) - Large Language Models as General Pattern Machines [64.75501424160748]
We show that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences.
Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary.
In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics.
arXiv Detail & Related papers (2023-07-10T17:32:13Z) - Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models that are supplied with such sequences as prompts can solve tasks with a significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.