Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science
- URL: http://arxiv.org/abs/2508.02789v1
- Date: Mon, 04 Aug 2025 18:01:35 GMT
- Title: Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science
- Authors: Newman Cheng, Gordon Broadbent, William Chappell
- Abstract summary: We introduce an alternative approach that enables deep and precise control over the reasoning process: a cognitive loop via in-situ optimization (CLIO). CLIO enables large language models to self-formulate ways of approaching a problem, adapt behavior when self-confidence is low, and ultimately provide scientists with a final belief or answer. Without any further post-training, OpenAI's GPT-4.1 with CLIO yields an accuracy of 22.37% on text-based biology and medicine questions on Humanity's Last Exam (HLE).
- Score: 1.309289689673624
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The capacity for artificial intelligence (AI) to formulate, evolve, and test altered thought patterns under dynamic conditions indicates advanced cognition that is crucial for scientific discovery. The existing AI development landscape falls into two categories: 1) frameworks over non-reasoning models that natively incorporate opinions on how humans think, and 2) reasoning models that abstract precise control of the reasoning intuition away from end users. While powerful, for scientists to maximize the utility of AI in scientific discovery, they require not only accuracy and transparency in reasoning, but also steerability. Hence, we introduce an alternative approach that enables deep and precise control over the reasoning process, called a cognitive loop via in-situ optimization (CLIO). CLIO enables large language models (LLMs) to self-formulate ways of approaching a problem, adapt behavior when self-confidence is low, and ultimately provide scientists with a final belief or answer. Through CLIO's open design, scientists can observe uncertainty levels, understand how final belief states are formulated using graph structures, and interject corrections. Without any further post-training, OpenAI's GPT-4.1 with CLIO yields an accuracy of 22.37% on text-based biology and medicine questions on Humanity's Last Exam (HLE). This is a 13.82% net or 161.64% relative increase when compared to the base GPT-4.1 model, and it surpasses OpenAI's o3 performance in both high and low reasoning effort modes. We further discovered that oscillations within internal uncertainty measures are key in determining the accuracy of CLIO's results, revealing how its open design and internal mechanisms can provide insight into and control over scientific decision-making processes.
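The abstract describes CLIO as a loop in which the model formulates an approach, monitors its own uncertainty, and revises its behavior when self-confidence is low before committing to a final belief (the reported numbers are internally consistent: a 13.82-point gain over an implied 8.55% GPT-4.1 baseline is roughly a 161.6% relative increase). The sketch below is only a minimal, hypothetical illustration of such an uncertainty-gated loop; the `ask_llm` helper, the confidence threshold, and the prompt wording are assumptions for illustration and do not reflect CLIO's actual implementation.

```python
# Illustrative sketch of an uncertainty-gated reasoning loop in the spirit of CLIO.
# NOTE: `ask_llm`, the confidence threshold, and the prompts are hypothetical
# placeholders, not the paper's actual interfaces or algorithm.

from dataclasses import dataclass, field


@dataclass
class BeliefState:
    answer: str
    confidence: float                              # self-reported confidence in [0, 1]
    history: list = field(default_factory=list)    # trace of revisions, kept for transparency


def ask_llm(prompt: str) -> tuple[str, float]:
    """Placeholder for an LLM call returning (answer, self-reported confidence)."""
    raise NotImplementedError("Wire this to a model of your choice.")


def cognitive_loop(question: str, threshold: float = 0.7, max_rounds: int = 5) -> BeliefState:
    """Propose an answer, self-assess confidence, and revise until confidence clears the threshold."""
    state = BeliefState(answer="", confidence=0.0)
    prompt = f"Formulate an approach, then answer: {question}"
    for round_idx in range(max_rounds):
        answer, confidence = ask_llm(prompt)
        state.history.append((round_idx, answer, confidence))
        state.answer, state.confidence = answer, confidence
        if confidence >= threshold:
            break  # confident enough to report a final belief
        # Low confidence: ask the model to critique and revise its own approach.
        prompt = (
            f"Your previous answer was: {answer} (confidence {confidence:.2f}). "
            f"Identify the weakest step, revise your approach, and answer again: {question}"
        )
    return state
```

Keeping the full revision trace in `BeliefState.history` mirrors, in spirit, the transparency goal of letting scientists inspect how a final belief was reached and where uncertainty oscillated along the way.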
Related papers
- ChatGPT and Gemini participated in the Korean College Scholastic Ability Test -- Earth Science I [0.0]
This study utilizes the Earth Science I section of the 2025 Korean College Scholastic Ability Test (CSAT) to analyze the multimodal scientific reasoning capabilities and cognitive limitations of state-of-the-art Large Language Models (LLMs). Quantitative results indicated that unstructured inputs led to significant performance degradation due to segmentation and Optical Character Recognition (OCR) failures. By exploiting AI's weaknesses, educators can distinguish genuine student competency from AI-generated responses, thereby ensuring assessment fairness.
arXiv Detail & Related papers (2025-12-17T10:46:41Z) - Cognitive Foundations for Reasoning and Their Manifestation in LLMs [63.12951576410617]
Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning. We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, representations for organizing reasoning & knowledge, and transformation operations. We develop test-time reasoning guidance that automatically scaffolds successful structures, improving performance by up to 66.7% on complex problems.
arXiv Detail & Related papers (2025-11-20T18:59:00Z) - More Than Irrational: Modeling Belief-Biased Agents [25.274115351731325]
We introduce a class of computational-rational (CR) user models for cognitively-bounded agents acting optimally under biased beliefs. We address the challenge of identifying the latent user-specific bound and inferring biased belief states from passive observations. We show that our CR model generates intuitively plausible behaviors corresponding to different levels of memory capacity.
arXiv Detail & Related papers (2025-11-15T21:14:37Z) - A Definition of AGI [208.25193480759026]
The lack of a concrete definition for Artificial General Intelligence obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult.
arXiv Detail & Related papers (2025-10-21T01:28:35Z) - The Need for Verification in AI-Driven Scientific Discovery [9.887965168376311]
Machine learning and large language models can generate hypotheses at a scale and speed far exceeding traditional methods. We argue that without scalable and reliable mechanisms for verification, scientific progress risks being hindered rather than advanced.
arXiv Detail & Related papers (2025-09-01T11:50:04Z) - The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study aims to extend the framework for assessing artificial intelligence with GROW-AI. GROW-AI is designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test. The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z) - Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI [98.19195693735487]
We propose the paradigm of Intelligent Science Laboratories (ISLs): a multi-layered, closed-loop framework that deeply integrates cognitive and embodied intelligence. We argue that such systems are essential for overcoming the current limitations of scientific discovery.
arXiv Detail & Related papers (2025-06-24T13:31:44Z) - PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration [9.640689981816852]
We introduce PiFlow, an information-theoretical framework for automated scientific discovery. Our method significantly improves discovery efficiency, reflected by a 73.55% increase in the Area Under the Curve. Overall, PiFlow serves as a plug-and-play method, establishing a novel paradigm shift in highly efficient automated scientific discovery.
arXiv Detail & Related papers (2025-05-21T03:09:39Z) - Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training [86.70255651945602]
We introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE). RICE aims to improve reasoning performance without additional training or complex heuristics. Empirical evaluations with leading MoE-based LRMs demonstrate noticeable and consistent improvements in reasoning accuracy, cognitive efficiency, and cross-domain generalization.
arXiv Detail & Related papers (2025-05-20T17:59:16Z) - Large language models as uncertainty-calibrated optimizers for experimental discovery [4.968931211284832]
We show that training language models through the uncertainty-aware objectives of traditional optimization methods enables their use as reliable optimizers guided by natural language interfaces. Our method nearly doubles the discovery rate of high-yielding reaction conditions, from 24% to 43%, within 50 experimental iterations starting from 10 unsuccessful conditions.
arXiv Detail & Related papers (2025-04-08T17:59:57Z) - Bridging Social Psychology and LLM Reasoning: Conflict-Aware Meta-Review Generation via Cognitive Alignment [35.82355113500509]
Large language models (LLMs) show promise in automating manuscript critiques. Existing methods fail to handle conflicting viewpoints within differing opinions. We propose the Cognitive Alignment Framework (CAF), a dual-process architecture that transforms LLMs into adaptive scientific arbitrators.
arXiv Detail & Related papers (2025-03-18T04:13:11Z) - Learning to Generate and Evaluate Fact-checking Explanations with Transformers [10.970249299147866]
This research contributes to the field of Explainable Artificial Intelligence (XAI).
We develop transformer-based fact-checking models that contextualise and justify their decisions by generating human-accessible explanations.
We emphasise the need for aligning Artificial Intelligence (AI)-generated explanations with human judgements.
arXiv Detail & Related papers (2024-10-21T06:22:51Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark, MR-Ben, that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance [0.0]
This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of the International Coaching Federation ICF exam.
Using a mixed method approach, we assessed the metacognitive performance of human participants and five advanced LLMs.
The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in terms of reduced overconfidence.
arXiv Detail & Related papers (2024-05-07T22:15:12Z) - Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias [57.42417061979399]
Recent studies show that instruction tuning (IT) and reinforcement learning from human feedback (RLHF) improve the abilities of large language models (LMs) dramatically.
In this work, we investigate the effect of IT and RLHF on decision making and reasoning in LMs.
Our findings highlight the presence of these biases in various models from the GPT-3, Mistral, and T5 families.
arXiv Detail & Related papers (2023-08-01T01:39:25Z) - The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence [67.70415658080121]
Recent advances in machine learning and AI are disrupting technological innovation, product development, and society as a whole.
AI has contributed less to fundamental science, in part because the large sets of high-quality data needed for scientific practice and model discovery are more difficult to access.
Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery.
arXiv Detail & Related papers (2023-07-09T21:16:56Z) - Thinking Fast and Slow in Large Language Models [0.08057006406834465]
Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life.
In this study, we show that LLMs like GPT-3 exhibit behavior that resembles human-like intuition - and the cognitive errors that come with it.
arXiv Detail & Related papers (2022-12-10T05:07:30Z) - Deceptive AI Explanations: Creation and Detection [3.197020142231916]
We investigate how AI models can be used to create and detect deceptive explanations.
As an empirical evaluation, we focus on text classification and alter the explanations generated by GradCAM.
We evaluate the effect of deceptive explanations on users in an experiment with 200 participants.
arXiv Detail & Related papers (2020-01-21T16:41:22Z) - Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence score can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.