Related papers: Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools

URL: http://arxiv.org/abs/2601.09753v1
Date: Tue, 13 Jan 2026 02:02:27 GMT
Title: Critically Engaged Pragmatism: A Scientific Norm and Social, Pragmatist Epistemology for AI Science Evaluation Tools
Authors: Carole J. Lee
Abstract summary: I caution that AI science evaluation tools are particularly prone to inference by false ascent due to contestation about the purposes to which they should be put. I argue for a social, pragmatist epistemology and a newly articulated norm of Critically Engaged Pragmatism to enjoin scientific communities to vigorously scrutinize the purposes and purpose-specific reliability of AI science evaluation tools.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Crises in peer review capacity, study replication, and AI-fabricated science have intensified interest in automated tools for assessing scientific research. However, the scientific community has a history of decontextualizing and repurposing credibility markers in inapt ways. I caution that AI science evaluation tools are particularly prone to these kinds of inference by false ascent due to contestation about the purposes to which they should be put, their portability across purposes, and technical demands that prioritize data set size over epistemic fit. To counter this, I argue for a social, pragmatist epistemology and a newly articulated norm of Critically Engaged Pragmatism to enjoin scientific communities to vigorously scrutinize the purposes and purpose-specific reliability of AI science evaluation tools. Under this framework, AI science evaluation tools are not objective arbiters of scientific credibility, but the object of the kinds of critical discursive practices that ground the credibility of scientific communities.

Related papers

BABE: Biology Arena BEnchmark [51.53220868983288]
BABE is a benchmark designed to evaluate the experimental reasoning capabilities of biological AI systems. Our benchmark provides a robust framework for assessing how well AI systems can reason like practicing scientists.
arXiv Detail & Related papers (2026-02-05T16:39:20Z)
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence [60.202862987441684]
We introduce scientific instruction following: the capability to solve problems while strictly adhering to the constraints that establish scientific validity. Specifically, we introduce SciIF, a multi-discipline benchmark that evaluates this capability by pairing university-level problems with a fixed catalog of constraints. By measuring both solution correctness and multi-constraint adherence, SciIF enables fine-grained diagnosis of compositional reasoning failures.
arXiv Detail & Related papers (2026-01-08T09:45:58Z)
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence [99.30934038146965]
SciEvalKit focuses on the core competencies of scientific intelligence. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.
arXiv Detail & Related papers (2025-12-26T17:36:02Z)
Is Research Software Science a Metascience? [0.0]
We define metascience and RSS, compare their principles and objectives, and examine their overlaps. We argue RSS is best understood as a distinct interdisciplinary domain that aligns with metascience. Regardless of classification, applying scientific rigor to research software ensures the tools of discovery meet the standards of the discoveries themselves.
arXiv Detail & Related papers (2025-09-16T18:13:52Z)
Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning [53.82037883518254]
We introduce SciReas, a diverse suite of existing benchmarks for scientific reasoning tasks. We then propose KRUX, a probing framework for studying the distinct roles of reasoning and knowledge in scientific tasks.
arXiv Detail & Related papers (2025-08-26T17:04:23Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
AI Scientists Fail Without Strong Implementation Capability [33.232300349142285]
The emergence of Artificial Intelligence (AI) Scientist represents a paradigm shift in scientific discovery. Recent AI Scientist studies demonstrate sufficient capabilities for independent scientific discovery. Despite this substantial progress, AI Scientist has yet to produce a groundbreaking achievement in the domain of computer science.
arXiv Detail & Related papers (2025-06-02T06:59:10Z)
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
A Critical Examination of the Ethics of AI-Mediated Peer Review [0.0]
Recent advancements in artificial intelligence (AI) systems offer promise and peril for scholarly peer review. Human peer review systems are also fraught with related problems, such as biases, abuses, and a lack of transparency. The legitimacy of AI-driven peer review hinges on the alignment with the scientific ethos.
arXiv Detail & Related papers (2023-09-02T18:14:10Z)
Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond [3.3504365823045044]
This paper reviews challenges, ethical and integrity risks in science conduct in the advent of generative AI. The role of AI language models as a research instrument and subject is scrutinized along with ethical implications for scientists, participants and reviewers.
arXiv Detail & Related papers (2023-05-24T16:23:46Z)
The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? [0.0]
There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects.
arXiv Detail & Related papers (2019-12-27T10:20:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.