Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
- URL: http://arxiv.org/abs/2508.17681v3
- Date: Tue, 23 Sep 2025 15:40:35 GMT
- Title: Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
- Authors: Robert Yang
- Abstract summary: Do large language models (LLMs) truly generate new knowledge, or do they merely remix memorized fragments? We propose unlearning-as-ablation as a falsifiable probe of constructive scientific discovery.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bold claims about AI's role in science, from "AGI will cure all diseases" to promises of radically accelerated discovery, raise a central epistemic question: do large language models (LLMs) truly generate new knowledge, or do they merely remix memorized fragments? We propose unlearning-as-ablation as a falsifiable probe of constructive scientific discovery. The idea is to systematically remove a target result together with its forget-closure (supporting lemmas, paraphrases, and multi-hop entailments) and then evaluate whether the model can re-derive the result from only permitted axioms and tools. Success would indicate generative capability beyond recall; failure would expose current limits. Unlike prevailing motivations for unlearning (privacy, copyright, or safety), our framing repositions it as an epistemic probe for AI-for-Science. We outline a minimal pilot in mathematics and algorithms to illustrate feasibility, and sketch how the same approach could later be extended to domains such as physics or chemistry. This is a position paper: our contribution is conceptual and methodological, not empirical. We aim to stimulate discussion on how principled ablation tests could help distinguish models that reconstruct knowledge from those that merely retrieve it, and how such probes might guide the next generation of AI-for-Science benchmarks.
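The ablation-then-rederive protocol described in the abstract can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the authors' implementation: it encodes knowledge as Horn-style rules over named facts, computes a forget-closure (the target plus the supporting lemmas that entail it, excluding permitted axioms), ablates that closure, and checks whether forward chaining from the axioms re-derives the target. All names (`AXIOMS`, `RULES`, `forget_closure`, the fact labels) are assumptions made for this example.

```python
# Hypothetical toy model of unlearning-as-ablation (not the paper's implementation).
# Knowledge is a set of Horn rules: frozenset(premises) -> conclusion.

AXIOMS = {"axiom_a", "axiom_b", "axiom_c"}   # permitted starting points
RULES = {                                    # derivation "tools"
    frozenset({"axiom_a", "axiom_b"}): "lemma_1",
    frozenset({"lemma_1"}): "lemma_2",
    frozenset({"lemma_2", "axiom_c"}): "target_theorem",
}

def forward_chain(known, rules):
    """Derive everything reachable from `known` by repeatedly applying rules."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules.items():
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def forget_closure(target, rules, axioms):
    """The target plus supporting lemmas that entail it (multi-hop),
    excluding permitted axioms, which the probe is allowed to keep."""
    closure = {target}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules.items():
            if conclusion in closure:
                extra = set(premises) - closure - axioms
                if extra:
                    closure |= extra
                    changed = True
    return closure

memorized = forward_chain(AXIOMS, RULES)           # everything the "model" knows
ablated = memorized - forget_closure("target_theorem", RULES, AXIOMS)
rederived = forward_chain(ablated, RULES)          # re-derivation attempt
print("target_theorem" in rederived)               # prints True
```

In this toy setting re-derivation always succeeds, because the derivation rules are stored separately from the memorized facts and survive the ablation. The open empirical question the paper raises is precisely whether an LLM's "tools" similarly survive unlearning of a target result and its forget-closure.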
Related papers
- Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms [53.907293349123506]
Large language models often fail at producing ideas that are both coherent and non-obvious to the current community. We formalize this gap through cognitive availability, the likelihood that a research direction would be naturally proposed by a typical researcher. We learn two complementary models: a coherence model that scores whether a set of atoms constitutes a viable direction, and an availability model that scores how likely that direction is to be generated.
arXiv Detail & Related papers (2026-03-01T13:05:19Z)
- Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z)
- Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs [78.18336140706471]
Sci-LLMs have emerged as a promising frontier for accelerating biological discovery. Current strategies limit Sci-LLMs' reasoning capacity when processing raw biomolecular sequences. We show that a more effective strategy is to provide Sci-LLMs with high-level structured context.
arXiv Detail & Related papers (2025-10-27T09:03:21Z)
- NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents [65.85967483058705]
Large language models are emerging as powerful tools for scientific law discovery. Existing benchmarks for this task suffer from a fundamental methodological trilemma. We introduce NewtonBench, a benchmark comprising 324 scientific law discovery tasks across 12 physics domains.
arXiv Detail & Related papers (2025-10-08T16:12:11Z)
- Newton to Einstein: Axiom-Based Discovery via Game Design [55.30047000068118]
We propose a game design framework in which scientific inquiry is recast as a rule-evolving system. Unlike conventional ML approaches that operate within fixed assumptions, our method enables the discovery of new theoretical structures.
arXiv Detail & Related papers (2025-09-05T18:59:18Z)
- The Need for Verification in AI-Driven Scientific Discovery [9.887965168376311]
Machine learning and large language models can generate hypotheses at a scale and speed far exceeding traditional methods. We argue that without scalable and reliable mechanisms for verification, scientific progress risks being hindered rather than advanced.
arXiv Detail & Related papers (2025-09-01T11:50:04Z)
- Active Inference AI Systems for Scientific Discovery [1.450405446885067]
This perspective contends that progress turns on closing three mutually reinforcing gaps in abstraction, reasoning and empirical grounding. Design principles are proposed for systems that reason in imaginary spaces and learn from the world.
arXiv Detail & Related papers (2025-06-26T14:43:04Z)
- Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI [98.19195693735487]
We propose the paradigm of Intelligent Science Laboratories (ISLs). ISLs are a multi-layered, closed-loop framework that deeply integrates cognitive and embodied intelligence. We argue that such systems are essential for overcoming the current limitations of scientific discovery.
arXiv Detail & Related papers (2025-06-24T13:31:44Z)
- Artificial Scientific Discovery [5.241773225218436]
This thesis spans from AlphaGo to ChatGPT to empirically examine the concepts needed to realize the vision of an artificial scientist. An artificial scientist must develop its own interpretation of the language used to explain its findings, and not rely on a rigid existing interpreter. This culminates in a simple idea to build CLIP-like models where interpretation and perception are explicitly disentangled.
arXiv Detail & Related papers (2024-11-18T15:51:45Z)
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery [14.465756130099091]
This paper presents the first comprehensive framework for fully automatic scientific discovery.
We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, and describes its findings.
In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community.
arXiv Detail & Related papers (2024-08-12T16:58:11Z)
- Towards a Science Exocortex [0.5687661359570725]
We review the state of the art in agentic AI systems, and discuss how these methods could be extended to have greater impact on science.
A science exocortex could be designed as a swarm of AI agents, with each agent individually streamlining specific researcher tasks.
arXiv Detail & Related papers (2024-06-24T14:32:32Z)
- SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [57.96527452844273]
We introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning.
We curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs.
To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i.e., ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath.
arXiv Detail & Related papers (2024-01-15T20:22:21Z)
- SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
We take a dramatic departure with a novel setting in which models use background contexts as input.
We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z)
- I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation [89.38161262164586]
We study generative models of commonsense knowledge, focusing on the task of generating generics.
We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al.
Our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.
arXiv Detail & Related papers (2022-12-19T04:47:49Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- AI Research Associate for Early-Stage Scientific Discovery [1.6861004263551447]
Artificial intelligence (AI) has been increasingly applied in scientific activities for decades.
We present an AI research associate for early-stage scientific discovery based on a novel minimally-biased physics-based modeling.
arXiv Detail & Related papers (2022-02-02T17:05:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.