Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms
- URL: http://arxiv.org/abs/2603.01092v1
- Date: Sun, 01 Mar 2026 13:05:19 GMT
- Title: Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms
- Authors: Alejandro H. Artiles, Martin Weiss, Levin Brinkmann, Anirudh Goyal, Nasim Rahaman,
- Abstract summary: Large language models often fail at producing ideas that are both coherent and non-obvious to the current community. We formalize this gap through cognitive availability, the likelihood that a research direction would be naturally proposed by a typical researcher. We learn two complementary models: a coherence model that scores whether a set of atoms constitutes a viable direction, and an availability model that scores how likely that direction is to be generated.
- Score: 53.907293349123506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are adept at synthesizing and recombining familiar material, yet they often fail at a specific kind of creativity that matters most in research: producing ideas that are both coherent and non-obvious to the current community. We formalize this gap through cognitive availability, the likelihood that a research direction would be naturally proposed by a typical researcher given what they have worked on. We introduce a pipeline that (i) decomposes papers into granular conceptual units, (ii) clusters recurring units into a shared vocabulary of idea atoms, and (iii) learns two complementary models: a coherence model that scores whether a set of atoms constitutes a viable direction, and an availability model that scores how likely that direction is to be generated by researchers drawn from the community. We then sample "alien" directions that score high on coherence but low on availability. On a corpus of ~7,500 recent LLM papers from NeurIPS, ICLR and ICML, we validate that (a) conceptual units preserve paper content under reconstruction, (b) idea atoms generalize across papers rather than memorizing paper-specific phrasing, and (c) the Alien sampler produces research directions that are more diverse than LLM baselines while maintaining coherence.
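The abstract describes the pipeline only at the level of its three stages. The sketch below illustrates the final step, sampling atom sets that score high on coherence but low on availability. It is a minimal, hypothetical sketch: the atom vocabulary, the scorer functions (which the paper learns as models), the thresholds, and all names such as `coherence_score` and `sample_alien_directions` are illustrative assumptions, not details taken from the paper.

```python
import random
from itertools import combinations

# Hypothetical idea-atom vocabulary; the paper clusters these from ~7,500 LLM papers.
IDEA_ATOMS = [
    "retrieval-augmented generation", "mechanistic interpretability",
    "curriculum data mixing", "tool-use agents", "sparse autoencoders",
    "preference optimization", "graph-of-thought prompting",
]

def coherence_score(atoms):
    """Stand-in for the learned coherence model: does this set of atoms
    form a viable research direction? Returns a value in [0, 1]."""
    return random.random()  # placeholder; the paper trains a model for this

def availability_score(atoms):
    """Stand-in for the learned availability model: how likely is a typical
    researcher in the community to propose this combination? In [0, 1]."""
    return random.random()  # placeholder; the paper trains a model for this

def sample_alien_directions(k=3, n_candidates=200,
                            min_coherence=0.7, max_availability=0.2):
    """Keep atom sets that score HIGH on coherence but LOW on availability."""
    candidates = list(combinations(IDEA_ATOMS, k))
    random.shuffle(candidates)
    alien = []
    for atoms in candidates[:n_candidates]:
        c, a = coherence_score(atoms), availability_score(atoms)
        if c >= min_coherence and a <= max_availability:
            alien.append((atoms, c, a))
    # Rank by the gap between coherence and availability.
    return sorted(alien, key=lambda x: x[1] - x[2], reverse=True)

if __name__ == "__main__":
    for atoms, c, a in sample_alien_directions():
        print(f"coherence={c:.2f} availability={a:.2f}: {', '.join(atoms)}")
```

In this reading, "alien" directions are simply the survivors of a two-sided filter: coherent enough to be worth pursuing, but unlikely to be proposed by researchers sampled from the current community.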
Related papers
- Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation [1.2232326171442904]
This paper proposes a scientific idea generation system called GYWI. It combines author knowledge graphs with retrieval-augmented generation (RAG) to form an external knowledge base. The generated ideas are evaluated along five dimensions: novelty, feasibility, clarity, relevance, and significance.
arXiv Detail & Related papers (2025-12-05T03:38:23Z) - The Ramon Llull's Thinking Machine for Automated Ideation [52.06378166909451]
Our approach defines three compositional axes: Theme, Domain, and Method. We show that prompting LLMs with curated combinations produces research ideas that are diverse, relevant, and grounded in current literature. This modern thinking machine offers a lightweight, interpretable tool for augmenting scientific creativity and suggests a path toward collaborative ideation between humans and AI (a minimal illustrative sketch of this combinatorial pattern appears after this list).
arXiv Detail & Related papers (2025-08-26T17:03:43Z) - Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery [0.0]
Do large language models (LLMs) truly generate new knowledge, or do they merely remix memorized fragments? We propose unlearning-as-ablation as a falsifiable probe of constructive scientific discovery.
arXiv Detail & Related papers (2025-08-25T05:24:15Z) - The Budget AI Researcher and the Power of RAG Chains [4.797627592793464]
Current approaches to supporting research idea generation often rely on generic large language models (LLMs). Our framework, The Budget AI Researcher, uses retrieval-augmented generation chains, vector databases, and topic-guided pairing to recombine concepts from hundreds of machine learning papers. The system ingests papers from nine major AI conferences, which collectively span the vast subfields of machine learning, and organizes them into a hierarchical topic tree.
arXiv Detail & Related papers (2025-06-14T02:40:35Z) - Harnessing Large Language Models for Scientific Novelty Detection [49.10608128661251]
We propose to harness large language models (LLMs) for scientific novelty detection (ND). To capture idea conception, we propose to train a lightweight retriever by distilling the idea-level knowledge from LLMs. Experiments show our method consistently outperforms others on the proposed benchmark datasets for idea retrieval and ND tasks.
arXiv Detail & Related papers (2025-05-30T14:08:13Z) - CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation [7.086262532457526]
CHIMERA is a large-scale knowledge base of over 28K recombination examples. CHIMERA enables empirical analysis of how scientists recombine concepts and draw inspiration from different areas. We showcase the utility of CHIMERA through two applications.
arXiv Detail & Related papers (2025-05-27T06:36:04Z) - Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field [0.0]
This paper offers an analysis of the ability of large language models to identify semantic relationships between different research topics. We developed a gold standard based on the IEEE Thesaurus to evaluate the task. Several models have achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral, and Claude 3-7B.
arXiv Detail & Related papers (2024-12-11T10:11:41Z) - Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents [64.64280477958283]
An exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions.
Recent developments in large language models(LLMs) suggest a promising avenue for automating the generation of novel research ideas.
We propose a Chain-of-Ideas(CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development in a research domain.
arXiv Detail & Related papers (2024-10-17T03:26:37Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)