CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature
- URL: http://arxiv.org/abs/2505.20779v3
- Date: Thu, 05 Jun 2025 15:20:59 GMT
- Title: CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature
- Authors: Noy Sternlicht, Tom Hope
- Abstract summary: We build CHIMERA: a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to explore how scientists recombine concepts and take inspiration from different areas. We analyze CHIMERA to explore the properties of recombination in different subareas of AI.
- Score: 7.086262532457526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A hallmark of human innovation is the process of recombination -- creating original ideas by integrating elements of existing mechanisms and concepts. In this work, we automatically mine the scientific literature and build CHIMERA: a large-scale knowledge base (KB) of recombination examples. CHIMERA can be used to empirically explore at scale how scientists recombine concepts and take inspiration from different areas, or to train supervised machine learning models that learn to predict new creative cross-domain directions. To build this KB, we present a novel information extraction task of extracting recombination from scientific paper abstracts, collect a high-quality corpus of hundreds of manually annotated abstracts, and use it to train an LLM-based extraction model. The model is applied to a large corpus of papers in the AI domain, yielding a KB of over 28K recombination examples. We analyze CHIMERA to explore the properties of recombination in different subareas of AI. Finally, we train a scientific hypothesis generation model using the KB, which predicts new recombination directions that real-world researchers find inspiring. Our data and code are available at https://github.com/noy-sternlicht/CHIMERA-KB
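The abstract leaves the extraction format and prompt unspecified; as a rough illustration of the extraction step, the sketch below prompts a generic LLM for a structured recombination record. The `Recombination` schema, the prompt wording, and the `call_llm` stub are illustrative assumptions, not the released pipeline.
```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recombination:
    """Illustrative record for one mined recombination example (assumed schema)."""
    source_concept: str   # idea or mechanism borrowed from another area
    target_problem: str   # problem or setting it is applied to
    evidence_span: str    # abstract sentence supporting the link

EXTRACTION_PROMPT = """\
You are given the abstract of a scientific paper. If the paper combines an
existing concept, method, or inspiration from one area with a problem in
another area, return a JSON object with the keys "source_concept",
"target_problem", and "evidence_span". Otherwise return null.

Abstract:
{abstract}
"""

def call_llm(prompt: str) -> str:
    """Placeholder for any instruction-tuned LLM client; swap in a real endpoint."""
    raise NotImplementedError

def extract_recombination(abstract: str) -> Optional[Recombination]:
    raw = call_llm(EXTRACTION_PROMPT.format(abstract=abstract))
    parsed = json.loads(raw)
    return Recombination(**parsed) if parsed else None
```
Applying such an extractor over a large corpus of abstracts and keeping the non-null records is what would populate a KB of recombination examples.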
Related papers
- Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs [30.813288388998256]
We show that large language models (LLMs) can extract concepts more efficiently than automated keyword extraction methods. A machine learning model is trained to predict emerging combinations of concepts based on historical data. We show that the model can inspire materials scientists in their creative thinking process by predicting innovative combinations of topics that have not yet been investigated.
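The summary does not describe the predictor itself; one minimal way to ground the idea is link prediction on a concept co-occurrence graph, sketched below with networkx. The toy papers and the Jaccard heuristic are stand-ins for the extracted concepts and the trained model.
```python
import networkx as nx

# Toy history: each paper contributes a clique over its extracted concepts.
papers = [
    {"perovskite", "solar cell", "machine learning"},
    {"machine learning", "battery electrolyte"},
    {"perovskite", "thin film"},
]

G = nx.Graph()
for concepts in papers:
    for a in concepts:
        for b in concepts:
            if a < b:
                G.add_edge(a, b)

# Rank concept pairs that have never co-occurred; the Jaccard neighbourhood
# overlap stands in for the trained prediction model.
candidates = nx.jaccard_coefficient(G, nx.non_edges(G))
for a, b, score in sorted(candidates, key=lambda t: -t[2])[:5]:
    print(f"{a} + {b}: {score:.2f}")
```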
arXiv Detail & Related papers (2025-06-20T08:26:12Z)
- The Budget AI Researcher and the Power of RAG Chains [4.797627592793464]
Current approaches to supporting research idea generation often rely on generic large language models (LLMs). Our framework, The Budget AI Researcher, uses retrieval-augmented generation chains, vector databases, and topic-guided pairing to recombine concepts from hundreds of machine learning papers. The system ingests papers from nine major AI conferences, which collectively span the vast subfields of machine learning, and organizes them into a hierarchical topic tree.
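As a loose illustration of topic-guided pairing, the sketch below embeds abstracts from two topic-tree branches and ranks cross-topic pairs by similarity; the random `embed` stub and the toy abstracts are placeholders for a real sentence-embedding model and corpus, not the system described.
```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Stand-in embedder: replace with any sentence-embedding model (unit-normalised)."""
    rng = np.random.default_rng(0)                      # deterministic toy vectors
    vecs = rng.normal(size=(len(texts), 384))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Toy abstracts grouped under two different topic-tree branches.
topic_a = [
    "Graph neural networks for molecular property prediction.",
    "Contrastive pretraining of protein language models.",
]
topic_b = [
    "Curriculum strategies for reinforcement learning agents.",
    "Retrieval-augmented generation for long-form question answering.",
]

# Topic-guided pairing: score every cross-topic pair and surface the top ones
# as candidate recombinations to feed into a generation chain.
A, B = embed(topic_a), embed(topic_b)
pairs = [(i, j, float(A[i] @ B[j])) for i in range(len(topic_a)) for j in range(len(topic_b))]
for i, j, s in sorted(pairs, key=lambda t: -t[2])[:3]:
    print(f"{topic_a[i]} <-> {topic_b[j]}  (cos={s:.2f})")
```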
arXiv Detail & Related papers (2025-06-14T02:40:35Z)
- Harnessing Large Language Models for Scientific Novelty Detection [49.10608128661251]
We propose to harness large language models (LLMs) for scientific novelty detection (ND). To capture idea conception, we propose to train a lightweight retriever by distilling the idea-level knowledge from LLMs. Experiments show our method consistently outperforms others on the proposed benchmark datasets for idea retrieval and ND tasks.
arXiv Detail & Related papers (2025-05-30T14:08:13Z)
- The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes [0.0]
We introduce the Discovery Engine, a framework to transform literature into a unified, computationally tractable representation of a scientific domain. The Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
arXiv Detail & Related papers (2025-05-23T05:51:34Z)
- Structuring Scientific Innovation: A Framework for Modeling and Discovering Impactful Knowledge Combinations [8.652262434505955]
We propose a structured approach that emphasizes the role of method combinations in shaping disruptive insights. First, we introduce a contrastive learning-based mechanism to identify distinguishing features of historically disruptive method combinations. Second, we propose a reasoning-guided Monte Carlo search algorithm that leverages the chain-of-thought capability of LLMs to identify promising knowledge recombinations.
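The blurb does not give the training objective; an InfoNCE-style contrastive loss over embeddings of disruptive versus routine method combinations is one generic way such a mechanism could be set up. The encoder, feature dimensions, and random batch below are stand-ins, not the paper's implementation.
```python
import torch
import torch.nn.functional as F

# Stand-in encoder: maps a method-combination feature vector to an embedding.
encoder = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 32)
)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Pull embeddings of disruptive combinations together, push routine ones away."""
    a = F.normalize(encoder(anchor), dim=-1)
    p = F.normalize(encoder(positive), dim=-1)
    n = F.normalize(encoder(negatives), dim=-1)
    pos = (a * p).sum(-1, keepdim=True) / temperature   # (B, 1) positive similarities
    neg = a @ n.T / temperature                          # (B, K) negative similarities
    logits = torch.cat([pos, neg], dim=-1)
    labels = torch.zeros(a.size(0), dtype=torch.long)    # the positive sits at index 0
    return F.cross_entropy(logits, labels)

# Toy batch: anchor/positive pairs of disruptive combinations, plus routine negatives.
loss = info_nce(torch.randn(8, 64), torch.randn(8, 64), torch.randn(16, 64))
loss.backward()
```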
arXiv Detail & Related papers (2025-03-24T16:41:17Z)
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) refers to the process of improving the visual quality of images while removing degradations such as noise, blur, and weather effects.
Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.
The all-in-one image restoration (AiOIR) paradigm has emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z)
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
- Mitigating Copy Bias in In-Context Learning through Neuron Pruning [74.91243772654519]
Large language models (LLMs) have demonstrated impressive few-shot in-context learning abilities.
They are sometimes prone to a 'copying bias', where they copy answers from provided examples instead of learning the underlying patterns.
We propose a novel and simple method to mitigate such copying bias.
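The summary does not say how the relevant neurons are found or removed; the ablation mechanism itself can be sketched generically with a forward hook that zeroes selected hidden units, as below. The layer name and neuron indices are placeholders that an attribution step would supply.
```python
import torch

def prune_neurons(model: torch.nn.Module, layer_name: str, neuron_ids: list[int]):
    """Zero out selected hidden units of one layer at inference time via a forward hook."""
    layer = dict(model.named_modules())[layer_name]

    def hook(_module, _inputs, output):
        output[..., neuron_ids] = 0.0                    # ablate the chosen units
        return output

    return layer.register_forward_hook(hook)

# Usage on a toy network; in practice the layer and neuron indices would come
# from an attribution step that localises the copying behaviour.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
handle = prune_neurons(model, "1", neuron_ids=[0, 5, 7])
out = model(torch.randn(2, 16))
handle.remove()
```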
arXiv Detail & Related papers (2024-10-02T07:18:16Z)
- Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination [23.48126633604684]
Scideator is a mixed-initiative ideation tool based on large language models. It allows users to explore the idea space by interactively recombining facets from scientific papers. It also helps users gauge idea originality by searching the literature for overlaps and assessing idea novelty.
arXiv Detail & Related papers (2024-09-23T00:09:34Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods, and designs experiments, while iteratively refining them. We experimentally validate ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- Exploring and Verbalizing Academic Ideas by Concept Co-occurrence [42.16213986603552]
This study devises a framework based on concept co-occurrence for academic idea inspiration.
We construct evolving concept graphs according to the co-occurrence relationship of concepts from 20 disciplines or topics.
We generate a description of an idea based on a new data structure called co-occurrence citation quintuple.
arXiv Detail & Related papers (2023-06-04T07:01:30Z)
- Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo).
Our proposed RACo can significantly outperform other knowledge-enhanced methods.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
- BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models [65.51390418485207]
We propose a new approach of harvesting massive KGs of arbitrary relations from pretrained LMs.
With minimal input of a relation definition, the approach efficiently searches in the vast entity pair space to extract diverse accurate knowledge.
We deploy the approach to harvest KGs of over 400 new relations from different LMs.
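The search procedure is the paper's contribution and is not reproduced here; the scoring primitive such a search needs, namely how plausible a pretrained masked LM finds a relation prompt with a candidate entity pair filled in, can be sketched as a pseudo-log-likelihood. The model choice and the relation template below are illustrative assumptions.
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Score a sentence by masking each token in turn and summing its log-probability."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):                   # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, pos]
        total += torch.log_softmax(logits, dim=-1)[ids[pos]].item()
    return total

# Rank candidate entity pairs for an illustrative relation template.
template = "{a} is an effective treatment for {b}."
for a, b in [("aspirin", "headache"), ("aspirin", "bridges")]:
    print(a, b, round(pseudo_log_likelihood(template.format(a=a, b=b)), 2))
```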
arXiv Detail & Related papers (2022-06-28T19:46:29Z)
- AIGenC: An AI generalisation model via creativity [1.933681537640272]
Inspired by cognitive theories of creativity, this paper introduces a computational model (AIGenC).
It lays down the necessary components to enable artificial agents to learn, use and generate transferable representations.
We discuss the model's capability to yield better out-of-distribution generalisation in artificial agents.
arXiv Detail & Related papers (2022-05-19T17:43:31Z)
- Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Similar to the birth of CNN inspired by receptive fields in the biological visual system, we propose Model Disassembling and Assembling.
For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task.
arXiv Detail & Related papers (2022-03-25T05:27:28Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)