The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes
- URL: http://arxiv.org/abs/2505.17500v1
- Date: Fri, 23 May 2025 05:51:34 GMT
- Title: The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes
- Authors: Vladimir Baulin, Austin Cook, Daniel Friedman, Janna Lumiruusu, Andrew Pashea, Shagor Rahman, Benedikt Waldeck
- Abstract summary: We introduce the Discovery Engine, a framework to transform literature into a unified, computationally tractable representation of a scientific domain. The Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill-suited to the recent exponential proliferation of publications, contributing to insurmountable information overload and to issues surrounding reproducibility and retractions. We introduce the Discovery Engine, a framework to address these challenges by transforming an array of disconnected literature into a unified, computationally tractable representation of a scientific domain. Central to our approach is the LLM-driven distillation of publications into structured "knowledge artifacts," instances of a universal conceptual schema, complete with verifiable links to source evidence. These artifacts are then encoded into a high-dimensional Conceptual Tensor. This tensor serves as the primary, compressed representation of the synthesized field, where its labeled modes index scientific components (concepts, methods, parameters, relations) and its entries quantify their interdependencies. The Discovery Engine allows dynamic "unrolling" of this tensor into human-interpretable views, such as explicit knowledge graphs (the CNM graph) or semantic vector spaces, for targeted exploration. Crucially, AI agents operate directly on the graph using abstract mathematical and learned operations to navigate the knowledge landscape, identify non-obvious connections, pinpoint gaps, and assist researchers in generating novel knowledge artifacts (hypotheses, designs). By converting literature into a structured tensor and enabling agent-based interaction with this compact representation, the Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
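The abstract describes the tensor, its "unrolling," and gap finding only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a three-mode tensor indexed by (head concept, tail concept, relation), stored sparsely as a dictionary with source IDs kept as evidence. It "unrolls" the tensor into a graph view by summing over the relation mode, and into a vector view by concatenating the mode-1 and mode-2 unfoldings. It then flags a candidate "gap" when two concepts have similar context vectors but no direct link. The class and function names (`ConceptualTensor`, `unroll_graph`, `unroll_vectors`, `find_gaps`, `cosine`), the cosine-similarity heuristic, the 0.5 threshold, and the toy concepts and source IDs are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class ConceptualTensor:
    """Hypothetical sparse 3-mode tensor indexed by (head, tail, relation).

    Each entry stores an interdependency weight plus the source IDs backing it.
    """

    def __init__(self):
        self.weights = defaultdict(float)  # (head, tail, relation) -> weight
        self.evidence = defaultdict(list)  # (head, tail, relation) -> source IDs

    def add(self, head, tail, relation, source, weight=1.0):
        # The same link extracted from several papers accumulates weight.
        key = (head, tail, relation)
        self.weights[key] += weight
        self.evidence[key].append(source)

    def concepts(self):
        return sorted({c for h, t, _ in self.weights for c in (h, t)})

    def unroll_graph(self):
        """Graph view: sum over the relation mode and ignore direction."""
        adj = defaultdict(dict)
        for (h, t, _r), w in self.weights.items():
            adj[h][t] = adj[h].get(t, 0.0) + w
            adj[t][h] = adj[t].get(h, 0.0) + w
        return adj

    def unroll_vectors(self):
        """Vector view: concatenated mode-1 and mode-2 unfoldings, so each concept
        is a sparse vector over (neighbour, relation, direction) features."""
        vecs = defaultdict(lambda: defaultdict(float))
        for (h, t, r), w in self.weights.items():
            vecs[h][(t, r, "out")] += w
            vecs[t][(h, r, "in")] += w
        return vecs


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    denom = sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / denom if denom else 0.0


def find_gaps(tensor, threshold=0.5):
    """Candidate gaps: unlinked concept pairs whose contexts are similar."""
    graph = tensor.unroll_graph()
    vecs = tensor.unroll_vectors()
    gaps = []
    for a, b in combinations(tensor.concepts(), 2):
        if b in graph[a]:
            continue  # already connected in the ingested literature
        sim = cosine(vecs[a], vecs[b])
        if sim >= threshold:
            gaps.append((a, b, round(sim, 3)))
    return sorted(gaps, key=lambda g: -g[2])


# Toy example with hypothetical concepts and source IDs.
kt = ConceptualTensor()
kt.add("peptide", "pore formation", "causes", source="paper-A")
kt.add("nanoparticle", "pore formation", "causes", source="paper-B")
kt.add("pore formation", "antimicrobial activity", "enables", source="paper-A")
kt.add("peptide", "lipid membrane", "binds", source="paper-C")
kt.add("nanoparticle", "lipid membrane", "binds", source="paper-D")
print(find_gaps(kt))  # [('nanoparticle', 'peptide', 1.0)]
```

Here "peptide" and "nanoparticle" are never linked to each other, but they bind the same target and cause the same effect, so their unfolded vectors coincide and the pair is surfaced as a possible gap. The graph view discards relation types, while the vector view keeps them; a system like the one described would presumably use richer modes (methods, parameters) and learned operations rather than a fixed cosine threshold.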
Related papers
- Benchmarking the Discovery Engine
The Discovery Engine is a general-purpose automated system for scientific discovery. It combines machine learning with state-of-the-art ML interpretability to enable rapid and robust scientific insight.
Date: 2025-07-01T17:13:31Z
- Predicting New Research Directions in Materials Science using Large Language Models and Concept Graphs
We show that large language models (LLMs) can extract concepts more efficiently than automated keyword extraction methods. A machine learning model is trained to predict emerging combinations of concepts based on historical data. We show that the model can inspire materials scientists in their creative thinking process by predicting innovative combinations of topics that have not yet been investigated.
Date: 2025-06-20T08:26:12Z
- SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge
We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model. We develop a hybrid approach, called SciMantify, to support its evolving semantification. We implement the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge.
Date: 2025-04-14T07:57:55Z
- What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models
Large language models (LLMs) fail to capture detailed relationships across large bodies of work. Structured representations offer a natural complement, enabling systematic analysis across the whole corpus. We prototype a system that answers precise questions about the literature as a whole.
Date: 2025-03-12T23:24:40Z
- Neural-Symbolic Reasoning over Knowledge Graphs: A Survey from a Query Perspective
Knowledge graph reasoning is pivotal in various domains such as data mining, artificial intelligence, the Web, and social sciences. The rise of Neural AI marks a significant advancement, merging the robustness of deep learning with the precision of symbolic reasoning. The advent of large language models (LLMs) has opened new frontiers in knowledge graph reasoning.
Date: 2024-11-30T18:54:08Z
- SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
A key challenge in artificial intelligence is the creation of systems capable of autonomously advancing scientific understanding.
We present SciAgents, an approach that leverages three core concepts.
The framework autonomously generates and refines research hypotheses, elucidating underlying mechanisms, design principles, and unexpected material properties.
Our case studies demonstrate scalable capabilities to combine generative AI, ontological representations, and multi-agent modeling, harnessing a "swarm of intelligence" similar to biological systems.
Date: 2024-09-09T12:25:10Z
- Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning
We have transformed a dataset comprising 1,000 scientific papers into an ontological knowledge graph.
We have calculated node degrees, identified communities and connectivities, and evaluated clustering coefficients and betweenness centrality of pivotal nodes.
The graph has an inherently scale-free nature, is highly connected, and can be used for graph reasoning.
Date: 2024-03-18T17:30:27Z
- AceMap: Knowledge Discovery through Academic Graph
AceMap is an academic system designed for knowledge discovery through an academic graph.
We present advanced database construction techniques to build the comprehensive AceMap database.
AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas.
Date: 2024-03-05T01:17:56Z
- Large Language Models for Scientific Synthesis, Inference and Explanation
We show how large language models can perform scientific synthesis, inference, and explanation.
We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature.
This approach has the further advantage that the large language model can explain the machine learning system's predictions.
Date: 2023-10-12T02:17:59Z
- State of the Art on Diffusion Models for Visual Computing
This report introduces the basic mathematical concepts of diffusion models, as well as implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
Date: 2023-10-11T05:32:29Z
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation
We propose a multimodal intensive zero-shot learning (ZSL) framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
Date: 2023-06-14T13:07:48Z
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
Date: 2021-06-03T03:00:12Z
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs).
Our results show that with NLP we can encode a semantic space of articles, while with GNNs we can build a relational space where the social practices of a research community are also encoded.
Date: 2020-11-05T14:57:41Z
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
In this work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
Date: 2020-10-28T08:31:40Z
This list is automatically generated from the titles and abstracts of the papers on this site.