Related papers: What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models

URL: http://arxiv.org/abs/2503.09894v1
Date: Wed, 12 Mar 2025 23:24:40 GMT
Title: What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models
Authors: Abhipsha Das, Nicholas Lourie, Siavash Golkar, Mariel Pettee,
Abstract summary: Large language models (LLMs) fail to capture detailed relationships across large bodies of work. Structured representations offer a natural complement -- enabling systematic analysis across the whole corpus. We prototype a system that answers precise questions about the literature as a whole.
Score: 4.8261605642238745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The scientific literature's exponential growth makes it increasingly challenging to navigate and synthesize knowledge across disciplines. Large language models (LLMs) are powerful tools for understanding scientific text, but they fail to capture detailed relationships across large bodies of work. Unstructured approaches, like retrieval augmented generation, can sift through such corpora to recall relevant facts; however, when millions of facts influence the answer, unstructured approaches become cost prohibitive. Structured representations offer a natural complement -- enabling systematic analysis across the whole corpus. Recent work enhances LLMs with unstructured or semistructured representations of scientific concepts; to complement this, we try extracting structured representations using LLMs. By combining LLMs' semantic understanding with a schema of scientific concepts, we prototype a system that answers precise questions about the literature as a whole. Our schema applies across scientific fields and we extract concepts from it using only 20 manually annotated abstracts. To demonstrate the system, we extract concepts from 30,000 papers on arXiv spanning astrophysics, fluid dynamics, and evolutionary biology. The resulting database highlights emerging trends and, by visualizing the knowledge graph, offers new ways to explore the ever-growing landscape of scientific knowledge. Demo: abby101/surveyor-0 on HF Spaces. Code: https://github.com/chiral-carbon/kg-for-science.
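The abstract's idea of answering precise, corpus-wide questions from structured extractions can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the table layout, the column names, the helper functions, and the placeholder rows are hypothetical and are not the authors' schema, data, or code. The sketch only shows why a structured store makes an aggregate question (how many papers mention a concept each year) cheap to answer.

```python
import sqlite3

# Hypothetical rows: (paper_id, year, field, concept_type, concept).
# Made-up placeholders, not data extracted by the paper's system.
ROWS = [
    ("p1", 2021, "astrophysics", "method", "n-body simulation"),
    ("p2", 2022, "astrophysics", "method", "n-body simulation"),
    ("p3", 2022, "fluid dynamics", "method", "direct numerical simulation"),
    ("p4", 2022, "astrophysics", "object", "galaxy cluster"),
    ("p5", 2023, "astrophysics", "method", "n-body simulation"),
    ("p5", 2023, "astrophysics", "object", "galaxy cluster"),
]


def build_db(rows):
    """Load extracted concepts into an in-memory SQLite table (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concepts ("
        "paper_id TEXT, year INTEGER, field TEXT, "
        "concept_type TEXT, concept TEXT)"
    )
    conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def papers_per_year(conn, concept):
    """Count distinct papers mentioning a concept, grouped by year."""
    cur = conn.execute(
        "SELECT year, COUNT(DISTINCT paper_id) FROM concepts "
        "WHERE concept = ? GROUP BY year ORDER BY year",
        (concept,),
    )
    return cur.fetchall()


conn = build_db(ROWS)
print(papers_per_year(conn, "n-body simulation"))
```

A query like this touches every row once, whereas a retrieval-based approach would need to read and summarize each matching passage, which is the cost contrast the abstract draws.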

Related papers

The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes [0.0]
We introduce the Discovery Engine, a framework to transform literature into a unified, computationally tractable representation of a scientific domain. The Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
arXiv Detail & Related papers (2025-05-23T05:51:34Z)
Science Hierarchography: Hierarchical Organization of Science Literature [20.182213614072836]
We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure. We develop a range of algorithms to achieve the goals of SCIENCE HIERARCHOGRAPHY. Results show that this structured approach enhances interpretability, supports trend discovery, and offers an alternative pathway for exploring scientific literature beyond traditional search methods.
arXiv Detail & Related papers (2025-04-18T17:59:29Z)
CEAR: Automatic construction of a knowledge graph of chemical entities and roles from scientific literature [4.086092284014203]
We propose a methodology that involves augmenting existing annotated text corpora with knowledge from ChEBI and fine-tuning a large language model (LLM) to recognize chemical entities and their roles in scientific text. By combining ontological knowledge with the language understanding capabilities of LLMs, we achieve high precision and recall rates in identifying both the chemical entities and roles in scientific literature.
arXiv Detail & Related papers (2024-07-31T15:56:06Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [57.96527452844273]
We introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning. We curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs. To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i.e., ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath.
arXiv Detail & Related papers (2024-01-15T20:22:21Z)
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
MechGPT, a language-based strategy for mechanics and materials modeling that connects knowledge across scales, disciplines and modalities [0.0]
We use a Large Language Model (LLM) to distill question-answer pairs from raw sources followed by fine-tuning. The resulting MechGPT LLM foundation model is used in a series of computational experiments to explore its capacity for knowledge retrieval, various language tasks, hypothesis generation, and connecting knowledge across disparate areas.
arXiv Detail & Related papers (2023-10-16T14:29:35Z)
Large Language Models for Scientific Synthesis, Inference and Explanation [56.41963802804953]
We show how large language models can perform scientific synthesis, inference, and explanation. We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature. This approach has the further advantage that the large language model can explain the machine learning system's predictions.
arXiv Detail & Related papers (2023-10-12T02:17:59Z)
Structured information extraction from complex scientific text with fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction. The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts. This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.