Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer
of Scientific Concepts across Text Corpora
- URL: http://arxiv.org/abs/2010.06657v1
- Date: Tue, 13 Oct 2020 19:46:59 GMT
- Authors: Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang
Ren
- Abstract summary: We study translational research at the level of scientific concepts for all scientific fields.
We extract scientific concepts from these corpora as instantiations of "research ideas".
We then follow the trajectories of over 450,000 new concepts to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What kind of basic research ideas are more likely to get applied in practice?
There is a long line of research investigating patterns of knowledge transfer,
but it generally focuses on documents as the unit of analysis and follows their
transfer into practice within a specific scientific domain. Here we study
translational research at the level of scientific concepts for all scientific
fields. We do this through text mining and predictive modeling using three
corpora: 38.6 million paper abstracts, 4 million patent documents, and 0.28
million clinical trials. We extract scientific concepts (i.e., phrases) from
corpora as instantiations of "research ideas", create concept-level features
motivated by the literature, and then follow the trajectories of over 450,000 new
concepts (which emerged from 1995 to 2014) to identify factors that lead only a small
proportion of these ideas to be used in inventions and drug trials. Results
from our analysis suggest several mechanisms that distinguish which scientific
concepts will be adopted in practice and which will not. We also demonstrate
that our derived features can be used to explain and predict knowledge transfer
with high accuracy. Our work provides greater understanding of knowledge
transfer for researchers, practitioners, and government agencies interested in
encouraging translational research.
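The abstract describes a pipeline: extract concept phrases from paper abstracts, then trace each new concept forward in time to see whether it later surfaces in patents or clinical trials. The following is a minimal, hypothetical sketch of that trajectory-tracing step, not the authors' code: it uses naive bigrams as a stand-in for the paper's phrase mining and toy documents in place of the three real corpora.

```python
# Hypothetical sketch: trace whether a concept first coined in paper
# abstracts later appears in patents or clinical trials.
from collections import defaultdict


def extract_concepts(text):
    """Naive phrase extraction: lowercase bigrams. A stand-in for the
    paper's real concept/phrase-mining step."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}


def concept_trajectories(papers, patents, trials):
    """For each concept first seen in a paper, record the earliest year
    it shows up in each corpus (None if it never transfers).

    Each corpus is a list of (year, text) tuples."""
    first_seen = {}
    for year, text in sorted(papers):
        for c in extract_concepts(text):
            first_seen.setdefault(
                c, {"paper": year, "patent": None, "trial": None})
    for corpus, docs in (("patent", patents), ("trial", trials)):
        for year, text in sorted(docs):
            for c in extract_concepts(text):
                # Only track concepts that originated in the paper corpus,
                # and keep the earliest transfer year.
                if c in first_seen and first_seen[c][corpus] is None:
                    first_seen[c][corpus] = year
    return first_seen


# Toy corpora (illustrative only, not the study's data).
papers = [(1996, "deep learning methods"), (1998, "gene editing tools")]
patents = [(2001, "apparatus using deep learning")]
trials = [(2005, "trial of gene editing therapy")]

traj = concept_trajectories(papers, patents, trials)
```

On this toy input, "deep learning" transfers to a patent while "gene editing" transfers to a trial; the study's contribution is building predictive features over such trajectories at the scale of 450,000 concepts.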
Related papers
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently.
Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research.
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
- Good Idea or Not, Representation of LLM Could Tell
We focus on idea assessment, which aims to leverage the knowledge of large language models to assess the merit of scientific ideas.
We release a benchmark dataset from nearly four thousand manuscript papers with full texts, meticulously designed to train and evaluate the performance of different approaches to this task.
Our findings suggest that the representations of large language models hold more potential in quantifying the value of ideas than their generative outputs.
arXiv Detail & Related papers (2024-09-07T02:07:22Z)
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly labeled mentions annotated as in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- To think inside the box, or to think out of the box? Scientific discovery via the reciprocation of insights and concepts
We view scientific discovery as an interplay between thinking inside the box and "thinking out of the box", which actively seeks insightful solutions.
We propose Mindle, a semantic searching game that triggers scientific-discovery-like thinking spontaneously.
On this basis, the meta-strategies for insights and the usage of concepts can be investigated reciprocally.
arXiv Detail & Related papers (2022-12-01T03:52:12Z)
- Measure Utility, Gain Trust: Practical Advice for XAI Researchers
We recommend researchers focus on the utility of machine learning explanations instead of trust.
We outline five broad use cases where explanations are useful.
We describe pseudo-experiments that rely on objective empirical measurements and falsifiable hypotheses.
arXiv Detail & Related papers (2020-09-27T18:55:33Z)
- High-Precision Extraction of Emerging Concepts from Scientific Literature
We present an unsupervised concept extraction method for scientific literature.
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%.
arXiv Detail & Related papers (2020-06-11T23:48:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.