Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer
of Scientific Concepts across Text Corpora
- URL: http://arxiv.org/abs/2010.06657v1
- Date: Tue, 13 Oct 2020 19:46:59 GMT
- Title: Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer
of Scientific Concepts across Text Corpora
- Authors: Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang
Ren
- Abstract summary: We study translational research at the level of scientific concepts for all scientific fields.
We extract scientific concepts from corpora as instantiations of "research ideas".
We then follow the trajectories of over 450,000 new concepts to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.
- Score: 18.76916879679805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What kind of basic research ideas are more likely to get applied in practice?
There is a long line of research investigating patterns of knowledge transfer,
but it generally focuses on documents as the unit of analysis and follows their
transfer into practice within a specific scientific domain. Here we study
translational research at the level of scientific concepts for all scientific
fields. We do this through text mining and predictive modeling using three
corpora: 38.6 million paper abstracts, 4 million patent documents, and 0.28
million clinical trials. We extract scientific concepts (i.e., phrases) from
corpora as instantiations of "research ideas", create concept-level features as
motivated by the literature, and then follow the trajectories of over 450,000 new
concepts (which emerged between 1995 and 2014) to identify factors that lead only a small
proportion of these ideas to be used in inventions and drug trials. Results
from our analysis suggest several mechanisms that distinguish which scientific
concepts will be adopted in practice and which will not. We also demonstrate
that our derived features can be used to explain and predict knowledge transfer
with high accuracy. Our work provides greater understanding of knowledge
transfer for researchers, practitioners, and government agencies interested in
encouraging translational research.
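The pipeline the abstract describes (extract concept phrases, build concept-level features, then predict which concepts transfer) can be illustrated with a toy sketch. The records, the transfer labels, and the two features below are invented for illustration only and are far simpler than the paper's actual corpora and feature set:

```python
from collections import defaultdict

# Toy records: (concept phrase, year it appeared in a paper abstract).
# All concepts, counts, and labels here are hypothetical.
papers = [
    ("crispr screen", 1999), ("crispr screen", 2000), ("crispr screen", 2000),
    ("crispr screen", 2001), ("obscure lemma", 1999), ("obscure lemma", 2003),
]
# Concepts that later appeared in patents or trials (hypothetical labels).
transferred = {"crispr screen"}

def concept_features(records):
    """Build simple concept-level features: total mentions and early uptake."""
    by_concept = defaultdict(list)
    for concept, year in records:
        by_concept[concept].append(year)
    feats = {}
    for concept, years in by_concept.items():
        first = min(years)
        # Mentions within two years of the concept's first appearance.
        early = sum(1 for y in years if y <= first + 1)
        feats[concept] = {"mentions": len(years), "early_growth": early}
    return feats

feats = concept_features(papers)
# A crude score: concepts with rapid early uptake are ranked as more
# likely to transfer; a real model would fit these features to labels.
for concept, f in sorted(feats.items()):
    score = f["early_growth"] / f["mentions"]
    print(concept, f, round(score, 2))
```

A real replication would compute many more features (author, venue, and diffusion characteristics) over millions of documents and fit a supervised classifier against observed appearances in patents and trials.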
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
We comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models [0.6906005491572401]
We introduce SciMuse, a system that uses an evolving knowledge graph built from more than 58 million scientific papers to generate personalized research ideas.
We find that data-efficient machine learning can predict research interest with high precision, allowing us to optimize the interest-level of generated research ideas.
arXiv Detail & Related papers (2024-05-27T11:00:51Z)
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is a large language model-powered research idea writing agent.
It generates problems, methods, and experiment designs while iteratively refining them based on scientific literature.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension.
The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines.
As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
- To think inside the box, or to think out of the box? Scientific discovery via the reciprocation of insights and concepts [26.218943558900552]
We view scientific discovery as an interplay in which "thinking out of the box" actively seeks insightful solutions.
We propose Mindle, a semantic searching game that triggers scientific-discovery-like thinking spontaneously.
On this basis, the meta-strategies for insights and the usage of concepts can be investigated reciprocally.
arXiv Detail & Related papers (2022-12-01T03:52:12Z)
- Measure Utility, Gain Trust: Practical Advice for XAI Researcher [2.4756236418706483]
We recommend researchers focus on the utility of machine learning explanations instead of trust.
We outline five broad use cases where explanations are useful.
We describe pseudo-experiments that rely on objective empirical measurements and falsifiable hypotheses.
arXiv Detail & Related papers (2020-09-27T18:55:33Z)
- High-Precision Extraction of Emerging Concepts from Scientific Literature [29.56863792319201]
We present an unsupervised concept extraction method for scientific literature.
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%.
arXiv Detail & Related papers (2020-06-11T23:48:27Z)
- Optimal Learning for Sequential Decisions in Laboratory Experimentation [0.0]
This tutorial aims to provide experimental scientists with a foundation in the science of making decisions.
We introduce the concept of a learning policy, and review the major categories of policies.
We then introduce a policy, known as the knowledge gradient, that maximizes the value of information from each experiment.
arXiv Detail & Related papers (2020-04-11T14:53:29Z)
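The knowledge-gradient policy mentioned in the last entry has a closed form for independent Gaussian beliefs: each alternative's KG factor is ν_x = σ̃_x · f(ζ_x) with f(ζ) = ζΦ(ζ) + φ(ζ), and the policy measures the alternative with the largest factor. The sketch below illustrates that standard formula; the means and predictive standard deviations are made-up inputs, and the full tutorial covers much more (belief updating, other policy classes):

```python
import math

def pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def knowledge_gradient(means, sigmas_tilde):
    """KG factor per alternative for independent Gaussian beliefs:
    nu_x = s_x * f(z_x), where z_x = -|mu_x - max_{y!=x} mu_y| / s_x
    and f(z) = z*Phi(z) + phi(z)."""
    kg = []
    for x, (mu, s) in enumerate(zip(means, sigmas_tilde)):
        best_other = max(m for i, m in enumerate(means) if i != x)
        z = -abs(mu - best_other) / s
        kg.append(s * (z * cdf(z) + pdf(z)))
    return kg

# Hypothetical example: two alternatives with equal estimated means but
# different uncertainty; KG prefers measuring the more uncertain one.
factors = knowledge_gradient([1.0, 1.0], [0.5, 2.0])
choice = max(range(len(factors)), key=factors.__getitem__)
print(factors, "-> measure alternative", choice)
```

With equal means the tie is broken purely by uncertainty, which captures the policy's core idea: measure where a single experiment changes your best decision the most.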
This list is automatically generated from the titles and abstracts of the papers in this site.