Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer
of Scientific Concepts across Text Corpora
- URL: http://arxiv.org/abs/2010.06657v1
- Date: Tue, 13 Oct 2020 19:46:59 GMT
- Title: Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer
of Scientific Concepts across Text Corpora
- Authors: Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang
Ren
- Abstract summary: We study translational research at the level of scientific concepts for all scientific fields.
We extract scientific concepts from corpora as instantiations of "research ideas".
We then follow the trajectories of over 450,000 new concepts to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.
- Score: 18.76916879679805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What kind of basic research ideas are more likely to get applied in practice?
There is a long line of research investigating patterns of knowledge transfer,
but it generally focuses on documents as the unit of analysis and follows their
transfer into practice within a specific scientific domain. Here we study
translational research at the level of scientific concepts for all scientific
fields. We do this through text mining and predictive modeling using three
corpora: 38.6 million paper abstracts, 4 million patent documents, and 0.28
million clinical trials. We extract scientific concepts (i.e., phrases) from
corpora as instantiations of "research ideas", create concept-level features as
motivated by the literature, and then follow the trajectories of over 450,000 new
concepts (which emerged between 1995 and 2014) to identify factors that lead only a small
proportion of these ideas to be used in inventions and drug trials. Results
from our analysis suggest several mechanisms that distinguish which scientific
concepts will be adopted in practice and which will not. We also demonstrate
that our derived features can be used to explain and predict knowledge transfer
with high accuracy. Our work provides greater understanding of knowledge
transfer for researchers, practitioners, and government agencies interested in
encouraging translational research.
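The pipeline the abstract describes (extract concept phrases, build concept-level features, then predict which concepts transfer) can be illustrated with a toy sketch. The records, the transfer labels, and the two features below are invented for illustration only and are far simpler than the paper's actual corpora and feature set:

```python
from collections import defaultdict

# Toy records: (concept phrase, year it appeared in a paper abstract).
# All concepts, counts, and labels here are hypothetical.
papers = [
    ("crispr screen", 1999), ("crispr screen", 2000), ("crispr screen", 2000),
    ("crispr screen", 2001), ("obscure lemma", 1999), ("obscure lemma", 2003),
]
# Concepts that later appeared in patents or trials (hypothetical labels).
transferred = {"crispr screen"}

def concept_features(records):
    """Build simple concept-level features: total mentions and early uptake."""
    by_concept = defaultdict(list)
    for concept, year in records:
        by_concept[concept].append(year)
    feats = {}
    for concept, years in by_concept.items():
        first = min(years)
        # Mentions within two years of the concept's first appearance.
        early = sum(1 for y in years if y <= first + 1)
        feats[concept] = {"mentions": len(years), "early_growth": early}
    return feats

feats = concept_features(papers)
# A crude score: concepts with rapid early uptake are ranked as more
# likely to transfer; a real model would fit these features to labels.
for concept, f in sorted(feats.items()):
    score = f["early_growth"] / f["mentions"]
    print(concept, f, round(score, 2))
```

A real replication would compute many more features (author, venue, and diffusion characteristics) over millions of documents and fit a supervised classifier against observed appearances in patents and trials.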
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
We comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models [0.6906005491572401]
We introduce SciMuse, a system that uses an evolving knowledge graph built from more than 58 million scientific papers to generate personalized research ideas.
We find that data-efficient machine learning can predict research interest with high precision, allowing us to optimize the interest-level of generated research ideas.
arXiv Detail & Related papers (2024-05-27T11:00:51Z)
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is a large language model-powered research idea writing agent.
It generates problems, methods, and experiment designs while iteratively refining them based on scientific literature.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension.
The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines.
As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
- To think inside the box, or to think out of the box? Scientific discovery via the reciprocation of insights and concepts [26.218943558900552]
We view scientific discovery as an interplay in which "thinking out of the box" actively seeks insightful solutions.
We propose Mindle, a semantic searching game that triggers scientific-discovery-like thinking spontaneously.
On this basis, the meta-strategies for insights and the usage of concepts can be investigated reciprocally.
arXiv Detail & Related papers (2022-12-01T03:52:12Z)
- Measure Utility, Gain Trust: Practical Advice for XAI Researcher [2.4756236418706483]
We recommend researchers focus on the utility of machine learning explanations instead of trust.
We outline five broad use cases where explanations are useful.
We describe pseudo-experiments that rely on objective empirical measurements and falsifiable hypotheses.
arXiv Detail & Related papers (2020-09-27T18:55:33Z)
- High-Precision Extraction of Emerging Concepts from Scientific Literature [29.56863792319201]
We present an unsupervised concept extraction method for scientific literature.
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%.
arXiv Detail & Related papers (2020-06-11T23:48:27Z)
- Optimal Learning for Sequential Decisions in Laboratory Experimentation [0.0]
This tutorial aims to provide experimental scientists with a foundation in the science of making decisions.
We introduce the concept of a learning policy, and review the major categories of policies.
We then introduce a policy, known as the knowledge gradient, that maximizes the value of information from each experiment.
arXiv Detail & Related papers (2020-04-11T14:53:29Z)
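The knowledge-gradient policy mentioned in the last entry has a closed form for independent Gaussian beliefs: each alternative's KG factor is ν_x = σ̃_x · f(ζ_x) with f(ζ) = ζΦ(ζ) + φ(ζ), and the policy measures the alternative with the largest factor. The sketch below illustrates that standard formula; the means and predictive standard deviations are made-up inputs, and the full tutorial covers much more (belief updating, other policy classes):

```python
import math

def pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def knowledge_gradient(means, sigmas_tilde):
    """KG factor per alternative for independent Gaussian beliefs:
    nu_x = s_x * f(z_x), where z_x = -|mu_x - max_{y!=x} mu_y| / s_x
    and f(z) = z*Phi(z) + phi(z)."""
    kg = []
    for x, (mu, s) in enumerate(zip(means, sigmas_tilde)):
        best_other = max(m for i, m in enumerate(means) if i != x)
        z = -abs(mu - best_other) / s
        kg.append(s * (z * cdf(z) + pdf(z)))
    return kg

# Hypothetical example: two alternatives with equal estimated means but
# different uncertainty; KG prefers measuring the more uncertain one.
factors = knowledge_gradient([1.0, 1.0], [0.5, 2.0])
choice = max(range(len(factors)), key=factors.__getitem__)
print(factors, "-> measure alternative", choice)
```

With equal means the tie is broken purely by uncertainty, which captures the policy's core idea: measure where a single experiment changes your best decision the most.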
This list is automatically generated from the titles and abstracts of the papers in this site.