Deep forecasting of translational impact in medical research
- URL: http://arxiv.org/abs/2110.08904v1
- Date: Sun, 17 Oct 2021 19:29:41 GMT
- Title: Deep forecasting of translational impact in medical research
- Authors: Amy PK Nelson, Robert J Gray, James K Ruffle, Henry C Watkins, Daniel
Herron, Nick Sorros, Danil Mikhailov, M. Jorge Cardoso, Sebastien Ourselin,
Nick McNally, Bryan Williams, Geraint E. Rees and Parashkev Nachev
- Abstract summary: We develop a suite of representational and discriminative mathematical models of multi-scale publication data.
We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents.
We argue that content-based models of impact are superior in performance to conventional, citation-based measures.
- Score: 1.8130872753848115
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The value of biomedical research--a $1.7 trillion annual investment--is
ultimately determined by its downstream, real-world impact. Current objective
predictors of impact rest on proxy, reductive metrics of dissemination, such as
paper citation rates, whose relation to real-world translation remains
unquantified. Here we sought to determine the comparative predictability of
future real-world translation--as indexed by inclusion in patents, guidelines
or policy documents--from complex models of the abstract-level content of
biomedical publications versus citations and publication meta-data alone. We
develop a suite of representational and discriminative mathematical models of
multi-scale publication data, quantifying predictive performance out-of-sample,
ahead-of-time, across major biomedical domains, using the entire corpus of
biomedical research captured by Microsoft Academic Graph from 1990 to 2019,
encompassing 43.3 million papers across all domains. We show that citations are
only moderately predictive of translational impact as judged by inclusion in
patents, guidelines, or policy documents. By contrast, high-dimensional models
of publication titles, abstracts and metadata exhibit high fidelity (AUROC >
0.9), generalise across time and thematic domain, and transfer to the task of
recognising papers of Nobel Laureates. The translational impact of a paper
indexed by inclusion in patents, guidelines, or policy documents can be
predicted--out-of-sample and ahead-of-time--with substantially higher fidelity
from complex models of its abstract-level content than from models of
publication meta-data or citation metrics. We argue that content-based models
of impact are superior in performance to conventional, citation-based measures,
and sustain a stronger evidence-based claim to the objective measurement of
translational potential.
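The abstract reports fidelity as AUROC, the probability that a randomly chosen translated paper is scored above a randomly chosen untranslated one. A minimal sketch of that comparison follows, assuming binary labels (1 = later included in a patent, guideline or policy document) and two sets of model scores. The function name `auroc` and the toy numbers are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
content_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # assumed content-based model output
citation_scores = [0.7, 0.2, 0.5, 0.6, 0.3, 0.4]  # assumed citation-based model output

content_auc = auroc(labels, content_scores)
citation_auc = auroc(labels, citation_scores)
print(f"content-based AUROC:  {content_auc:.3f}")
print(f"citation-based AUROC: {citation_auc:.3f}")
```

The pairwise form is quadratic in the number of papers, so it only suits small illustrations; at the scale of the corpus described above, a rank-based implementation would be needed.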
Related papers
- From Words to Worth: Newborn Article Impact Prediction with LLM [69.41680520058418]
This paper introduces a promising approach, leveraging the capabilities of fine-tuned LLMs to predict the future impact of newborn articles.
A comprehensive dataset has been constructed and released for fine-tuning the LLM, containing over 12,000 entries with corresponding titles, abstracts, and TNCSI_SP.
arXiv Detail & Related papers (2024-08-07T17:52:02Z)
- Machine Learning to Promote Translational Research: Predicting Patent and Clinical Trial Inclusion in Dementia Research [0.0]
Projected to impact 1.6 million people in the UK by 2040 and costing £25 billion annually, dementia presents a growing challenge to society.
We used the Dimensions database to extract data from 43,091 UK dementia research publications between the years 1990-2023.
For patent predictions, the models achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.84 and 77.17% accuracy; for clinical trial predictions, an AUROC of 0.81 and 75.11% accuracy.
arXiv Detail & Related papers (2024-01-10T13:25:49Z)
- P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models [57.571395694391654]
We find that existing approaches alter the political opinions and stances of news articles in more than 50% of summaries.
We propose P3SUM, a diffusion model-based summarization approach controlled by political perspective classifiers.
Experiments on three news summarization datasets demonstrate that P3SUM outperforms state-of-the-art summarization systems.
arXiv Detail & Related papers (2023-11-16T10:14:28Z)
- Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement [11.684776349325887]
Existing models typically rely on static graphs for citation count estimation.
We introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values.
arXiv Detail & Related papers (2023-11-15T07:21:11Z)
- Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review [0.0]
The Targeted Maximum Likelihood Estimation framework integrates machine learning, statistical theory, and statistical inference.
We conduct a systematic literature review in PubMed for articles that applied any form of TMLE in observational studies.
Of the 81 publications included, 25% originated from the University of California at Berkeley, where the framework was first developed.
arXiv Detail & Related papers (2023-03-13T17:50:03Z)
- Citation Trajectory Prediction via Publication Influence Representation Using Temporal Knowledge Graph [52.07771598974385]
Existing approaches mainly rely on mining temporal and graph data from academic articles.
Our framework is composed of three modules: difference-preserved graph embedding, fine-grained influence representation, and learning-based trajectory calculation.
Experiments are conducted on both the APS academic dataset and our contributed AIPatent dataset.
arXiv Detail & Related papers (2022-10-02T07:43:26Z)
- Entity-driven Fact-aware Abstractive Summarization of Biomedical Literature [3.977582258550673]
We propose an entity-driven fact-aware framework for training end-to-end transformer-based encoder-decoder models for abstractive summarization of biomedical articles.
We conduct experiments using five state-of-the-art transformer-based models.
The proposed approach is evaluated on ICD-11-Summ-1000 and PubMed-50k.
arXiv Detail & Related papers (2022-03-30T00:34:56Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.