High-Precision Extraction of Emerging Concepts from Scientific Literature
- URL: http://arxiv.org/abs/2006.06877v1
- Date: Thu, 11 Jun 2020 23:48:27 GMT
- Title: High-Precision Extraction of Emerging Concepts from Scientific Literature
- Authors: Daniel King, Doug Downey, Daniel S. Weld
- Abstract summary: We present an unsupervised concept extraction method for scientific literature.
From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identification of new concepts in scientific literature can help power
faceted search, scientific trend analysis, knowledge-base construction, and
more, but current methods are lacking. Manual identification cannot keep up
with the torrent of new publications, while the precision of existing automatic
techniques is too low for many applications. We present an unsupervised concept
extraction method for scientific literature that achieves much higher precision
than previous work. Our approach relies on a simple but novel intuition: each
scientific concept is likely to be introduced or popularized by a single paper
that is disproportionately cited by subsequent papers mentioning the concept.
From a corpus of computer science papers on arXiv, we find that our method
achieves a Precision@1000 of 99%, compared to 86% for prior work, and a
substantially better precision-yield trade-off across the top 15,000
extractions. To stimulate research in this area, we release our code and data
(https://github.com/allenai/ForeCite).
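The abstract's core intuition is that a concept tends to have one paper that is disproportionately cited by later papers mentioning it. That intuition, together with the Precision@k metric used in the evaluation, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the `concept_score` formula, the `Corpus` layout, and the toy data are hypothetical illustrations, not the scoring function from the paper or the ForeCite repository. Candidate phrases are assumed to have already been extracted from each paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical corpus layout: paper id -> (phrases the paper mentions, ids of papers it cites).
Corpus = Dict[str, Tuple[Set[str], Set[str]]]


def concept_score(corpus: Corpus, phrase: str) -> Tuple[float, str]:
    """Return (score, origin paper id) for a candidate phrase.

    Illustrative assumption, not the published formula: for each paper c
    that mentions the phrase, let k = papers mentioning the phrase that cite c
    and n = all papers citing c. The score (k / n) * log(1 + k) is high when
    most of c's citations come from papers that also use the phrase.
    """
    # Invert the citation lists: cited paper -> set of papers citing it.
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for pid, (_, cites) in corpus.items():
        for c in cites:
            cited_by[c].add(pid)

    mentioning = {pid for pid, (phrases, _) in corpus.items() if phrase in phrases}
    best_score, best_origin = 0.0, ""
    for c in sorted(mentioning):  # sorted for deterministic tie-breaking
        citers = cited_by.get(c, set())
        if not citers:
            continue  # an uncited paper cannot be a popularizing origin
        k = len(citers & mentioning)
        score = (k / len(citers)) * math.log(1 + k)
        if score > best_score:
            best_score, best_origin = score, c
    return best_score, best_origin


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked extractions judged to be real concepts.

    If fewer than k items are ranked, divides by the number actually returned.
    """
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for t in top if t in relevant) / len(top)


# Toy data (hypothetical): paper "A" introduces "transformer"; "results" is generic.
corpus: Corpus = {
    "A": ({"transformer"}, set()),
    "B": ({"transformer"}, {"A", "E"}),
    "C": ({"transformer", "results"}, {"A"}),
    "D": ({"results"}, {"E"}),
    "E": ({"results"}, set()),
    "F": ({"results"}, {"A"}),
}

candidates = ["results", "transformer"]
ranked = sorted(candidates, key=lambda t: concept_score(corpus, t)[0], reverse=True)
print(ranked)                                        # ['transformer', 'results']
print(precision_at_k(ranked, {"transformer"}, k=1))  # 1.0
```

Here "transformer" scores (2/3)·ln 3 ≈ 0.73 with origin "A", because two of A's three citers use the phrase, while "results" scores only 0.5·ln 2 ≈ 0.35. The log(1 + k) factor is one assumed way to keep a phrase backed by a single citing paper from receiving a perfect score. A Precision@1000 figure like the one reported above corresponds to `precision_at_k(ranked, relevant, k=1000)` with human-judged labels.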
Related papers
- SciMON: Scientific Inspiration Machines Optimized for Novelty (2023-05-23)
  We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
  We take a dramatic departure with a novel setting in which models take background contexts as input.
  We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
- MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents (2023-05-07)
  We propose MIReAD, a simple method that learns high-quality representations of scientific papers.
  We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes.
- Cracking Double-Blind Review: Authorship Attribution with Deep Learning (2022-11-14)
  We propose a transformer-based neural-network architecture to attribute an anonymous manuscript to an author.
  We leverage all research papers publicly available on arXiv, amounting to over 2 million manuscripts.
  Our method achieves an unprecedented authorship attribution accuracy, with up to 73% of papers attributed correctly.
- arXivEdits: Understanding the Human Revision Process in Scientific Writing (2022-10-26)
  We provide a complete computational framework for studying text revision in scientific writing.
  We first introduce arXivEdits, a new annotated corpus of 751 full papers from arXiv with gold sentence alignment across their multiple revised versions.
  It supports our data-driven analysis to unveil the common strategies researchers use when revising their papers.
- FPSRS: A Fusion Approach for Paper Submission Recommendation System (2022-05-12)
  This paper presents two new approaches for recommending scientific articles.
  The first approach employs RNN structures in addition to Conv1D.
  We also introduce a new method, DistilBertAims, which uses DistilBERT for both uppercase and lowercase words to vectorize features such as Title, Abstract, and Keywords.
  The experimental results show that the second approach achieves better performance, 62.46% and 12.44% higher than the best result of the previous study.
- Quantum verification and estimation with few copies (2021-09-08)
  The verification and estimation of large entangled systems represents one of the main challenges in the employment of such systems for reliable quantum information processing.
  This review article presents novel techniques focusing on a fixed number of resources (sampling complexity), which thus prove suitable for systems of arbitrary dimension.
  Specifically, a probabilistic framework requiring at best only a single copy for entanglement detection is reviewed, together with the concept of selective quantum state tomography.
- Linking Health News to Research Literature (2021-07-14)
  Accurately linking news articles to scientific research works is a critical component in a number of applications.
  Although the lack of links between news and literature has been a challenge in these applications, it is a relatively unexplored research problem.
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction (2021-06-03)
  We use the citation graph of referential links between citing and cited papers.
  We observe a sizable improvement in end-to-end information extraction over the state of the art.
- What's New? Summarizing Contributions in Scientific Literature (2020-11-06)
  We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
  We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
  We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
- Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer of Scientific Concepts across Text Corpora (2020-10-13)
  We study translational research at the level of scientific concepts for all scientific fields.
  We extract scientific concepts from corpora as instantiations of "research ideas".
  We then follow the trajectories of over 450,000 new concepts to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.