PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical
Knowledge
- URL: http://arxiv.org/abs/2401.11048v1
- Date: Fri, 19 Jan 2024 22:24:39 GMT
- Authors: Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian,
Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, and Zhiyong Lu
- Abstract summary: PubTator 3.0 is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches.
It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles.
We show that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results.
- Score: 7.483612362757038
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a
biomedical literature resource using state-of-the-art AI techniques to offer
semantic and relation searches for key concepts like proteins, genetic
variants, diseases, and chemicals. It currently provides over one billion
entity and relation annotations across approximately 36 million PubMed
abstracts and 6 million full-text articles from the PMC open access subset,
updated weekly. PubTator 3.0's online interface and API utilize these
precomputed entity relations and synonyms to provide advanced search
capabilities and enable large-scale analyses, streamlining many complex
information needs. We showcase the retrieval quality of PubTator 3.0 using a
series of entity pair queries, demonstrating that PubTator 3.0 retrieves a
greater number of articles than either PubMed or Google Scholar, with higher
precision in the top 20 results. We further show that integrating ChatGPT
(GPT-4) with PubTator APIs dramatically improves the factuality and
verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive
set of features and tools that allow researchers to navigate the ever-expanding
wealth of biomedical literature, expediting research and unlocking valuable
insights for scientific discovery.
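The abstract compares systems by precision in the top 20 results. As an illustration only, here is a minimal sketch of how precision at k is commonly computed for a ranked result list. The function name, the document identifiers, and the relevance judgments are hypothetical; this is not the paper's evaluation code, and the paper's exact protocol may differ (for example, in how queries with fewer than 20 results are handled).

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Return the fraction of the top-k ranked results judged relevant.

    Assumption: the denominator is always k, so a system that returns
    fewer than k results is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranked output for one entity-pair query and its relevance judgments.
ranked = ["pmid_a", "pmid_b", "pmid_c", "pmid_d"]
relevant = {"pmid_a", "pmid_c"}

print(precision_at_k(ranked, relevant, k=4))
print(precision_at_k(ranked, relevant, k=2))
```

Averaging this value over a set of queries would give one number per system, which is the kind of figure a top-20 precision comparison relies on.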
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mention annotations in the format of in-text span, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- PubMed and Beyond: Biomedical Literature Search in the Age of Artificial Intelligence [6.10182662240717]
Literature search is an essential tool for building on prior knowledge in clinical and biomedical research.
Recent improvements in artificial intelligence have expanded functionality beyond keyword-based search.
We present a survey of literature search tools tailored to both general and specific information needs in biomedicine.
arXiv Detail & Related papers (2023-07-18T23:35:53Z)
- PGB: A PubMed Graph Benchmark for Heterogeneous Network Representation Learning [5.747361083768407]
We introduce PubMed Graph Benchmark (PGB), a new benchmark for evaluating heterogeneous graph embeddings for biomedical literature.
The benchmark contains rich metadata including abstract, authors, citations, MeSH hierarchy, and other information.
arXiv Detail & Related papers (2023-05-04T10:09:08Z)
- The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces [54.2590226904332]
We describe the Semantic Reader Project, an effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers.
Ten prototype interfaces have been developed and more than 300 participants and real-world users have shown improved reading experiences.
We structure this paper around challenges scholars and the public face when reading research papers.
arXiv Detail & Related papers (2023-03-25T02:47:09Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The interaction between drugs and targets (DTI) in the human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
arXiv Detail & Related papers (2021-09-27T17:00:14Z)
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z)
- Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology [30.324906836652367]
PubMed, the most well known database of biomedical papers, relies on human curators to add these annotations.
Our approach achieves new state-of-the-art results for the UMLS in both traditional recognition/linking and semantic indexing-based evaluation.
arXiv Detail & Related papers (2021-01-26T06:41:12Z)
- PubSqueezer: A Text-Mining Web Tool to Transform Unstructured Documents into Structured Data [0.0]
I present a web tool which uses a Text Mining strategy to transform unstructured biomedical articles into structured data.
The generated results give a quick overview of complex topics and can suggest information that is not explicitly reported.
I show how a literature-based analysis conducted with PubSqueezer results makes it possible to describe known facts about SARS-CoV-2.
arXiv Detail & Related papers (2020-11-05T22:23:18Z)
- Literature Triage on Genomic Variation Publications by Knowledge-enhanced Multi-channel CNN [5.187865216685969]
The aim of this study is to investigate the correlation between genomic variation and certain diseases or phenotypes.
We adopt a multi-channel convolutional network to utilize rich textual information and bridge the semantic gaps from different corpora.
Our model improves the accuracy of biomedical literature triage results.
arXiv Detail & Related papers (2020-05-08T13:47:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.