Harnessing PubMed User Query Logs for Post Hoc Explanations of
Recommended Similar Articles
- URL: http://arxiv.org/abs/2402.03484v1
- Date: Mon, 5 Feb 2024 19:56:27 GMT
- Title: Harnessing PubMed User Query Logs for Post Hoc Explanations of
Recommended Similar Articles
- Authors: Ashley Shin, Qiao Jin, James Anibal, Zhiyong Lu
- Abstract summary: We build PubCLogs by repurposing 5.6 million pairs of co-clicked articles from PubMed's user query logs.
Using our PubCLogs dataset, we train the Highlight Similar Article Title (HSAT), a model designed to select the most relevant parts of the title of a similar article.
HSAT demonstrates strong performance in our empirical evaluations, achieving an F1 score of 91.72 percent on the PubCLogs test set.
- Score: 5.306261813981977
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Searching for a related article based on a reference article is an integral
part of scientific research. PubMed, like many academic search engines, has a
"similar articles" feature that recommends articles relevant to the current
article viewed by a user. Explaining recommended items can be of great utility
to users, particularly in the literature search process. With more than a
million biomedical papers published each year, explaining recommended
similar articles would help researchers and clinicians search for
related articles. Nonetheless, most current literature
recommendation systems lack explanations for their suggestions. We employ a
post hoc approach to explaining recommendations by identifying relevant tokens
in the titles of similar articles. Our major contribution is building PubCLogs
by repurposing 5.6 million pairs of co-clicked articles from PubMed's user query
logs. Using our PubCLogs dataset, we train the Highlight Similar Article Title
(HSAT), a transformer-based model designed to select the most relevant parts of
the title of a similar article, based on the title and abstract of a seed
article. HSAT demonstrates strong performance in our empirical evaluations,
achieving an F1 score of 91.72 percent on the PubCLogs test set, considerably
outperforming several baselines including BM25 (70.62), MPNet (67.11), MedCPT
(62.22), GPT-3.5 (46.00), and GPT-4 (64.89). Additional evaluations on a
separate, manually annotated test set further verify HSAT's performance.
Moreover, participants in our user study indicated a preference for HSAT due
to its superior balance between conciseness and comprehensiveness. Our study
suggests that repurposing user query logs of academic search engines can be a
promising way to train state-of-the-art models for explaining literature
recommendations.
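As a rough illustration of the setup described above, here is a minimal sketch of per-token highlighting with a BERT-style encoder and a binary token-classification head. The backbone, input packing, and label scheme are illustrative assumptions, not the authors' released HSAT implementation, and the untrained head below would still need training on PubCLogs-style labels.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Backbone is an assumption for illustration; HSAT's exact architecture
# and weights are not reproduced here.
MODEL = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)

# Seed article (title + abstract) as segment A, similar article title as segment B.
seed = "Deep learning for biomedical text mining. We survey neural methods ..."
similar_title = "Transformer models improve biomedical named entity recognition"

enc = tokenizer(seed, similar_title, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits            # shape: (1, seq_len, 2)

# Keep second-segment tokens (the similar title) predicted as "highlight" (label 1).
preds = logits.argmax(-1)[0]
segment = enc["token_type_ids"][0]          # 1 marks segment B for BERT-style models
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
highlight = [t for t, p, s in zip(tokens, preds, segment) if s == 1 and p == 1]
print(highlight)                            # meaningful only after training
```

In the paper, supervision comes from PubCLogs, built from co-clicked article pairs in the query logs; the sketch above only fixes a plausible inference-time interface.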
Related papers
- Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML).
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings.
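As a quick sketch of that calibration step: given an author's ranking, the raw scores can be projected onto the nearest sequence that respects the ranking, which is an isotonic regression problem. The toy numbers and the squared-error projection below are illustrative, not the mechanism's full specification.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_scores = np.array([6.0, 5.5, 4.0])   # raw review score per submission (toy data)
ranking = np.array([1, 0, 2])            # author's ranking of their papers: 0 = best

# Calibrated scores must be non-increasing from the author's best paper to the
# worst; isotonic regression projects the raw scores onto that constraint.
order = np.argsort(ranking)                       # submissions from best to worst
iso = IsotonicRegression(increasing=False)
calibrated_in_order = iso.fit_transform(np.arange(len(order)), raw_scores[order])

calibrated = np.empty_like(calibrated_in_order)
calibrated[order] = calibrated_in_order
print(calibrated)                                 # [5.75 5.75 4.0]
```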
arXiv Detail & Related papers (2024-08-24T01:51:23Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation [53.77226503675752]
The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers.
In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based large-scale language models such as ChatGPT and Alpaca.
Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
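To make the ranking step concrete, here is a minimal sketch that scores candidate documents against a query. BM25 stands in for the BERT-based neural rankers used in the paper, and the query and documents are toy examples.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Candidate studies retrieved by the review's Boolean query (toy data).
docs = [
    "statin therapy reduces cardiovascular events in adults",
    "effects of exercise on blood pressure in older adults",
    "statins and risk of myocardial infarction a meta analysis",
]

# The prioritisation query: here the review title; the paper also explores the
# Boolean query itself and queries generated by LLMs such as ChatGPT and Alpaca.
query = "statins for prevention of cardiovascular disease"

bm25 = BM25Okapi([d.split() for d in docs])
scores = bm25.get_scores(query.split())

# Screen candidates in order of decreasing estimated relevance.
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```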
arXiv Detail & Related papers (2023-09-11T05:12:14Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
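A minimal sketch of that extraction idea, using an off-the-shelf spaCy model for NER/POS and a sentence co-occurrence heuristic for relationships; both choices are stand-ins, not DiscoverPath's actual pipeline.

```python
import spacy
import networkx as nx

# Off-the-shelf model as a stand-in for DiscoverPath's NER/POS components.
nlp = spacy.load("en_core_web_sm")

abstract = ("Metformin improves glycemic control in patients with type 2 "
            "diabetes and may reduce cancer risk.")
doc = nlp(abstract)

# Terminology candidates: named entities plus noun chunks (POS/parse-based).
terms = {ent.text.lower() for ent in doc.ents}
terms |= {chunk.text.lower() for chunk in doc.noun_chunks}

# Toy relationship heuristic: connect terms that co-occur in a sentence.
kg = nx.Graph()
for sent in doc.sents:
    in_sent = [t for t in terms if t in sent.text.lower()]
    for i, a in enumerate(in_sent):
        for b in in_sent[i + 1:]:
            kg.add_edge(a, b, relation="co-occurs")

print(kg.number_of_nodes(), "terms,", kg.number_of_edges(), "relations")
```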
arXiv Detail & Related papers (2023-09-04T20:52:33Z)
- MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents [77.34726150561087]
We propose MIReAD, a simple method that learns high-quality representations of scientific papers.
We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes.
arXiv Detail & Related papers (2023-05-07T03:29:55Z)
- Artificial intelligence technologies to support research assessment: A review [10.203602318836444]
This literature review identifies indicators that associate with higher impact or higher quality research from article text.
It includes studies that used machine learning techniques to predict citation counts or quality scores for journal articles or conference papers.
arXiv Detail & Related papers (2022-12-11T06:58:39Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles [0.0]
We describe a rule-based approach for the automatic acquisition of scientific entities from scholarly article titles.
We identify a set of lexico-syntactic patterns that are easily recognizable.
A subset of the acquisition algorithm is implemented for article titles in the Computational Linguistics (CL) scholarly domain.
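A minimal sketch of the pattern idea with regular expressions; the two patterns below are illustrative stand-ins for the paper's lexico-syntactic rule set.

```python
import re

# Toy lexico-syntactic patterns over article titles (illustrative, not the
# paper's actual rules): "<method> for <task>" and "<task> using/with/via <method>".
PATTERNS = [
    re.compile(r"^(?P<method>.+?)\s+for\s+(?P<task>.+)$", re.IGNORECASE),
    re.compile(r"^(?P<task>.+?)\s+(?:using|with|via)\s+(?P<method>.+)$", re.IGNORECASE),
]

def extract_entities(title: str) -> dict:
    """Return {'method': ..., 'task': ...} if a pattern matches, else {}."""
    for pattern in PATTERNS:
        match = pattern.match(title)
        if match:
            return match.groupdict()
    return {}

print(extract_entities("Neural Machine Translation for Low-Resource Languages"))
# -> {'method': 'Neural Machine Translation', 'task': 'Low-Resource Languages'}
```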
arXiv Detail & Related papers (2021-09-01T05:59:06Z)
- Learning Fine-grained Fact-Article Correspondence in Legal Cases [19.606628325747938]
We create a corpus with manually annotated fact-article correspondences.
We parse articles into premise-conclusion pairs using a random forest.
Our best system reaches an F1 score of 96.3%, showing great potential for practical use.
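A minimal sketch of classifying fact-article pairs with a random forest; the TF-IDF cosine-similarity feature and toy pairs below are hypothetical stand-ins for the paper's premise-conclusion features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy (fact, article, label) triples; label 1 means the article governs the fact.
pairs = [
    ("the defendant breached the sales contract",
     "a party who breaches a contract shall pay damages", 1),
    ("the driver fled the scene of the accident",
     "leaving the scene of an accident is punishable", 1),
    ("the defendant breached the sales contract",
     "leaving the scene of an accident is punishable", 0),
]

texts = [t for fact, article, _ in pairs for t in (fact, article)]
vectorizer = TfidfVectorizer().fit(texts)

# One hypothetical feature per pair: TF-IDF cosine similarity of fact and article.
X = np.array([[cosine_similarity(vectorizer.transform([fact]),
                                 vectorizer.transform([article]))[0, 0]]
              for fact, article, _ in pairs])
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X))                     # sanity check on the training pairs
```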
arXiv Detail & Related papers (2021-04-21T19:06:58Z)
- Ontology-Based Recommendation of Editorial Products [7.1717344176500335]
Smart Book Recommender (SBR) supports Springer Nature's Computer Science editorial team in selecting the products to market at specific venues.
SBR recommends books, journals, and conference proceedings relevant to a conference by taking advantage of a semantically enhanced representation of about 27K editorial products.
SBR also allows users to investigate why a certain publication was suggested by the system.
arXiv Detail & Related papers (2021-03-24T23:23:53Z)