Mirror Matching: Document Matching Approach in Seed-driven Document
Ranking for Medical Systematic Reviews
- URL: http://arxiv.org/abs/2112.14318v1
- Date: Tue, 28 Dec 2021 22:27:52 GMT
- Title: Mirror Matching: Document Matching Approach in Seed-driven Document
Ranking for Medical Systematic Reviews
- Authors: Grace E. Lee and Aixin Sun
- Abstract summary: Document ranking is an approach for assisting researchers by providing document rankings where relevant documents are ranked higher than irrelevant ones.
We propose a document matching measure named Mirror Matching, which calculates matching scores between medical abstract texts by incorporating common writing patterns.
- Score: 31.3220495275256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When medical researchers conduct a systematic review (SR), screening studies
is the most time-consuming process: researchers read several thousand medical
publications and manually label them as relevant or irrelevant. Screening
prioritization (i.e., document ranking) is an approach for assisting researchers
by providing document rankings where relevant documents are ranked higher than
irrelevant ones. Seed-driven document ranking (SDR) uses a known relevant
document (i.e., seed) as a query and generates such rankings. Previous work on
SDR seeks ways to identify different term weights in a query document and
utilizes them in a retrieval model to compute ranking scores. Alternatively, we
formulate the SDR task as finding similar documents to a query document and
produce rankings based on similarity scores. We propose a document matching
measure named Mirror Matching, which calculates matching scores between medical
abstract texts by incorporating common writing patterns, such as background,
method, result, and conclusion in order. We conduct experiments on the CLEF
2019 eHealth Task 2 TAR dataset, and the empirical results show that this
simple approach achieves higher performance than traditional and neural
retrieval models on Average Precision and precision-focused metrics.
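To make the section-aligned idea concrete, here is a minimal sketch, assuming a naive equal-length split of each abstract into four pseudo-sections (standing in for background, method, result, and conclusion) and Jaccard token overlap as the per-section score; the paper's actual sectioning and scoring are more sophisticated, and all names here are illustrative.

```python
# Minimal sketch of section-aligned abstract matching (not the paper's exact
# formulation): split each abstract into four contiguous pseudo-sections and
# average the token-overlap scores of positionally aligned section pairs.

def split_sections(abstract: str, n_sections: int = 4) -> list[list[str]]:
    """Naively cut an abstract into n contiguous, roughly equal token chunks."""
    tokens = abstract.lower().split()
    size = max(1, len(tokens) // n_sections)
    chunks = [tokens[i * size:(i + 1) * size] for i in range(n_sections - 1)]
    chunks.append(tokens[(n_sections - 1) * size:])  # last chunk takes the remainder
    return chunks

def section_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap between the token sets of two sections."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

def mirror_matching_score(seed: str, candidate: str) -> float:
    """Average similarity of aligned sections: background-to-background, etc."""
    pairs = zip(split_sections(seed), split_sections(candidate))
    scores = [section_overlap(s, c) for s, c in pairs]
    return sum(scores) / len(scores)

def rank_by_seed(seed: str, candidates: list[str]) -> list[int]:
    """Return candidate indices ordered by descending matching score."""
    scores = [mirror_matching_score(seed, c) for c in candidates]
    return sorted(range(len(candidates)), key=scores.__getitem__, reverse=True)
```

Under this scheme, a candidate whose opening matches the seed's background and whose ending matches its conclusion scores higher than one that merely shares vocabulary somewhere in the text.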
Related papers
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle zero-shot medical retrieval without relevance labels.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
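As a rough illustration of the hypothetical-document idea above (not SL-HyDE's self-learning loop), the sketch below drafts a pseudo-document for a query and ranks real documents by embedding similarity; generate_hypothetical_doc() and embed() are stand-ins for an LLM call and a dense encoder, not APIs from the paper.

```python
# Schematic of hypothetical-document retrieval: rank real documents against
# the embedding of an LLM-drafted pseudo-document rather than the raw query.
# Both helpers below are illustrative stand-ins.
import numpy as np

def generate_hypothetical_doc(query: str) -> str:
    """Stand-in for an LLM that drafts a plausible answer passage."""
    return f"A hypothetical medical passage that answers: {query}"

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Stand-in dense encoder: hash tokens into a normalized bag-of-words vector."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def rank_documents(query: str, docs: list[str]) -> list[int]:
    """Score documents by cosine similarity to the hypothetical document."""
    q_vec = embed(generate_hypothetical_doc(query))
    scores = [float(q_vec @ embed(d)) for d in docs]
    return sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
```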
arXiv Detail & Related papers (2024-10-26T02:53:20Z) - Generating Natural Language Queries for More Effective Systematic Review
Screening Prioritisation [53.77226503675752]
The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers.
In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based large-scale language models such as ChatGPT and Alpaca.
Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
arXiv Detail & Related papers (2023-09-11T05:12:14Z) - Neural Rankers for Effective Screening Prioritisation in Medical
Systematic Review Literature Search [31.797257552928336]
We apply several pre-trained language models to the systematic review document ranking task.
An empirical analysis compares the effectiveness of neural methods against traditional methods for this task.
Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods.
arXiv Detail & Related papers (2022-12-18T05:26:40Z) - Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Augmenting Document Representations for Dense Retrieval with
Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their interpolations and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that DAR significantly outperforms relevant baselines on the dense retrieval of both labeled and unlabeled documents.
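To illustrate the two augmentations named in the title, here is a minimal numpy sketch; the embedding dimension and noise scale are arbitrary choices for illustration, not the paper's settings.

```python
# Illustrative sketch of embedding-level augmentation for dense retrieval:
# create extra training vectors by interpolating two document embeddings
# and by perturbing one with small Gaussian noise.
import numpy as np

rng = np.random.default_rng(seed=0)
doc_a = rng.normal(size=768)  # stand-ins for two labeled document embeddings
doc_b = rng.normal(size=768)

# Interpolation: a convex combination of two document embeddings.
lam = rng.uniform()
interpolated = lam * doc_a + (1.0 - lam) * doc_b

# Perturbation: the original embedding plus low-variance Gaussian noise.
perturbed = doc_a + rng.normal(scale=0.01, size=doc_a.shape)

# The synthetic vectors join the pool the dense retriever is trained on.
augmented_pool = np.stack([doc_a, doc_b, interpolated, perturbed])
```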
arXiv Detail & Related papers (2022-03-15T09:07:38Z) - Towards Reducing Manual Workload in Technology-Assisted Reviews:
Estimating Ranking Performance [30.29371206568408]
When researchers label studies, they can screen ranked documents in which relevant documents are ranked higher than irrelevant ones.
This paper investigates the quality of document rankings for systematic reviews.
After extensive analysis of SR document rankings, we hypothesize that 'topic broadness' is a factor that affects the ranking quality of SRs.
arXiv Detail & Related papers (2022-01-14T19:48:45Z) - An Analysis of a BERT Deep Learning Strategy on a Technology Assisted
Review Task [91.3755431537592]
Document screening is a central task within Evidenced Based Medicine.
I propose a DL document classification approach with BERT or PubMedBERT embeddings and a DL similarity search path.
I test and evaluate the retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth collections.
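A minimal sketch of what such an embedding-plus-similarity-search path could look like with Hugging Face Transformers; the checkpoint id and the mean-pooling choice are assumptions for illustration, not details taken from the paper.

```python
# Sketch of a BERT-style similarity-search path: mean-pool PubMedBERT token
# states into one vector per abstract and rank by cosine similarity.
# The checkpoint id and pooling strategy are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # assumed
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden layer, masking padding; L2-normalize rows."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(pooled, dim=-1)

def similarity_search(query_abstract: str, doc_abstracts: list[str]) -> list[int]:
    """Return document indices ordered by cosine similarity to the query."""
    scores = (embed([query_abstract]) @ embed(doc_abstracts).T).squeeze(0)
    return scores.argsort(descending=True).tolist()
```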
arXiv Detail & Related papers (2021-04-16T19:45:27Z) - Literature Retrieval for Precision Medicine with Neural Matching and
Faceted Summarization [2.978663539080876]
We present a document reranking approach that combines neural query-document matching and text summarization.
Evaluations using NIST's TREC-PM track datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-17T02:01:32Z) - Aspect-based Document Similarity for Research Papers [4.661692753666685]
We extend similarity with aspect information by performing a pairwise document classification task.
We evaluate our aspect-based document similarity for research papers.
Our results show SciBERT as the best performing system.
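One way to set up such a pairwise classification, sketched with an assumed SciBERT checkpoint and an arbitrary number of aspect labels; the classification head here is randomly initialized and would need fine-tuning on labeled pairs before its predictions mean anything.

```python
# Sketch of pairwise document classification for aspect-based similarity:
# encode two papers' texts as a single two-segment input and predict an
# aspect label. Checkpoint id and label count are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "allenai/scibert_scivocab_uncased"  # assumed SciBERT checkpoint
N_ASPECTS = 4                                  # arbitrary label count
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=N_ASPECTS
).eval()

def predict_aspect(paper_a: str, paper_b: str) -> int:
    """Classify the pair (fed as BERT's two-segment input) into an aspect."""
    batch = tokenizer(paper_a, paper_b, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return int(logits.argmax(dim=-1))
```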
arXiv Detail & Related papers (2020-10-13T13:51:21Z) - Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)