Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
- URL: http://arxiv.org/abs/2309.05238v3
- Date: Thu, 23 Nov 2023 05:25:59 GMT
- Title: Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
- Authors: Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Screening prioritisation in medical systematic reviews aims to rank the set
of documents retrieved by complex Boolean queries. Prioritising the most
important documents ensures that subsequent review steps can be carried out
more efficiently and effectively. The current state of the art uses the final
title of the review as a query to rank the documents using BERT-based neural
rankers. However, the final title is only formulated at the end of the review
process, which makes this approach impractical as it relies on ex post facto
information. At the time of screening, only a rough working title is available,
with which the BERT-based ranker performs significantly worse than with the
final title. In this paper, we explore alternative sources of queries for
prioritising screening, such as the Boolean query used to retrieve the
documents to be screened and queries generated by instruction-based generative
large-scale language models such as ChatGPT and Alpaca. Our best approach is
not only viable based on the information available at the time of screening,
but also has similar effectiveness to the final title.
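The abstract frames screening prioritisation as re-ranking the fixed set of documents returned by a Boolean query, using some query (final title, working title, the Boolean query itself, or an LLM-generated query) as the ranking signal. The sketch below illustrates only that framing, under loud assumptions: a TF-IDF cosine scorer stands in for the BERT-based rankers the paper uses, and `boolean_to_keywords` is a hypothetical, naive way of turning a Boolean query into a keyword query, not the paper's query-construction method. All function names and the example records are illustrative.

```python
import math
import re
from collections import Counter

# Boolean operators to discard when turning a Boolean query into keywords.
OPERATORS = {"and", "or", "not"}


def tokenize(text: str) -> list[str]:
    """Lower-case text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_to_keywords(boolean_query: str) -> list[str]:
    """Hypothetical helper (not the paper's method): strip field tags such as
    [tiab], operators, quotes, parentheses and wildcards, keeping unique terms
    in their original order."""
    without_tags = re.sub(r"\[[^\]]*\]", " ", boolean_query)
    keywords: list[str] = []
    for token in tokenize(without_tags):
        if token not in OPERATORS and token not in keywords:
            keywords.append(token)
    return keywords


def rank_candidates(query_terms: list[str], docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank the Boolean-retrieved candidates by TF-IDF cosine similarity.

    A lexical stand-in for a BERT-based ranker: highest score first,
    ties broken by document id for a deterministic order."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    # Smoothed IDF over the candidate set; terms absent from every
    # candidate receive weight 0 via idf.get(..., 0.0) below.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    def weigh(tokens: list[str]) -> dict[str, float]:
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(query_terms)
    scored = [(doc_id, cosine(query_vec, weigh(tokens))) for doc_id, tokens in tokenized.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    boolean_query = '("heart failure"[tiab] OR cardiomyopathy[mesh]) AND (exercise OR training[tiab])'
    candidates = {
        "d1": "Exercise training in patients with heart failure: a randomised trial",
        "d2": "Dietary sodium and blood pressure in adults",
        "d3": "Cardiomyopathy outcomes after exercise rehabilitation",
    }
    query = boolean_to_keywords(boolean_query)
    for doc_id, score in rank_candidates(query, candidates):
        print(doc_id, round(score, 3))
```

To rank with a title instead of the Boolean query, pass `tokenize(title)` as `query_terms`. Only the scoring function would change if a neural ranker were substituted; the candidate set stays fixed, which mirrors the abstract's setting of prioritising documents already retrieved by the Boolean query.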
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
Date: 2024-10-31T18:43:12Z
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation [28.80089773616623]
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review.
Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation.
We propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation.
Date: 2024-06-30T09:25:42Z
- Fine-Grained Distillation for Long Document Retrieval [86.39802110609062]
Long document retrieval aims to fetch query-relevant documents from a large-scale collection.
Knowledge distillation has become the de facto way to improve a retriever, by having it mimic a heterogeneous yet powerful cross-encoder.
We propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers.
Date: 2022-12-20T17:00:36Z
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
Date: 2022-12-18T15:57:46Z
- Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search [31.797257552928336]
We apply several pre-trained language models to the systematic review document ranking task.
An empirical analysis compares the effectiveness of neural methods with that of traditional methods for this task.
Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods.
Date: 2022-12-18T05:26:40Z
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
Date: 2022-04-12T03:49:35Z
- CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking [11.635294568328625]
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost.
It utilizes precomputed document representations extracted by a base dense retrieval method.
It incurs negligible computational overhead on top of any first-stage method at run time, so it can easily be combined with any state-of-the-art dense retrieval method.
Date: 2021-12-16T10:25:26Z
- Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
Date: 2020-12-09T22:45:40Z
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
Date: 2020-05-17T17:49:53Z