An Analysis of a BERT Deep Learning Strategy on a Technology Assisted
Review Task
- URL: http://arxiv.org/abs/2104.08340v1
- Date: Fri, 16 Apr 2021 19:45:27 GMT
- Title: An Analysis of a BERT Deep Learning Strategy on a Technology Assisted
Review Task
- Authors: Alexandros Ioannidis
- Abstract summary: Document screening is a central task within Evidence-Based Medicine.
I propose a DL document classification approach with BERT or PubMedBERT embeddings and a DL similarity search path.
I test and evaluate the retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth collections.
- Score: 91.3755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document screening is a central task within Evidence-Based Medicine, a
clinical discipline that uses scientific evidence to support medical
decisions. Given the recent advances in DL (Deep Learning) methods applied to
Information Retrieval tasks, I propose a DL document classification approach
with BERT or PubMedBERT embeddings and a DL similarity search path using SBERT
embeddings to reduce physicians' workload in screening and classifying immense
amounts of documents to answer clinical queries. I test and evaluate the
retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth
collections. I find that the proposed DL strategy works, compare it to the
recently successful BM25 plus RM3 model, and conclude that the suggested method
achieves advanced retrieval performance in the initial ranking of the
articles on the aforementioned datasets, for the CLEF eHealth Technologically
Assisted Reviews in Empirical Medicine Task.
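The similarity-search path described in the abstract amounts to ranking candidate documents by cosine similarity between their dense embeddings and a query embedding. The sketch below is a minimal illustration of that ranking step using toy vectors in place of real SBERT output (which would come from an encoder such as the `sentence-transformers` library); the function name and vectors are hypothetical, not taken from the paper.

```python
import numpy as np

def rank_by_similarity(query_vec, doc_vecs):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    return np.argsort(-scores), scores  # indices from most to least similar

# Toy 4-dimensional "embeddings" standing in for SBERT output.
query = np.array([1.0, 0.0, 1.0, 0.0])
docs = np.array([
    [1.0, 0.1, 0.9, 0.0],   # nearly parallel to the query
    [0.0, 1.0, 0.0, 1.0],   # orthogonal to the query
    [0.5, 0.5, 0.5, 0.5],   # partial overlap
])
order, scores = rank_by_similarity(query, docs)
print(order)  # document 0 ranked first, document 1 last
```

In a screening pipeline this ranking would produce the initial ordering of articles that reviewers then work through.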
Related papers
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle retrieval without relevance labels.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
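The SL-HyDE idea summarized above (generate a hypothetical answer document with an LLM, then retrieve by embedding similarity to that document rather than to the raw query) can be sketched as follows. This is a toy illustration under loose assumptions: `fake_llm_generate` stands in for a real LLM call and `bow_embed` for a dense encoder; none of these names come from the paper.

```python
import math
from collections import Counter

def fake_llm_generate(query):
    """Stand-in for an LLM call; SL-HyDE would generate a fuller hypothetical answer."""
    return f"hypothetical document about {query}"

def bow_embed(text):
    """Toy bag-of-words vector; a real system would use a dense encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "guidelines for diabetes treatment and insulin therapy",
    "managing diabetes with treatment and diet",
    "history of hospital architecture",
]
query = "diabetes treatment"
# Embed the generated hypothetical document instead of the raw query.
hyde_vec = bow_embed(fake_llm_generate(query))
ranked = sorted(range(len(corpus)), key=lambda i: -cosine(hyde_vec, bow_embed(corpus[i])))
print(ranked)  # the off-topic architecture document ranks last
```

The point of the technique is that a hypothetical answer tends to share more vocabulary (or embedding-space neighborhood) with relevant documents than a short query does.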
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- Zero-Shot Medical Information Retrieval via Knowledge Graph Embedding [27.14794371879541]
This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR).
The proposed approach leverages a pre-trained BERT-style model to extract compact yet informative keywords.
These keywords are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph.
arXiv Detail & Related papers (2023-10-31T16:26:33Z)
- Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation [53.77226503675752]
The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers.
In this paper, we explore alternative sources of queries for prioritising screening, such as the Boolean query used to retrieve the documents to be screened and queries generated by instruction-based large-scale language models such as ChatGPT and Alpaca.
Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.
arXiv Detail & Related papers (2023-09-11T05:12:14Z)
- Empowering Language Model with Guided Knowledge Fusion for Biomedical Document Re-ranking [22.23809978012414]
Pre-trained language models (PLMs) have proven to be effective for the document re-ranking task.
We propose an approach that integrates knowledge and the PLMs to guide the model toward effectively capturing information from external sources.
arXiv Detail & Related papers (2023-05-07T17:45:47Z)
- A Survey for Biomedical Text Summarization: From Pre-trained to Large Language Models [21.516351027053705]
We present a systematic review of recent advancements in biomedical text summarization.
We discuss existing challenges and promising future directions in the era of large language models.
To facilitate the research community, we compile open resources, including available datasets, recent approaches, code, evaluation metrics, and a leaderboard, in a public project.
arXiv Detail & Related papers (2023-04-18T06:38:40Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.