Empowering Language Model with Guided Knowledge Fusion for Biomedical
Document Re-ranking
- URL: http://arxiv.org/abs/2305.04344v1
- Date: Sun, 7 May 2023 17:45:47 GMT
- Title: Empowering Language Model with Guided Knowledge Fusion for Biomedical
Document Re-ranking
- Authors: Deepak Gupta and Dina Demner-Fushman
- Abstract summary: Pre-trained language models (PLMs) have proven to be effective for document re-ranking task.
We propose an approach that integrates knowledge and the PLMs to guide the model toward effectively capturing information from external sources.
- Score: 22.23809978012414
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained language models (PLMs) have proven to be effective for document
re-ranking task. However, they lack the ability to fully interpret the
semantics of biomedical and health-care queries and often rely on simplistic
patterns for retrieving documents. To address this challenge, we propose an
approach that integrates knowledge and the PLMs to guide the model toward
effectively capturing information from external sources and retrieving the
correct documents. We performed comprehensive experiments on two biomedical and
open-domain datasets that show that our approach significantly improves vanilla
PLMs and other existing approaches for document re-ranking task.
Related papers
- NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering [0.14999444543328289]
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
arXiv Detail & Related papers (2024-10-29T14:45:12Z) - Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 [0.0]
We present a novel approach wherein we distill document understanding knowledge from the proprietary LLM ChatGPT into FLAN-T5.
Our findings underscore the potential of distillation techniques in facilitating the deployment of sophisticated language models in real-world scenarios.
arXiv Detail & Related papers (2024-09-17T15:37:56Z) - Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting [59.97247234955861]
We introduce a novel framework based on large language models (LLMs) that combines a progressive prompting algorithm with a dual-agent system, named LLM-Duo.
Our method identifies 2,421 interventions from 64,177 research articles in the speech-language therapy domain.
arXiv Detail & Related papers (2024-08-20T16:42:23Z) - BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine [19.861178160437827]
Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains.
textscBiomedRAG attains superior performance across 5 biomedical NLP tasks.
textscBiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.
arXiv Detail & Related papers (2024-05-01T12:01:39Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z) - An Analysis of a BERT Deep Learning Strategy on a Technology Assisted
Review Task [91.3755431537592]
Document screening is a central task within Evidenced Based Medicine.
I propose a DL document classification approach with BERT or PubMedBERT embeddings and a DL similarity search path.
I test and evaluate the retrieval effectiveness of my DL strategy on the 2017 and 2018 CLEF eHealth collections.
arXiv Detail & Related papers (2021-04-16T19:45:27Z) - Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.