Multi-Perspective Semantic Information Retrieval in the Biomedical
Domain
- URL: http://arxiv.org/abs/2008.01526v1
- Date: Fri, 17 Jul 2020 21:05:44 GMT
- Title: Multi-Perspective Semantic Information Retrieval in the Biomedical
Domain
- Authors: Samarth Rawal
- Abstract summary: Information Retrieval (IR) is the task of obtaining pieces of data (such as documents) that are relevant to a particular query or need.
Modern neural approaches offer certain advantages over their classical counterparts.
This work presents contributions to several aspects of the Biomedical Semantic Information Retrieval domain.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information Retrieval (IR) is the task of obtaining pieces of data (such as
documents) that are relevant to a particular query or need from a large
repository of information. IR is a valuable component of several downstream
Natural Language Processing (NLP) tasks. Practically, IR is at the heart of
many widely used technologies like search engines. While probabilistic ranking
functions like the Okapi BM25 function have been used in IR systems since
the 1970s, modern neural approaches offer certain advantages over their
classical counterparts. In particular, the release of BERT (Bidirectional
Encoder Representations from Transformers) has had a significant impact in the
NLP community by demonstrating how the use of a Masked Language Model trained
on a large corpus of data can improve a variety of downstream NLP tasks,
including sentence classification and passage re-ranking. IR systems are also
important in the biomedical and clinical domains. Given the increasing amount
of scientific literature across the biomedical domain, the ability to find answers to
specific clinical queries from a repository of millions of articles is a matter
of practical value to medical professionals. Moreover, there are
domain-specific challenges present, including handling clinical jargon and
evaluating the similarity or relatedness of various medical symptoms when
determining the relevance between a query and a sentence. This work presents
contributions to several aspects of the Biomedical Semantic Information
Retrieval domain. First, it introduces Multi-Perspective Sentence Relevance, a
novel methodology of utilizing BERT-based models for contextual IR. The system
is evaluated using the BioASQ Biomedical IR Challenge. Finally, practical
contributions in the form of a live IR system for medics and a proposed
challenge on the Living Systematic Review clinical task are provided.
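For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal sketch of Okapi BM25 scoring. It is an illustration only, not the paper's system: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy documents are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    `query` is a list of terms; `docs` is a non-empty list of term lists.
    The k1 and b defaults are common illustrative values, not values
    taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, whitespace-tokenized by hand.
docs = [
    ["aspirin", "reduces", "fever"],
    ["fever", "and", "cough", "symptoms"],
    ["gene", "expression", "analysis"],
]
print(bm25_scores(["fever", "aspirin"], docs))
```

BM25 matches only exact query terms, so a document that describes a related symptom in different words scores zero. That limitation is the kind of gap the abstract attributes to classical ranking functions when it motivates BERT-based relevance models.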
Related papers
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - Experience and Evidence are the eyes of an excellent summarizer! Towards
Knowledge Infused Multi-modal Clinical Conversation Summarization [46.613541673040544]
We propose a knowledge-infused, multi-modal, multi-tasking medical domain identification and clinical conversation summary generation framework.
We develop a multi-modal, multi-intent clinical conversation summarization corpus annotated with intent, symptom, and summary.
The extensive set of experiments led to the following findings: (a) critical significance of visuals, (b) more precise and medical entity preserving summary with additional knowledge infusion, and (c) a correlation between medical department identification and clinical synopsis generation.
arXiv Detail & Related papers (2023-09-27T15:49:43Z) - MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed
Search Logs for Zero-shot Biomedical Information Retrieval [5.330363334603656]
We introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine.
To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed.
We show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks.
arXiv Detail & Related papers (2023-07-02T15:11:59Z) - Automatically Extracting Information in Medical Dialogue: Expert System
And Attention for Labelling [0.0]
Expert System and Attention for Labelling (ESAL) is a novel model for retrieving features from medical records.
We use mixture of experts and pre-trained BERT to retrieve the semantics of different categories.
In our experiment, ESAL significantly improved the performance of Medical Information Classification.
arXiv Detail & Related papers (2022-11-28T16:49:13Z) - Improving Biomedical Information Retrieval with Neural Retrievers [30.778569849542837]
We propose a template-based question generation method that can be leveraged to train neural retriever models.
Second, we develop two novel pre-training tasks that are closely aligned to the downstream task of information retrieval.
Third, we introduce the "Poly-DPR" model, which encodes each context into multiple context vectors.
arXiv Detail & Related papers (2022-01-19T17:36:54Z) - Network Module Detection from Multi-Modal Node Features with a Greedy
Decision Forest for Actionable Explainable AI [0.0]
In this work, we demonstrate subnetwork detection based on multi-modal node features using a new Greedy Decision Forest.
Our glass-box approach could help to uncover disease-causing network modules from multi-omics data to better understand diseases such as cancer.
arXiv Detail & Related papers (2021-08-26T09:42:44Z) - Domain-Specific Pretraining for Vertical Search: Case Study on
Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
arXiv Detail & Related papers (2021-06-25T01:02:55Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - A Systematic Review of Natural Language Processing Applied to Radiology
Reports [3.600747505433814]
This study systematically assesses recent literature in NLP applied to radiology reports.
Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
arXiv Detail & Related papers (2021-02-18T18:54:41Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.