Related papers: Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access

URL: http://arxiv.org/abs/2106.01251v1
Date: Wed, 2 Jun 2021 16:05:24 GMT
Title: Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access
Authors: Vishal Vinod, Susmit Agrawal, Vipul Gaurav, Pallavi R, Savita Choudhary
Abstract summary: In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable. Several deaths resulting from this lack of medical access, absence of patient's previous health records, and the unavailability of information in indigenous languages can be easily prevented. We describe an approach leveraging the phenomenal progress in Machine Learning and NLP (Natural Language Processing) techniques to design a model that is low-resource, multilingual, and a preliminary first-point-of-contact medical assistant.
Score: 1.0499611180329804
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable. Many of these regions are gradually gaining access to internet infrastructure, although not with a strong enough connection to allow for sustained communication with a medical practitioner. Several deaths resulting from this lack of medical access, absence of patient's previous health records, and the unavailability of information in indigenous languages can be easily prevented. In this paper, we describe an approach leveraging the phenomenal progress in Machine Learning and NLP (Natural Language Processing) techniques to design a model that is low-resource, multilingual, and a preliminary first-point-of-contact medical assistant. Our contribution includes defining the NLP pipeline required for named-entity-recognition, language-agnostic sentence embedding, natural language translation, information retrieval, question answering, and generative pre-training for final query processing. We obtain promising results for this pipeline and preliminary results for EHR (Electronic Health Record) analysis with text summarization for medical practitioners to peruse for their diagnosis. Through this NLP pipeline, we aim to provide preliminary medical information to the user and do not claim to supplant diagnosis from qualified medical practitioners. Using the input from subject matter experts, we have compiled a large corpus to pre-train and fine-tune our BioBERT based NLP model for the specific tasks. We expect recent advances in NLP architectures, several of which are efficient and privacy-preserving models, to further the impact of our solution and improve on individual task performance.

Related papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning [57.873833577058]
We build a multimodal dataset enriched with extensive medical knowledge. We then introduce our medical-specialized MLLM: Lingshu. Lingshu undergoes multi-stage training to embed medical expertise and enhance its task-solving capabilities.
arXiv Detail & Related papers (2025-06-08T08:47:30Z)
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model [1.4843690728082002]
This study explores the enhancement of medical knowledge in a small language model by leveraging accessible online data. We fine-tuned a baseline model using our curated data to improve its medical knowledge. Benchmark evaluations demonstrate that the fine-tuned model achieves improved accuracy in medical question answering.
arXiv Detail & Related papers (2025-05-21T20:30:47Z)
Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
Medical Vision Language Pretraining: A survey [8.393439175704124]
Medical Vision Language Pretraining is a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations.
arXiv Detail & Related papers (2023-12-11T09:14:13Z)
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain. ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF. We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z)
Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction [7.5569033426158585]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation. We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge. Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
arXiv Detail & Related papers (2023-08-28T06:05:18Z)
Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re³Writer method with retrieval-augmented generation and knowledge-grounded reasoning. We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing [5.022185333260402]
Diagnostic Reasoning Benchmarks, DR.BENCH, is a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models.
arXiv Detail & Related papers (2022-09-29T16:05:53Z)
Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling. We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians. Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review [4.454501609622817]
Well over half of the information stored within EHRs is in the form of unstructured text. Deep learning approaches to Natural Language Processing have made considerable advances. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics.
arXiv Detail & Related papers (2021-07-07T01:50:02Z)
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark. It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification. We report empirical results with 11 current pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z)
MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining [5.807159674193696]
We present MeDAL, a large medical text dataset curated for abbreviation disambiguation. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.
arXiv Detail & Related papers (2020-12-27T17:17:39Z)
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.