Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access
- URL: http://arxiv.org/abs/2106.01251v1
- Date: Wed, 2 Jun 2021 16:05:24 GMT
- Title: Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access
- Authors: Vishal Vinod, Susmit Agrawal, Vipul Gaurav, Pallavi R, Savita Choudhary
- Abstract summary: In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable.
Several deaths resulting from this lack of medical access, the absence of patients' previous health records, and the unavailability of information in indigenous languages can be easily prevented.
We describe an approach leveraging the phenomenal progress in Machine Learning and NLP (Natural Language Processing) techniques to design a model that is low-resource, multilingual, and a preliminary first-point-of-contact medical assistant.
- Score: 1.0499611180329804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In rural regions of several developing countries, access to quality
healthcare, medical infrastructure, and professional diagnosis is largely
unavailable. Many of these regions are gradually gaining access to internet
infrastructure, although not with a strong enough connection to allow for
sustained communication with a medical practitioner. Several deaths resulting
from this lack of medical access, the absence of patients' previous health records,
and the unavailability of information in indigenous languages can be easily
prevented. In this paper, we describe an approach leveraging the phenomenal
progress in Machine Learning and NLP (Natural Language Processing) techniques
to design a model that is low-resource, multilingual, and a preliminary
first-point-of-contact medical assistant. Our contribution includes defining
the NLP pipeline required for named-entity-recognition, language-agnostic
sentence embedding, natural language translation, information retrieval,
question answering, and generative pre-training for final query processing. We
obtain promising results for this pipeline and preliminary results for EHR
(Electronic Health Record) analysis with text summarization for medical
practitioners to peruse for their diagnosis. Through this NLP pipeline, we aim
to provide preliminary medical information to the user and do not claim to
supplant diagnosis from qualified medical practitioners. Using the input from
subject matter experts, we have compiled a large corpus to pre-train and
fine-tune our BioBERT based NLP model for the specific tasks. We expect recent
advances in NLP architectures, several of which are efficient and
privacy-preserving models, to further the impact of our solution and improve on
individual task performance.
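The abstract names the pipeline stages but not how they connect. The following is a minimal, hypothetical sketch of one plausible ordering (translate the query, retrieve supporting passages, then answer), assuming a tiny invented corpus of illustrative sentences rather than the authors' curated corpus. The functions `translate_to_english`, `embed`, `retrieve`, and `answer` are illustrative names, not the authors' code; the bag-of-words `embed` is a toy stand-in for a language-agnostic sentence encoder, and the answer step simply returns the top passage where a fine-tuned BioBERT reader would extract a span. Named-entity recognition, the generative stage, and EHR summarization are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (illustrative sentences only, not data from the paper).
CORPUS = [
    "Dehydration in children is treated with oral rehydration salts.",
    "Malaria symptoms include fever, chills and headache.",
    "A persistent cough lasting more than two weeks may indicate tuberculosis.",
]


def translate_to_english(query: str, lang: str) -> str:
    """Placeholder for the translation stage; only English passes through here."""
    if lang != "en":
        raise NotImplementedError("plug in a machine translation model here")
    return query


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a language-agnostic sentence encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank corpus passages by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def answer(query: str, lang: str = "en") -> str:
    """Translate, retrieve, then 'answer' by returning the best passage."""
    english = translate_to_english(query, lang)
    passages = retrieve(english, CORPUS, k=1)
    # A QA model would extract an answer span from the passage at this point.
    return passages[0]


print(answer("What are the symptoms of malaria?"))
```

With a genuinely language-agnostic encoder in place of `embed`, queries in an indigenous language could in principle be matched against passages without a separate translation step; keeping translation explicit here only makes the stage boundaries easy to see.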
Related papers
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
Date: 2024-04-27T05:03:42Z
- Medical Vision Language Pretraining: A survey [8.393439175704124]
Medical Vision Language Pretraining is a promising solution to the scarcity of labeled data in the medical domain.
By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations.
Date: 2023-12-11T09:14:13Z
- ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
Date: 2023-11-10T12:25:32Z
- Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction [7.5569033426158585]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation.
We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge.
Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
Date: 2023-08-28T06:05:18Z
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re³Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
Date: 2022-10-23T16:34:39Z
- DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing [5.022185333260402]
The Diagnostic Reasoning Benchmark (DR.BENCH) is a new benchmark for developing and evaluating clinical NLP (cNLP) models with clinical diagnostic reasoning ability.
DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models.
Date: 2022-09-29T16:05:53Z
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
Date: 2021-12-23T16:11:19Z
- Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review [4.454501609622817]
Well over half of the information stored within EHRs is in the form of unstructured text.
Deep learning approaches to Natural Language Processing have made considerable advances.
We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics.
Date: 2021-07-07T01:50:02Z
- CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experimental results show that state-of-the-art neural models perform far worse than the human ceiling.
Date: 2021-06-15T12:25:30Z
- MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining [5.807159674193696]
We present MeDAL, a large medical text dataset curated for abbreviation disambiguation.
We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.
Date: 2020-12-27T17:17:39Z
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
Date: 2020-07-31T00:04:15Z