A Cross-lingual Natural Language Processing Framework for Infodemic
Management
- URL: http://arxiv.org/abs/2010.16357v1
- Date: Fri, 30 Oct 2020 16:26:35 GMT
- Authors: Ridam Pal, Rohan Pandey, Vaibhav Gautam, Kanav Bhagat, Tavpritesh
Sethi
- Abstract summary: The COVID-19 pandemic has put immense pressure on health systems which are further strained due to misinformation surrounding it.
We have exploited the potential of Natural Language Processing for identifying relevant information that needs to be disseminated amongst the masses.
We present a novel Cross-lingual Natural Language Processing framework to provide relevant information by matching daily news with trusted guidelines from the World Health Organization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic has put immense pressure on health systems which are
further strained due to the misinformation surrounding it. Under such a
situation, providing the right information at the right time is crucial. There
is a growing demand for the management of information spread using Artificial
Intelligence. Hence, we have exploited the potential of Natural Language
Processing for identifying relevant information that needs to be disseminated
amongst the masses. In this work, we present a novel Cross-lingual Natural
Language Processing framework to provide relevant information by matching daily
news with trusted guidelines from the World Health Organization. The proposed
pipeline deploys various NLP techniques, such as summarizers, word embeddings,
and similarity metrics, to provide users with news articles along
with a corresponding healthcare guideline. A total of 36 models were evaluated,
and the combination of a LexRank-based summarizer, Word2Vec embeddings, and the
Word Mover's Distance metric outperformed all others. This novel open-source
approach can be used as a template for proactive dissemination of relevant
healthcare information in the midst of misinformation spread associated with
epidemics.
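The matching step described above (embed the words of a news item and of each guideline, then rank guidelines by Word Mover's Distance) can be sketched in a few lines. This is a minimal illustration, not the paper's code: the embeddings are toy vectors standing in for pretrained Word2Vec, and the distance is the well-known relaxed (greedy nearest-neighbor) lower bound of WMD rather than the full optimal-transport computation.

```python
import numpy as np

# Toy word vectors standing in for pretrained Word2Vec embeddings
# (hypothetical values; a real run would load trained vectors).
EMB = {
    "mask":    np.array([1.0, 0.0, 0.0]),
    "masks":   np.array([0.9, 0.1, 0.0]),
    "wear":    np.array([0.0, 1.0, 0.0]),
    "vaccine": np.array([0.0, 0.0, 1.0]),
    "doses":   np.array([0.1, 0.0, 0.9]),
}

def relaxed_wmd(doc_a, doc_b):
    """Relaxed Word Mover's Distance lower bound: each word of doc_a
    'travels' to its nearest word in doc_b, and the document distance
    is the mean of those minimal Euclidean distances."""
    return float(np.mean([
        min(np.linalg.norm(EMB[w] - EMB[v]) for v in doc_b)
        for w in doc_a
    ]))

def match_guideline(news_tokens, guidelines):
    """Return the guideline (token list) closest to the news item."""
    return min(guidelines, key=lambda g: relaxed_wmd(news_tokens, g))

news = ["wear", "masks"]
guidelines = [["wear", "mask"], ["vaccine", "doses"]]
print(match_guideline(news, guidelines))  # the mask guideline is closer
```

In the full pipeline, each news article would first be compressed by an extractive summarizer (LexRank in the best-performing configuration) before this embedding-and-distance comparison is applied.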
Related papers
- MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch [0.0]
We present a novel approach submitted to the MEDIQA-CORR 2024 shared task.
Our method emphasizes extracting contextually relevant information from available clinical text data.
By integrating domain expertise and prioritizing meaningful information extraction, our approach underscores the significance of a human-centric strategy in adapting AI for healthcare.
arXiv Detail & Related papers (2024-04-27T20:28:38Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Advancing Italian Biomedical Information Extraction with Transformers-based Models: Methodological Insights and Multicenter Practical Application [0.27027468002793437]
Information Extraction can help clinical practitioners overcome the limitations of manual text review through automated text-mining pipelines.
We created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Transformers-based model.
The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "low-resource" approach.
arXiv Detail & Related papers (2023-06-08T16:15:46Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated simplification of medical text, combining word-level simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining [15.809776934712147]
We introduce SMedBERT, a medical PLM trained on large-scale medical corpora.
In SMedBERT, the mention-neighbor hybrid attention is proposed to learn heterogeneous-entity information.
Experiments demonstrate that SMedBERT significantly outperforms strong baselines in various knowledge-intensive Chinese medical tasks.
arXiv Detail & Related papers (2021-08-20T03:32:01Z)
- Automated Lay Language Summarization of Biomedical Scientific Reviews [16.01452242066412]
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.
Medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret.
This paper introduces the novel task of automated generation of lay language summaries of biomedical scientific reviews.
arXiv Detail & Related papers (2020-12-23T10:01:18Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.