NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical
Document De-Identification
- URL: http://arxiv.org/abs/2007.01030v1
- Date: Thu, 2 Jul 2020 11:30:32 GMT
- Title: NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical
Document De-Identification
- Authors: Lukas Lange, Heike Adel, Jannik Str\"otgen
- Abstract summary: We describe our NLNDE system, with which we participated in the MEDDOCAN competition.
We address the task of detecting and classifying protected health information from Spanish data.
Despite dealing in a non-standard language and domain setting, the NLNDE system achieves promising results in the competition.
- Score: 11.98821166621488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language processing has huge potential in the medical domain which
recently led to a lot of research in this field. However, a prerequisite of
secure processing of medical documents, e.g., patient notes and clinical
trials, is the proper de-identification of privacy-sensitive information. In
this paper, we describe our NLNDE system, with which we participated in the
MEDDOCAN competition, the medical document anonymization task of IberLEF 2019.
We address the task of detecting and classifying protected health information
from Spanish data as a sequence-labeling problem and investigate different
embedding methods for our neural network. Despite dealing in a non-standard
language and domain setting, the NLNDE system achieves promising results in the
competition.
Related papers
- ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish [39.81302995670643]
This study presents ClinLinker, a novel approach employing a two-phase pipeline for medical entity linking.
It is based on a SapBERT-based bi-encoder and subsequent re-ranking with a cross-encoder, trained by following a contrastive-learning strategy to be tailored to medical concepts in Spanish.
arXiv Detail & Related papers (2024-04-09T15:04:27Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Biomedical and Clinical Language Models for Spanish: On the Benefits of
Domain-Specific Pretraining in a Mid-Resource Scenario [0.05277024349608833]
This work presents biomedical and clinical language models for Spanish by experimenting with different pretraining choices.
In the absence of enough clinical data to train a model from scratch, we applied mixed-domain pretraining and cross-domain transfer approaches to generate a performant bio-clinical model.
arXiv Detail & Related papers (2021-09-08T12:12:07Z) - Multilingual Medical Question Answering and Information Retrieval for
Rural Health Intelligence Access [1.0499611180329804]
In rural regions of several developing countries, access to quality healthcare, medical infrastructure, and professional diagnosis is largely unavailable.
Several deaths resulting from this lack of medical access, absence of patient's previous health records, and the supplanting of information in indigenous languages can be easily prevented.
We describe an approach leveraging the phenomenal progress in Machine Learning and NLP (Natural Language Processing) techniques to design a model that is low-resource, multilingual, and a preliminary first-point-of-contact medical assistant.
arXiv Detail & Related papers (2021-06-02T16:05:24Z) - Learning Domain-Specialised Representations for Cross-Lingual Biomedical
Entity Linking [66.76141128555099]
We propose a novel cross-lingual biomedical entity linking task (XL-BEL)
We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task.
We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
arXiv Detail & Related papers (2021-05-30T00:50:00Z) - UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z) - Where's the Question? A Multi-channel Deep Convolutional Neural Network
for Question Identification in Textual Data [83.89578557287658]
We propose a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions.
We conducted a comprehensive performance comparison analysis of the proposed network against other deep neural networks.
The proposed Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting, and on a general domain dataset.
arXiv Detail & Related papers (2020-10-15T15:11:22Z) - NLNDE: Enhancing Neural Sequence Taggers with Attention and Noisy
Channel for Robust Pharmacological Entity Detection [11.98821166621488]
We describe the system with which we participated in the first subtrack of the PharmaCoNER competition of the BioNLP Open Shared Tasks 2019.
Our system achieves promising results, especially by combining the different techniques, and reaches up to 88.6% F1 in the competition.
arXiv Detail & Related papers (2020-07-02T11:17:16Z) - Comparing Rule-based, Feature-based and Deep Neural Methods for
De-identification of Dutch Medical Records [4.339510167603376]
We construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare.
We test the generalizability of three de-identification methods across languages and domains.
arXiv Detail & Related papers (2020-01-16T09:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.