Related papers: NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical Document De-Identification

URL: http://arxiv.org/abs/2007.01030v1
Date: Thu, 2 Jul 2020 11:30:32 GMT
Title: NLNDE: The Neither-Language-Nor-Domain-Experts' Way of Spanish Medical Document De-Identification
Authors: Lukas Lange, Heike Adel, Jannik Strötgen
Abstract summary: We describe our NLNDE system, with which we participated in the MEDDOCAN competition. We address the task of detecting and classifying protected health information from Spanish data. Despite dealing in a non-standard language and domain setting, the NLNDE system achieves promising results in the competition.
Score: 11.98821166621488
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural language processing has huge potential in the medical domain which recently led to a lot of research in this field. However, a prerequisite of secure processing of medical documents, e.g., patient notes and clinical trials, is the proper de-identification of privacy-sensitive information. In this paper, we describe our NLNDE system, with which we participated in the MEDDOCAN competition, the medical document anonymization task of IberLEF 2019. We address the task of detecting and classifying protected health information from Spanish data as a sequence-labeling problem and investigate different embedding methods for our neural network. Despite dealing in a non-standard language and domain setting, the NLNDE system achieves promising results in the competition.

Related papers

LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model. We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations [23.437292621092823]
We introduce MediTOD, a dataset of doctor-patient dialogues in English for the medical history-taking task. We devise a questionnaire-based labeling scheme tailored to the medical domain. Then, medical professionals create the dataset with high-quality comprehensive annotations.
arXiv Detail & Related papers (2024-10-18T06:38:22Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish [39.81302995670643]
This study presents ClinLinker, a novel approach employing a two-phase pipeline for medical entity linking. It is based on a SapBERT-based bi-encoder and subsequent re-ranking with a cross-encoder, trained by following a contrastive-learning strategy to be tailored to medical concepts in Spanish.
arXiv Detail & Related papers (2024-04-09T15:04:27Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations. Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario [0.05277024349608833]
This work presents biomedical and clinical language models for Spanish by experimenting with different pretraining choices. In the absence of enough clinical data to train a model from scratch, we applied mixed-domain pretraining and cross-domain transfer approaches to generate a performant bio-clinical model.
arXiv Detail & Related papers (2021-09-08T12:12:07Z)
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking [66.76141128555099]
We propose a novel cross-lingual biomedical entity linking task (XL-BEL). We first investigate the ability of standard knowledge-agnostic as well as knowledge-enhanced monolingual and multilingual LMs beyond the standard monolingual English BEL task. We then address the challenge of transferring domain-specific knowledge in resource-rich languages to resource-poor ones.
arXiv Detail & Related papers (2021-05-30T00:50:00Z)
UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process. By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
NLNDE: Enhancing Neural Sequence Taggers with Attention and Noisy Channel for Robust Pharmacological Entity Detection [11.98821166621488]
We describe the system with which we participated in the first subtrack of the PharmaCoNER competition of the BioNLP Open Shared Tasks 2019. Our system achieves promising results, especially by combining the different techniques, and reaches up to 88.6% F1 in the competition.
arXiv Detail & Related papers (2020-07-02T11:17:16Z)
Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records [4.339510167603376]
We construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains.
arXiv Detail & Related papers (2020-01-16T09:42:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.