Named Entities in Medical Case Reports: Corpus and Experiments
- URL: http://arxiv.org/abs/2003.13032v1
- Date: Sun, 29 Mar 2020 14:08:43 GMT
- Title: Named Entities in Medical Case Reports: Corpus and Experiments
- Authors: Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff and Georg Rehm
- Abstract summary: We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library.
In the case reports, we annotate cases, conditions, findings, factors and negation modifiers.
This is the first corpus of this kind made available to the scientific community in English.
- Score: 0.5773440045183915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new corpus comprising annotations of medical entities in case
reports, originating from PubMed Central's open access library. In the case
reports, we annotate cases, conditions, findings, factors and negation
modifiers. Moreover, where applicable, we annotate relations between these
entities. As such, this is the first corpus of this kind made available to the
scientific community in English. It enables the initial investigation of
automatic information extraction from case reports through tasks like Named
Entity Recognition, Relation Extraction and (sentence/paragraph) relevance
detection. Additionally, we present four strong baseline systems for the
detection of medical entities made available through the annotated dataset.
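Named Entity Recognition over span annotations like these is commonly framed as BIO sequence labelling, where each token is tagged as beginning (B-), inside (I-) or outside (O) an entity. The sketch below decodes such a tag sequence back into typed spans. It is a minimal illustration under stated assumptions: the BIO encoding and the lowercase labels (`case`, `condition`, `finding`, `factor`, `negation`) are hypothetical stand-ins for the five annotation types in the abstract, and the released corpus's actual file format, label names and baseline systems may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical label names for the five annotation types listed in the
# abstract; the corpus's real label strings are an assumption here.
ENTITY_TYPES = {"case", "condition", "finding", "factor", "negation"}

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into typed token spans.

    An I- tag that does not continue a span of the same type is treated
    as the start of a new span (a common lenient convention).
    """
    spans: List[Span] = []
    current: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current is not None:
                spans.append((current, start, i))
                current = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix not in ("B", "I") or label not in ENTITY_TYPES:
            raise ValueError(f"malformed tag {tag!r}")
        if prefix == "B" or label != current:
            # Close the previous span and open a new one at token i.
            if current is not None:
                spans.append((current, start, i))
            current, start = label, i
        # An I- tag with the same label just extends the open span.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Illustrative sentence and tags (not taken from the corpus).
tokens = ["Patient", "denied", "chest", "pain", "and", "fever", "."]
tags = ["O", "B-negation", "B-finding", "I-finding", "O", "B-finding", "O"]
for label, s, e in bio_to_spans(tags):
    print(label, " ".join(tokens[s:e]))
```

Decoding into explicit spans is also what span-level NER evaluation compares against the gold annotation, and it gives relation extraction (for example, linking a negation modifier to the finding it negates) a set of candidate arguments.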
Related papers
- RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, termed Radiological Report (Text) Evaluation (RaTEScore).
RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions.
Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
Date: 2024-06-24T17:49:28Z
- "Nothing Abnormal": Disambiguating Medical Reports via Contrastive Knowledge Infusion [6.9551174393701345]
We propose a rewriting algorithm based on contrastive pretraining and perturbation-based rewriting.
We create two datasets, OpenI-Annotated based on chest reports and VA-Annotated based on general medical reports.
Our proposed algorithm effectively rewrites input sentences in a less ambiguous way with high content fidelity.
Date: 2023-05-15T02:01:20Z
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
Date: 2023-03-23T17:17:46Z
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data from a subset of DISNET and associations automatically extracted from texts have been transformed to create a Knowledge Graph that can be used in real scenarios.
Date: 2022-08-01T18:47:03Z
- Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations [5.825190876052148]
This paper investigates automating evidence table generation by decomposing the problem across two language processing tasks.
We focus on the automatic tabulation of sentences from published RCT abstracts that report the practice outcomes.
To train and test these models, a new gold-standard corpus was developed, comprising almost 600 result sentences from six disease areas.
Date: 2021-12-10T15:26:43Z
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
Date: 2021-06-23T18:12:58Z
- Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation [26.134055930805523]
We propose MedWriter that incorporates a novel hierarchical retrieval mechanism to automatically extract both report and sentence-level templates.
MedWriter first employs the Visual-Language Retrieval (VLR) module to retrieve the most relevant reports for the given images.
To guarantee logical coherence between sentences, the Language-Language Retrieval (LLR) module is introduced to retrieve relevant sentences.
Finally, a language decoder fuses image features with features from the retrieved reports and sentences to generate meaningful medical reports.
Date: 2021-05-25T07:47:23Z
- Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition [142.42920413017163]
Because of dataset bias, current methods often generate the most common sentences rather than ones specific to the individual case.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
Date: 2021-01-09T04:33:27Z
- Extracting Structured Data from Physician-Patient Conversations By Predicting Noteworthy Utterances [39.888619005843246]
We describe a new dataset consisting of conversation transcripts, post-visit summaries, corresponding supporting evidence (in the transcript), and structured labels.
One methodological challenge is that the conversations are long (around 1,500 words), making it difficult for modern deep-learning models to use them as input.
We find that by first filtering for (predicted) noteworthy utterances, we can significantly boost predictive performance for recognizing both diagnoses and RoS abnormalities.
Date: 2020-07-14T16:10:37Z
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 ROUGE-1).
Date: 2020-04-19T08:27:57Z
- NUBES: A Corpus of Negation and Uncertainty in Spanish Clinical Texts [5.424799109837065]
This paper introduces the first version of the NUBes corpus (Negation and Uncertainty annotations in Biomedical texts in Spanish).
The corpus is part of ongoing research and currently consists of 29,682 sentences from anonymised health records, annotated with negation and uncertainty.
Date: 2020-04-02T15:51:31Z