CARE: Extracting Experimental Findings From Clinical Literature
- URL: http://arxiv.org/abs/2311.09736v2
- Date: Wed, 24 Apr 2024 18:15:36 GMT
- Title: CARE: Extracting Experimental Findings From Clinical Literature
- Authors: Aakanksha Naik, Bailey Kuehl, Erin Bransom, Doug Downey, Tom Hope
- Abstract summary: This work presents CARE, a new IE dataset for the task of extracting clinical findings.
We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes.
We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports.
- Score: 29.763929941107616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extracting fine-grained experimental findings from literature can provide dramatic utility for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, failing to capture the real-world complexity and nuance required. Focusing on biomedicine, this work presents CARE -- a new IE dataset for the task of extracting clinical findings. We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes, which unifies phenomena challenging for current IE systems such as discontinuous entity spans, nested relations, variable arity n-ary relations and numeric results in a single schema. We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports. We also demonstrate the generalizability of our schema to the computer science and materials science domains. We benchmark state-of-the-art IE systems on CARE, showing that even models such as GPT4 struggle. We release our resources to advance research on extracting and aggregating literature findings.
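To make the schema idea in the abstract concrete, the following is a minimal sketch of how findings could be held as n-ary relations over entities with possibly discontinuous spans. It is not the CARE data format: the class names, the labels ("Intervention", "Outcome", "Value", "Change", "Effect"), the helper `span_of`, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A mention made of one or more character spans (hypothetical structure)."""
    label: str
    spans: Tuple[Tuple[int, int], ...]  # (start, end) offsets, end exclusive

    def text(self, doc: str) -> str:
        # Join the pieces of a discontinuous mention with single spaces.
        return " ".join(doc[s:e] for s, e in self.spans)

    @property
    def is_discontinuous(self) -> bool:
        return len(self.spans) > 1


@dataclass
class Relation:
    """An n-ary relation whose arguments are entities or other relations."""
    label: str
    args: List[Union[Entity, "Relation"]] = field(default_factory=list)

    @property
    def arity(self) -> int:
        return len(self.args)

    def depth(self) -> int:
        # 1 for a flat relation, plus the deepest nested relation argument.
        nested = [a.depth() for a in self.args if isinstance(a, Relation)]
        return 1 + max(nested, default=0)


def span_of(doc: str, phrase: str) -> Tuple[int, int]:
    """Return the offsets of the first occurrence of phrase in doc."""
    start = doc.index(phrase)
    return (start, start + len(phrase))


# Invented example sentence, not taken from the dataset.
doc = "Drug A reduced systolic and diastolic blood pressure by 12 mmHg."

intervention = Entity("Intervention", (span_of(doc, "Drug A"),))
# "systolic ... blood pressure" is a discontinuous mention: two separate spans.
systolic_bp = Entity(
    "Outcome", (span_of(doc, "systolic"), span_of(doc, "blood pressure"))
)
diastolic_bp = Entity("Outcome", (span_of(doc, "diastolic blood pressure"),))
value = Entity("Value", (span_of(doc, "12 mmHg"),))  # numeric result

# Nested relations: each "Change" is itself an argument of the "Effect" finding.
change_sys = Relation("Change", [systolic_bp, value])
change_dia = Relation("Change", [diastolic_bp, value])
finding = Relation("Effect", [intervention, change_sys, change_dia])
```

Allowing a relation to take other relations as arguments covers nesting and variable arity with one type, and storing a tuple of spans per entity covers discontinuous mentions without a separate entity class.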
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- Decoding MIE: A Novel Dataset Approach Using Topic Extraction and Affiliation Parsing [0.0]
This study introduces a novel dataset derived from the Medical Informatics Europe (MIE) Conference proceedings.
We extracted and processed metadata and abstracts from 4,606 articles published in the "Studies in Health Technology and Informatics" journal series.
arXiv Detail & Related papers (2024-10-06T19:34:23Z)
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
- PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text [1.9573380763700712]
This dataset simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations.
A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus.
We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations.
arXiv Detail & Related papers (2022-10-22T09:43:54Z)
- ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery [79.08833067391093]
Real-world pharmaceutical datasets often exhibit highly imbalanced distribution.
We introduce ImDrug, a benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning.
It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline.
arXiv Detail & Related papers (2022-09-16T13:35:57Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The Interaction between Drugs and Targets (DTI) in the human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
arXiv Detail & Related papers (2021-09-27T17:00:14Z)
- CREATe: Clinical Report Extraction and Annotation Technology [53.731999072534876]
Clinical case reports are written descriptions of the unique aspects of a particular clinical case.
There has been no attempt to develop an end-to-end system to annotate, index, or otherwise curate these reports.
We propose a novel computational resource platform, CREATe, for extracting, indexing, and querying the contents of clinical case reports.
arXiv Detail & Related papers (2021-02-28T16:50:14Z)
- Self-Supervised Learning for Visual Summary Identification in Scientific Publications [21.26121265868308]
We create a new benchmark dataset for selecting figures to serve as visual summaries of publications based on their abstracts.
We develop a self-supervised learning approach, based on matching of inline references to figures with figure captions.
Experiments in both biomedical and computer science domains show that our model is able to outperform the state of the art.
arXiv Detail & Related papers (2020-12-21T09:48:58Z)
- The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain [11.085048329202335]
We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications.
A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition.
We present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set.
arXiv Detail & Related papers (2020-06-04T17:49:34Z)