Ontology Enrichment from Texts: A Biomedical Dataset for Concept
Discovery and Placement
- URL: http://arxiv.org/abs/2306.14704v3
- Date: Fri, 1 Sep 2023 15:26:45 GMT
- Title: Ontology Enrichment from Texts: A Biomedical Dataset for Concept
Discovery and Placement
- Authors: Hang Dong, Jiaoyan Chen, Yuan He, Ian Horrocks
- Abstract summary: Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases.
Existing datasets suffer from three issues: (i) they mostly assume that a new concept is pre-discovered and so cannot support out-of-KB mention discovery; (ii) they use only the concept label as input, without the contexts in which the concept is mentioned; and (iii) they mostly target placement into a taxonomy of atomic concepts rather than complex concepts with logical operators.
We describe how to use the dataset to evaluate out-of-KB mention discovery and concept placement, adapting recent Large Language Model based methods.
- Score: 22.074094839360413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mentions of new concepts appear regularly in texts and require automated
approaches to harvest and place them into Knowledge Bases (KB), e.g.,
ontologies and taxonomies. Existing datasets suffer from three issues: (i)
most assume that a new concept is pre-discovered and thus cannot support
out-of-KB mention discovery; (ii) most use only the concept label as the input
along with the KB, lacking the contexts in which a concept is mentioned; and (iii)
most focus on concept placement w.r.t. a taxonomy of atomic concepts,
rather than complex concepts, i.e., those involving logical operators. To address these
issues, we propose a new benchmark that adapts the MedMentions dataset (PubMed
abstracts) to the 2014 and 2017 versions of SNOMED CT, under the Diseases
sub-category and the broader categories of Clinical finding, Procedure, and
Pharmaceutical / biologic product. We describe how to use the dataset to evaluate
out-of-KB mention discovery and concept placement, adapting recent
Large Language Model based methods.
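
The sketch below illustrates the two evaluation tasks the benchmark targets: detecting mentions whose concept is missing from an older KB version, and scoring where such a new concept should be placed. It is a minimal sketch under assumed, simplified data structures; the class and field names are hypothetical, not the benchmark's actual schema or the authors' evaluation code.

```python
# Minimal sketch of out-of-KB mention discovery and concept placement evaluation.
# The data structures below are hypothetical simplifications for illustration only.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class MentionExample:
    mention: str                  # surface form found in a PubMed abstract
    context: str                  # surrounding text of the mention
    gold_out_of_kb: bool          # True if the concept is absent from the older KB version
    gold_parents: Set[str] = field(default_factory=set)  # SNOMED CT IDs to place the new concept under


def evaluate_discovery(examples: List[MentionExample], predicted_out_of_kb: List[bool]):
    """Precision / recall / F1 for detecting mentions whose concept is not in the KB."""
    tp = sum(p and ex.gold_out_of_kb for ex, p in zip(examples, predicted_out_of_kb))
    fp = sum(p and not ex.gold_out_of_kb for ex, p in zip(examples, predicted_out_of_kb))
    fn = sum((not p) and ex.gold_out_of_kb for ex, p in zip(examples, predicted_out_of_kb))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate_placement(examples: List[MentionExample], predicted_parents: List[str]):
    """Accuracy of placement: the predicted parent must be one of the gold parents."""
    judged = [(ex, p) for ex, p in zip(examples, predicted_parents) if ex.gold_out_of_kb]
    hits = sum(p in ex.gold_parents for ex, p in judged)
    return hits / len(judged) if judged else 0.0
```

In the benchmark described above, the gold signal comes from comparing the two SNOMED CT versions (a concept present only in the newer version is treated as new), and placement may involve complex concepts; the snippet only checks a single predicted parent against a gold parent set.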
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
Deep networks have achieved broad success in analyzing natural images, but when applied to medical scans they often fail in unexpected situations.
We investigate this challenge, focusing on model sensitivity to domain shifts such as data sampled from different hospitals or data confounded by demographic variables (e.g., sex, race), in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z)
- From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer [0.9437165725355702]
A challenging scenario for artificial intelligence (AI) is using biomedical data to provide diagnosis and treatment recommendations for cancerous conditions.
A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations.
In this paper, we develop a domain KG to support cancer-specific biomarker discovery and interactive QA.
arXiv Detail & Related papers (2023-10-12T14:36:13Z)
- Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data drawn from a subset of DISNET, together with automatic association extractions from texts, have been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Self-Supervised Detection of Contextual Synonyms in a Multi-Class Setting: Phenotype Annotation Use Case [11.912581294872767]
Contextualised word embeddings are a powerful tool for detecting contextual synonyms.
We propose a self-supervised pre-training approach that detects contextual synonyms of concepts by training on data created by shallow matching.
arXiv Detail & Related papers (2021-09-04T21:35:01Z)
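
As a rough, generic illustration of the shallow-matching supervision mentioned in the entry above (not the paper's actual pipeline), the snippet below creates silver-standard mention spans by string-matching known concept labels in raw text; a contextual encoder could then be pre-trained on such spans. The label dictionary and concept identifiers are made-up examples.

```python
# Generic illustration of weak supervision via shallow matching: exact,
# case-insensitive matches of known concept labels yield silver-standard
# mention spans. Hypothetical labels/IDs; not the paper's actual code.
import re
from typing import Dict, List, Tuple

LABELS: Dict[str, str] = {           # toy label -> concept-ID dictionary
    "myocardial infarction": "C0027051",
    "heart attack": "C0027051",
    "type 2 diabetes": "C0011860",
}


def shallow_match(sentence: str, labels: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, concept_id) spans for case-insensitive label matches."""
    spans = []
    lowered = sentence.lower()
    for label, concept_id in labels.items():
        for m in re.finditer(re.escape(label), lowered):
            spans.append((m.start(), m.end(), concept_id))
    return sorted(spans)


text = "The patient suffered a heart attack two years after a type 2 diabetes diagnosis."
print(shallow_match(text, LABELS))   # two matched spans with their concept IDs
```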
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective to four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto standard used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
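
The entry above only hints at how the meta-embeddings are built; one common, simple strategy is to concatenate word vectors trained on different corpora (e.g., routine clinical data vs. scientific articles). The toy tables and dimensions below are placeholders for illustration, not the paper's actual ensemble.

```python
# Toy sketch of a meta-embedding by concatenation: combine word vectors from
# differently trained embedding tables, zero-filling words missing from a source.
# Placeholder vectors; in practice these would come from models trained on
# clinical notes and on scientific articles.
from typing import Dict
import numpy as np


def meta_embed(word: str,
               sources: Dict[str, Dict[str, np.ndarray]],
               dims: Dict[str, int]) -> np.ndarray:
    parts = []
    for name, table in sources.items():
        vec = table.get(word)
        parts.append(vec if vec is not None else np.zeros(dims[name]))
    return np.concatenate(parts)


clinical = {"sepsis": np.array([0.1, 0.3])}          # toy 2-d clinical-notes vectors
scientific = {"sepsis": np.array([0.2, 0.0, 0.5])}   # toy 3-d scientific-article vectors
vec = meta_embed("sepsis", {"clinical": clinical, "scientific": scientific},
                 {"clinical": 2, "scientific": 3})    # 5-dimensional meta-embedding
```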
- Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology [30.324906836652367]
PubMed, the best-known database of biomedical papers, relies on human curators to add concept annotations.
Our approach achieves new state-of-the-art results for the UMLS in both traditional recognition/linking and semantic indexing-based evaluation.
arXiv Detail & Related papers (2021-01-26T06:41:12Z)
- Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer [9.152161078854146]
Concept normalization in free-form texts is a crucial step in every text-mining pipeline.
We propose a simple and effective two-stage neural approach based on fine-tuned BERT architectures.
arXiv Detail & Related papers (2021-01-22T20:01:25Z)
- Biomedical Concept Relatedness -- A large EHR-based benchmark [10.133874724214984]
A promising application of AI to healthcare is the retrieval of information from electronic health records.
The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores.
All existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs.
We open-source a novel concept relatedness benchmark overcoming these issues.
arXiv Detail & Related papers (2020-10-30T12:20:18Z)