Machine Learning-Friendly Biomedical Datasets for Equivalence and
Subsumption Ontology Matching
- URL: http://arxiv.org/abs/2205.03447v8
- Date: Sun, 23 Jul 2023 00:13:46 GMT
- Title: Machine Learning-Friendly Biomedical Datasets for Equivalence and
Subsumption Ontology Matching
- Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jim\'enez-Ruiz, Ali Hadian,
Ian Horrocks
- Abstract summary: We introduce five new Ontology Matching (OM) tasks involving extracted from Mondo and UMLS.
Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation.
A comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems.
- Score: 35.76522395991403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ontology Matching (OM) plays an important role in many domains such as
bioinformatics and the Semantic Web, and its research is becoming increasingly
popular, especially with the application of machine learning (ML) techniques.
Although the Ontology Alignment Evaluation Initiative (OAEI) represents an
impressive effort for the systematic evaluation of OM systems, it still suffers
from several limitations including limited evaluation of subsumption mappings,
suboptimal reference mappings, and limited support for the evaluation of
ML-based systems. To tackle these limitations, we introduce five new biomedical
OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes
both equivalence and subsumption matching; the quality of reference mappings is
ensured by human curation, ontology pruning, etc.; and a comprehensive
evaluation framework is proposed to measure OM performance from various
perspectives for both ML-based and non-ML-based OM systems. We report
evaluation results for OM systems of different types to demonstrate the usage
of these resources, all of which are publicly available as part of the new
BioML track at OAEI 2022.
Related papers
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Surveying the MLLM Landscape: A Meta-Review of Current Surveys [17.372501468675303]
Multimodal Large Language Models (MLLMs) have become a transformative force in the field of artificial intelligence.
This survey aims to provide a systematic review of benchmark tests and evaluation methods for MLLMs.
arXiv Detail & Related papers (2024-09-17T14:35:38Z) - A Survey for Large Language Models in Biomedicine [31.719451674137844]
This review is based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv.
We explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine.
We discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics.
arXiv Detail & Related papers (2024-08-29T12:39:16Z) - GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z) - Agent-OM: Leveraging LLM Agents for Ontology Matching [4.222245509121683]
This study introduces a novel agent-powered design paradigm for Ontology matching systems.
We propose a framework, namely Agent-OMw.r.t. Agent for Ontology Matching, consisting of two Siamese agents for matching and retrieval.
Our system can achieve results very close to the long-standing best performance on simple OM tasks.
arXiv Detail & Related papers (2023-12-01T03:44:54Z) - A systematic evaluation of large language models for biomedical natural language processing: benchmarks, baselines, and recommendations [22.668383945059762]
We present a systematic evaluation of four representative Large Language Models (LLMs) across 12 BioNLP datasets.
The evaluation is conducted under four settings: zero-shot, static few-shot, dynamic K-nearest few-shot, and fine-tuning.
We compare these models against state-of-the-art (SOTA) approaches that fine-tune (domain-specific) BERT or BART models.
arXiv Detail & Related papers (2023-05-10T13:40:06Z) - Interpretability from a new lens: Integrating Stratification and Domain
knowledge for Biomedical Applications [0.0]
This paper proposes a novel computational strategy for the stratification of biomedical problem datasets into k-fold cross-validation (CVs)
This approach can improve model stability, establish trust, and provide explanations for outcomes generated by trained IML models.
arXiv Detail & Related papers (2023-03-15T12:02:02Z) - Information Extraction in Low-Resource Scenarios: Survey and Perspective [56.5556523013924]
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from emphtraditional and emphLLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into the three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.