Machine Learning-Friendly Biomedical Datasets for Equivalence and
Subsumption Ontology Matching
- URL: http://arxiv.org/abs/2205.03447v8
- Date: Sun, 23 Jul 2023 00:13:46 GMT
- Title: Machine Learning-Friendly Biomedical Datasets for Equivalence and
Subsumption Ontology Matching
- Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jim\'enez-Ruiz, Ali Hadian,
Ian Horrocks
- Abstract summary: We introduce five new Ontology Matching (OM) tasks involving ontologies extracted from Mondo and UMLS.
Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation.
A comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems.
- Score: 35.76522395991403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ontology Matching (OM) plays an important role in many domains such as
bioinformatics and the Semantic Web, and its research is becoming increasingly
popular, especially with the application of machine learning (ML) techniques.
Although the Ontology Alignment Evaluation Initiative (OAEI) represents an
impressive effort for the systematic evaluation of OM systems, it still suffers
from several limitations including limited evaluation of subsumption mappings,
suboptimal reference mappings, and limited support for the evaluation of
ML-based systems. To tackle these limitations, we introduce five new biomedical
OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes
both equivalence and subsumption matching; the quality of reference mappings is
ensured by human curation, ontology pruning, etc.; and a comprehensive
evaluation framework is proposed to measure OM performance from various
perspectives for both ML-based and non-ML-based OM systems. We report
evaluation results for OM systems of different types to demonstrate the usage
of these resources, all of which are publicly available as part of the new
BioML track at OAEI 2022.
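The evaluation framework described above scores predicted mappings against human-curated reference mappings; the standard global metrics for this are precision, recall, and F1 over mapping pairs. The following is a minimal sketch with hypothetical Mondo/UMLS identifiers (the actual BioML track scorer also supports ranking-based and local metrics not shown here):

```python
# Sketch: scoring predicted equivalence/subsumption mappings against a
# curated reference set. Identifiers below are hypothetical placeholders.

def score_mappings(predicted, reference):
    """Compute precision, recall, and F1 over sets of (source, target) pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # mappings found that are in the reference
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: two of three predictions match the reference.
pred = [("mondo:0005015", "umls:C0011849"),
        ("mondo:0004975", "umls:C0002395"),
        ("mondo:0005148", "umls:C0999999")]
ref = [("mondo:0005015", "umls:C0011849"),
       ("mondo:0004975", "umls:C0002395"),
       ("mondo:0005147", "umls:C0011854")]
p, r, f = score_mappings(pred, ref)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Set-based scoring like this treats equivalence and subsumption mappings uniformly as directed pairs, which is why the same sketch applies to both matching settings.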
Related papers
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs)
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z) - Surveying the MLLM Landscape: A Meta-Review of Current Surveys [17.372501468675303]
Multimodal Large Language Models (MLLMs) have become a transformative force in the field of artificial intelligence.
This survey aims to provide a systematic review of benchmark tests and evaluation methods for MLLMs.
arXiv Detail & Related papers (2024-09-17T14:35:38Z) - A Survey for Large Language Models in Biomedicine [31.719451674137844]
This review is based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv.
We explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine.
We discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics.
arXiv Detail & Related papers (2024-08-29T12:39:16Z) - GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is, to date, the most comprehensive general medical AI benchmark, with a well-categorized data structure and multiple perceptual granularities.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z) - Agent-OM: Leveraging LLM Agents for Ontology Matching [4.222245509121683]
This study introduces a novel agent-powered design paradigm for ontology matching systems.
We propose a framework, Agent-OM (Agent for Ontology Matching), consisting of two Siamese agents for matching and retrieval.
Our system can achieve results very close to the long-standing best performance on simple OM tasks.
arXiv Detail & Related papers (2023-12-01T03:44:54Z) - MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria [49.500322937449326]
Multimodal large language models (MLLMs) have broadened the scope of AI applications.
Existing automatic evaluation methodologies for MLLMs mainly evaluate queries without considering user experience.
We propose a new evaluation paradigm for MLLMs: evaluating them with per-sample criteria, using a potent MLLM as the judge.
arXiv Detail & Related papers (2023-11-23T12:04:25Z) - A systematic evaluation of large language models for biomedical natural language processing: benchmarks, baselines, and recommendations [22.668383945059762]
We present a systematic evaluation of four representative Large Language Models (LLMs) across 12 BioNLP datasets.
The evaluation is conducted under four settings: zero-shot, static few-shot, dynamic K-nearest few-shot, and fine-tuning.
We compare these models against state-of-the-art (SOTA) approaches that fine-tune (domain-specific) BERT or BART models.
arXiv Detail & Related papers (2023-05-10T13:40:06Z) - Interpretability from a New Lens: Integrating Stratification and Domain Knowledge for Biomedical Applications [0.0]
This paper proposes a novel computational strategy for stratifying biomedical problem datasets into k-fold cross-validation (CV) splits.
This approach can improve model stability, establish trust, and provide explanations for outcomes generated by trained IML models.
arXiv Detail & Related papers (2023-03-15T12:02:02Z) - Information Extraction in Low-Resource Scenarios: Survey and Perspective [56.5556523013924]
Information Extraction seeks to derive structured information from unstructured texts.
This paper presents a review of neural approaches to low-resource IE from traditional and LLM-based perspectives.
arXiv Detail & Related papers (2022-02-16T13:44:00Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
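The label-stratified k-fold splitting mentioned in the interpretability entry above can be sketched in plain Python. This is a minimal, hypothetical illustration of basic stratification only; the paper's strategy additionally integrates domain knowledge, which is not modeled here:

```python
# Sketch: round-robin stratified k-fold assignment, so each fold keeps
# roughly the same per-label proportions as the full dataset.
from collections import defaultdict

def stratified_kfold(labels, k):
    """Assign each sample index to one of k folds, balancing labels across folds."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for label, indices in by_label.items():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # distribute each label round-robin
    return folds

# Hypothetical example: 3 "case" and 3 "control" samples into 3 folds;
# each fold receives one sample of each label.
labels = ["case", "case", "case", "control", "control", "control"]
folds = stratified_kfold(labels, 3)
print(folds)  # [[0, 3], [1, 4], [2, 5]]
```

Compared with naive random splitting, stratification keeps class balance stable across folds, which is one of the ingredients behind the model-stability claims in that paper.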
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed details) and is not responsible for any consequences of its use.