OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems
- URL: http://arxiv.org/abs/2503.21813v1
- Date: Tue, 25 Mar 2025 18:20:04 GMT
- Title: OAEI-LLM-T: A TBox Benchmark Dataset for Understanding LLM Hallucinations in Ontology Matching Systems
- Authors: Zhangcheng Qiang,
- Abstract summary: Hallucinations are inevitable in downstream tasks using large language models (LLMs). We introduce a new benchmark dataset called OAEI-LLM-T, capturing hallucinations of different LLMs performing OM tasks. These OM-specific hallucinations are carefully classified into two primary categories and six sub-categories.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucinations are inevitable in downstream tasks using large language models (LLMs). As addressing hallucinations has become a substantial challenge for LLM-based ontology matching (OM) systems, we introduce a new benchmark dataset called OAEI-LLM-T. The dataset evolves from the TBox (i.e., schema-matching) datasets of the Ontology Alignment Evaluation Initiative (OAEI), capturing hallucinations of different LLMs performing OM tasks. These OM-specific hallucinations are carefully classified into two primary categories and six sub-categories. We showcase the usefulness of the dataset in constructing an LLM leaderboard and in fine-tuning foundational LLMs for LLM-based OM systems.
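The abstract does not spell out the dataset's record format or the names of its categories, so the following is only an illustrative sketch of how such TBox hallucination annotations might be organised and tallied into a per-LLM leaderboard. Every field name and category value is a placeholder assumption, not the schema published with OAEI-LLM-T.

```python
# Illustrative sketch only: all field names and category values are placeholder
# assumptions, not the actual OAEI-LLM-T schema or hallucination taxonomy.
from collections import Counter
from dataclasses import dataclass


@dataclass
class HallucinationRecord:
    source_concept: str       # concept from the source ontology (TBox)
    target_concept: str       # concept from the target ontology (TBox)
    llm_name: str             # LLM that produced the correspondence
    llm_judgement: str        # what the LLM claimed about the pair
    reference_judgement: str  # what the OAEI reference alignment says
    primary_category: str     # one of the two primary hallucination categories
    sub_category: str         # one of the six sub-categories


def hallucination_counts(records: list[HallucinationRecord]) -> dict[str, int]:
    """Tally hallucinated correspondences per LLM (lower is better on a leaderboard)."""
    return dict(Counter(r.llm_name for r in records))
```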
Related papers
- How to Steer LLM Latents for Hallucination Detection? [29.967245405976072]
We propose a steering vector that reshapes the representation space during inference to separate truthful and hallucinated outputs.
Our two-stage framework first trains this steering vector (TSV) on a small set of labeled exemplars to form compact and well-separated clusters.
It then augments the exemplar set with unlabeled LLM generations, employing an optimal transport-based algorithm for pseudo-labeling combined with a confidence-based filtering process.
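As a rough illustration of the steering idea (not the paper's implementation), the sketch below adds a fixed vector to one layer's hidden states at inference time via a PyTorch forward hook. It assumes the hooked module returns a plain (batch, seq_len, hidden_dim) tensor; real transformer blocks often return tuples, which would need unpacking.

```python
# Minimal sketch, assuming the hooked layer outputs a plain hidden-state tensor;
# the steering vector itself would be trained separately on labeled exemplars.
import torch


def attach_steering_vector(layer: torch.nn.Module, v: torch.Tensor, alpha: float = 1.0):
    """Shift every token representation produced by `layer` by alpha * v."""
    def hook(module, inputs, output):
        return output + alpha * v  # returning a value replaces the layer output
    return layer.register_forward_hook(hook)


# Usage (hypothetical model/layer names):
# handle = attach_steering_vector(model.model.layers[20], tsv_vector)
# ...run inference, then handle.remove() to restore normal behaviour.
```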
arXiv Detail & Related papers (2025-03-01T19:19:34Z)
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
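The summary does not give the exact penalty-to-weight mapping, so the sketch below takes positive per-feature weights as given and uses the standard column-rescaling trick, under which a per-feature weighted Lasso is equivalent to an ordinary Lasso on rescaled features.

```python
# Sketch under assumptions: `penalty_weights` (larger = less relevant according
# to the LLM) is taken as given; the LLM-to-weight mapping is not reproduced here.
import numpy as np
from sklearn.linear_model import Lasso


def llm_weighted_lasso(X: np.ndarray, y: np.ndarray,
                       penalty_weights: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    w = np.asarray(penalty_weights, dtype=float)
    X_scaled = X / w                       # penalising w_j*|b_j| == plain Lasso on X_j / w_j
    model = Lasso(alpha=alpha).fit(X_scaled, y)
    return model.coef_ / w                 # map coefficients back to the original scale
```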
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering [66.5524727179286]
NOVA is a framework designed to identify high-quality data that aligns well with the learned knowledge to reduce hallucinations.
It includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data.
To ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity.
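As a loose illustration of the internal-consistency idea (not NOVA's actual ICP metric), the sketch below samples several completions from a hypothetical `generate` callable and scores their mutual agreement with a crude token-overlap proxy; low agreement would suggest the model is unfamiliar with the instruction and the sample should be filtered out.

```python
# Loose sketch: `generate` is a hypothetical callable (prompt -> sampled text),
# and Jaccard token overlap stands in for a proper semantic-equivalence check.
from itertools import combinations
from typing import Callable


def consistency_score(generate: Callable[[str], str], prompt: str, k: int = 5) -> float:
    samples = [generate(prompt) for _ in range(k)]

    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, len(ta | tb))

    pairs = list(combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```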
arXiv Detail & Related papers (2025-02-11T08:05:56Z)
- LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts.
LongHalQA features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z)
- OAEI-LLM: A Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching [8.732396482276332]
Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, with no exception in ontology matching (OM).
The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets that evaluates LLM-specific hallucinations in OM tasks.
arXiv Detail & Related papers (2024-09-21T06:49:34Z)
- HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild [41.86776426516293]
Hallucinations pose a significant challenge to the reliability of large language models (LLMs) in critical domains.
We introduce HaluEval-Wild, the first benchmark specifically designed to evaluate LLM hallucinations in the wild.
arXiv Detail & Related papers (2024-03-07T08:25:46Z)
- OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data [1.3981625092173873]
This paper describes a unified system for hallucination detection in LLMs.
It won second prize in the model-agnostic track of SemEval-2024 Task 6.
arXiv Detail & Related papers (2024-02-20T11:01:39Z)
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existent objects in images.
We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE).
MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
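A minimal sketch of a classifier-free-guidance-style decoding step (not MARINE's actual code): logits computed with and without the extra visual guidance are combined at each step so that generation is pushed toward the guided distribution.

```python
# Sketch only: standard classifier-free-guidance combination of two logit
# tensors; gamma = 1.0 recovers the guided model, gamma > 1.0 amplifies guidance.
import torch


def guided_logits(logits_guided: torch.Tensor,
                  logits_unguided: torch.Tensor,
                  gamma: float = 1.5) -> torch.Tensor:
    return logits_unguided + gamma * (logits_guided - logits_unguided)
```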
arXiv Detail & Related papers (2024-02-13T18:59:05Z)
- CLAMP: Contrastive LAnguage Model Prompt-tuning [89.96914454453791]
We show that large language models can achieve good image classification performance when adapted with contrastive prompt-tuning.
Our approach beats state-of-the-art mLLMs by 13% and slightly outperforms contrastive learning with a custom text model.
arXiv Detail & Related papers (2023-12-04T05:13:59Z)
- AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation [58.19101663976327]
Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations.
Evaluating MLLMs' hallucinations is becoming increasingly important for model improvement and practical application deployment.
We propose an LLM-free multi-dimensional benchmark, AMBER, which can be used to evaluate both generative and discriminative tasks.
arXiv Detail & Related papers (2023-11-13T15:25:42Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data are human-readable and useful for triggering hallucinations in large language models.
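As a rough sketch of the prompt-chaining idea (the prompt wording below is invented, not ReEval's, and `llm` is a hypothetical prompt-to-text callable): one call perturbs the retrieved evidence, and a second prompt re-asks the question against the perturbed passage.

```python
# Hypothetical sketch: `llm` is a placeholder callable (prompt -> text) and the
# prompts are illustrative only, not the framework's actual templates.
from typing import Callable


def build_adversarial_case(llm: Callable[[str], str], question: str,
                           evidence: str, gold_answer: str) -> tuple[str, str]:
    perturbed = llm(
        "Rewrite the passage so it no longer supports the answer "
        f"'{gold_answer}', while remaining fluent and plausible:\n{evidence}"
    )
    probe = f"Answer using only this passage:\n{perturbed}\n\nQuestion: {question}"
    return probe, perturbed
```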
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.