Hallucination-Free Automatic Question & Answer Generation for Intuitive Learning
- URL: http://arxiv.org/abs/2601.14280v1
- Date: Tue, 13 Jan 2026 03:57:56 GMT
- Title: Hallucination-Free Automatic Question & Answer Generation for Intuitive Learning
- Authors: Nicholas X. Wang, Aggelos K. Katsaggelos,
- Abstract summary: Hallucinations in large language models (LLMs) pose a challenge to the automatic generation of educational multiple-choice questions (MCQs)<n>We propose a hallucination-free multi-agent generation framework that breaks down MCQ generation into discrete, verifiable stages.<n>Our results demonstrate that structured multi-agent collaboration can mitigate hallucinations in educational content creation at scale.
- Score: 10.54305202989537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucinations in large language models (LLMs), defined as fluent yet incorrect or incoherent outputs, pose a significant challenge to the automatic generation of educational multiple-choice questions (MCQs). We identified four key hallucination types in MCQ generation: reasoning inconsistencies, insolvability, factual errors, and mathematical errors. To address this, we propose a hallucination-free multi-agent generation framework that breaks down MCQ generation into discrete, verifiable stages. Our framework utilizes both rule-based and LLM-based detection agents, as well as hallucination scoring metrics to optimize question quality. We redefined MCQ generation as an optimization task minimizing hallucination risk while maximizing validity, answerability, and cost-efficiency. We also introduce an agent-led refinement process that uses counterfactual reasoning and chain-of-thought (CoT) to iteratively improve hallucination in question generation. We evaluated a sample of AP- aligned STEM questions, where our system reduced hallucination rates by over 90% compared to baseline generation while preserving the educational value and style of questions. Our results demonstrate that structured multi-agent collaboration can mitigate hallucinations in educational content creation at scale, paving the way for more reliable LLM-powered learning tools.
Related papers
- MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM [58.2298313720146]
Multimodal hallucinations are multi-sourced and arise from diverse causes.<n>Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv Detail & Related papers (2025-05-30T05:54:36Z) - HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination"<n>This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z) - Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking [124.69672273754144]
HaluSearch is a novel framework that incorporates tree search-based algorithms.<n>It frames text generation as a step-by-step reasoning process.<n>We introduce a hierarchical thinking system switch mechanism inspired by the dual process theory in cognitive science.
arXiv Detail & Related papers (2025-01-02T15:36:50Z) - FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning [18.927164579769066]
Existing approaches primarily detect the presence of hallucinations but lack a nuanced understanding of their types and manifestations.<n>We introduce a comprehensive taxonomy that categorizes the common hallucinations in mathematical reasoning tasks into six types.<n>We then propose FG-PRM, an augmented model designed to detect and mitigate hallucinations in a fine-grained, step-level manner.
arXiv Detail & Related papers (2024-10-08T19:25:26Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - Towards Mitigating Hallucination in Large Language Models via
Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets called AutoHall.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.