Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
- URL: http://arxiv.org/abs/2512.05967v1
- Date: Fri, 05 Dec 2025 18:59:18 GMT
- Title: Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
- Authors: Francesco Granata, Francesco Poggi, Misael Mongiovì,
- Abstract summary: This study proposes an enhanced RAG architecture that integrates a factual signal derived from Entity Linking.<n>It implements three re-ranking strategies to combine semantic and entity-based information: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker.<n>Results show that, in domain-specific contexts, the hybrid schema based on reciprocal rank fusion significantly outperforms both the baseline and the cross-encoder approach.
- Score: 1.7842332554022695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable knowledge sources. Despite their impressive effectiveness in many areas, RAG systems based solely on semantic similarity often fail to ensure factual accuracy in specialized domains, where terminological ambiguity can affect retrieval relevance. This study proposes an enhanced RAG architecture that integrates a factual signal derived from Entity Linking to improve the accuracy of educational question-answering systems in Italian. The system includes a Wikidata-based Entity Linking module and implements three re-ranking strategies to combine semantic and entity-based information: a hybrid score weighting model, reciprocal rank fusion, and a cross-encoder re-ranker. Experiments were conducted on two benchmarks: a custom academic dataset and the standard SQuAD-it dataset. Results show that, in domain-specific contexts, the hybrid schema based on reciprocal rank fusion significantly outperforms both the baseline and the cross-encoder approach, while the cross-encoder achieves the best results on the general-domain dataset. These findings confirm the presence of an effect of domain mismatch and highlight the importance of domain adaptation and hybrid ranking strategies to enhance factual precision and reliability in retrieval-augmented generation. They also demonstrate the potential of entity-aware RAG systems in educational environments, fostering adaptive and reliable AI-based tutoring tools.
Related papers
- CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation [10.159901538172575]
In practical deep learning deployment, the scarcity of data and the imbalance of label distributions often lead to semantically uncovered regions.<n>We propose a Cluster-conditioned Interpolative and Extrapolative framework for Geometry-Aware and Domain-aligned data augmentation (CIEGAD)<n>We show that CIEGAD effectively extends the periphery of real-world data distributions while maintaining high alignment between generated and real-world data as well as semantic diversity.
arXiv Detail & Related papers (2025-12-11T00:32:37Z) - RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG [0.0]
Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence.<n>Existing evaluation frameworks often rely on metrics that fail to capture domain-specific nuances.<n>This paper introduces RAGalyst, an automated, human-aligned agentic framework designed for the rigorous evaluation of domain-specific RAG systems.
arXiv Detail & Related papers (2025-11-06T16:22:52Z) - Domain-Specific Data Generation Framework for RAG Adaptation [58.20906914537952]
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models with external retrieval to enable domain-grounded responses.<n>We propose RAGen, a framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches.
arXiv Detail & Related papers (2025-10-13T09:59:49Z) - SynthGenNet: a self-supervised approach for test-time generalization using synthetic multi-source domain mixing of street view images [8.23277995673829]
We introduce SynthGenNet, a self-supervised student-teacher architecture to enable robust test-time domain generalization.<n>Our contributions include the novel ClassMix++ algorithm, which blends labeled data from various synthetic sources.<n>We show our model outperforms the state-of-the-art (relying on single source) by achieving 50% Mean Intersection-Over-Union (mIoU) value on real-world datasets.
arXiv Detail & Related papers (2025-09-02T13:08:03Z) - SemRAG: Semantic Knowledge-Augmented RAG for Improved Question-Answering [2.4874078867686085]
SemRAG is an enhanced Retrieval Augmented Generation (RAG) framework that efficiently integrates domain-specific knowledge.<n>It employs a semantic chunking algorithm that segments documents based on the cosine similarity from sentence embeddings, preserving semantic coherence.<n>By structuring retrieved information into knowledge graphs, SemRAG captures relationships between entities, improving retrieval accuracy and contextual understanding.
arXiv Detail & Related papers (2025-07-10T11:56:25Z) - KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources.<n>We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance.<n>We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z) - So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection [75.79507634008631]
We introduce So-Fake-Set, a social media-oriented dataset with over 2 million high-quality images, diverse generative sources, and imagery synthesized using 35 state-of-the-art generative models.<n>We present So-Fake-R1, an advanced vision-language framework that employs reinforcement learning for highly accurate forgery detection, precise localization, and explainable inference through interpretable visual rationales.
arXiv Detail & Related papers (2025-05-24T11:53:35Z) - Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation [52.3707788779464]
We introduce a novel Jensen-Shannon Divergence driven method to Attribute Response to Context (ARC-JSD)<n>ARC-JSD enables efficient and accurate identification of essential context sentences without additional fine-tuning, gradient-calculation or surrogate modelling.<n> Evaluations on a wide range of RAG benchmarks, such as TyDi QA, Hotpot QA, and Musique, using instruction-tuned LLMs in different scales demonstrate superior accuracy and significant computational efficiency improvements.
arXiv Detail & Related papers (2025-05-22T09:04:03Z) - HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework.<n>It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution [61.80716438091887]
GenDiE (Generate, Discriminate, Evolve) is a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization.<n>By treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches.<n>Experiments on ASQA (in-domain LFQA) and ConFiQA datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness.
arXiv Detail & Related papers (2025-03-03T16:08:33Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.