Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models
- URL: http://arxiv.org/abs/2408.12326v1
- Date: Thu, 22 Aug 2024 12:04:04 GMT
- Title: Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models
- Authors: Meiyun Wang, Masahiro Suzuki, Hiroki Sakaji, Kiyoshi Izumi
- Abstract summary: Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks.
These models can produce hallucinations, particularly in domains with incomplete knowledge.
We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
- Score: 7.632217365130212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. Given the high costs of creating annotated datasets for supervised learning, LLMs offer a valuable alternative by enabling effective few-shot in-context learning. However, these models can produce hallucinations, particularly in domains with incomplete knowledge. Additionally, current methods for knowledge distillation using LLMs often struggle to enhance the effectiveness of both teacher and student models. To address these challenges, we introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models during knowledge distillation. DualChecker employs ContextAligner to ensure that the context provided by teacher models aligns with human labeling standards. It also features a dynamic checker system that enhances model interaction: one component re-prompts teacher models with more detailed content when they show low confidence, and another identifies borderline cases from student models to refine the teaching templates. This interactive process promotes continuous improvement and effective knowledge transfer between the models. We evaluate DualChecker using a green innovation textual dataset that includes binary, multiclass, and token classification tasks. The experimental results show that DualChecker significantly outperforms existing state-of-the-art methods, achieving up to a 17% improvement in F1 score for teacher models and 10% for student models. Notably, student models fine-tuned with LLM predictions perform comparably to those fine-tuned with actual data, even in a challenging domain. We make all datasets, models, and code from this research publicly available.
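Based only on the abstract above, the following minimal Python sketch shows one way such an interactive teacher-student loop could be organized. The teacher/student interfaces (`teacher.predict`, `student.fine_tune`, `student.predict_proba`), the confidence threshold, and the borderline margin are illustrative assumptions, not the authors' released implementation.
```python
# Illustrative sketch of a DualChecker-style interactive distillation round,
# assembled from the abstract alone. All interfaces and thresholds here are
# assumptions for the sake of illustration.

CONFIDENCE_THRESHOLD = 0.7   # assumed cutoff for "low confidence" teacher output
BORDERLINE_MARGIN = 0.1      # assumed gap that marks a student prediction as borderline


def align_context(text, teaching_template):
    """ContextAligner step: prepend human labeling standards to the prompt."""
    return f"{teaching_template}\n\nText: {text}\nLabel:"


def distill_round(unlabeled_texts, teaching_template, teacher, student):
    pseudo_labeled = []

    for text in unlabeled_texts:
        prompt = align_context(text, teaching_template)
        label, confidence = teacher.predict(prompt)

        # Teacher-side checker: re-prompt with more detailed content
        # when the teacher reports low confidence.
        if confidence < CONFIDENCE_THRESHOLD:
            detailed = prompt + "\nExplain your reasoning step by step before answering."
            label, confidence = teacher.predict(detailed)

        pseudo_labeled.append({"text": text, "label": label})

    # Student is fine-tuned on the teacher's (checked) pseudo-labels.
    student.fine_tune(pseudo_labeled)

    # Student-side checker: borderline cases are fed back to refine the
    # teaching template for the next round of interaction.
    for item in pseudo_labeled:
        probs = sorted(student.predict_proba(item["text"]), reverse=True)
        if probs[0] - probs[1] < BORDERLINE_MARGIN:
            teaching_template += f"\nClarify cases similar to: {item['text'][:80]}"

    return teaching_template
```
The paper evaluates this kind of loop on binary, multiclass, and token classification over a green-innovation corpus; the sketch only conveys the control flow, not the actual prompt templates or models used.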
Related papers
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299]
Ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations is essential.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
arXiv Detail & Related papers (2024-10-17T15:09:35Z) - ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model [49.587821411012705]
We propose ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-training Model.
It distills the knowledge from a large teacher CLIP model into a smaller student model, ensuring comparable performance with significantly reduced parameters.
EduAttention explores the cross-relationships between text features extracted by the teacher model and image features extracted by the student model.
arXiv Detail & Related papers (2024-08-08T01:12:21Z) - Unlock the Power: Competitive Distillation for Multi-Modal Large
Language Models [17.25135606956287]
Competitive Multi-modal Distillation framework (CoMD) captures bidirectional feedback between teacher and student models.
Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model.
arXiv Detail & Related papers (2023-11-14T14:49:46Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Ensemble knowledge distillation of self-supervised speech models [84.69577440755457]
Distilled self-supervised models have shown competitive performance and efficiency in recent years.
We performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM.
Our method improves the performance of the distilled models on four downstream speech processing tasks.
arXiv Detail & Related papers (2023-02-24T17:15:39Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Learning Slice-Aware Representations with Mixture of Attentions [38.74444452556773]
This work extends the recent slice-based learning (SBL) [Chen et al., 2019] with a mixture of attentions (MoA) to learn slice-aware attentive dual representations.
We empirically show that the MoA approach outperforms the baseline method as well as the original SBL approach on monitored slices with two natural language understanding tasks.
arXiv Detail & Related papers (2021-06-04T09:22:24Z) - Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout distillation, and most allocate an equal weight to every teacher.
In this paper, we observe that, due to the varying complexity of training examples and the differences in student model capability, learning differentially from teacher models leads to better-performing distilled student models (a weighted multi-teacher objective is sketched after this entry).
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
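The last entry argues that a student should learn differentially from multiple teachers rather than weighting them equally. As a rough illustration only (not the paper's reinforcement-learning-based selection policy), a per-example weighted multi-teacher distillation loss could look like the sketch below; the `weights` tensor is assumed to be supplied by some external policy.
```python
# Minimal sketch of a per-example weighted multi-teacher distillation loss.
# The policy that chooses the teacher weights is not reproduced here;
# `weights` is assumed to come from such a policy.
import torch
import torch.nn.functional as F


def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, temperature=2.0):
    """
    student_logits:      [batch, num_classes]
    teacher_logits_list: list of [batch, num_classes] tensors, one per teacher
    weights:             [num_teachers, batch] per-example teacher weights
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = student_logits.new_zeros(())
    for k, teacher_logits in enumerate(teacher_logits_list):
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # Per-example KL(teacher || student), summed over classes.
        kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)
        loss = loss + (weights[k] * kl).mean()
    return (temperature ** 2) * loss
```
Setting every row of `weights` to the same constant recovers the equal-weight baseline that the paper argues against.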