Related papers: Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

URL: http://arxiv.org/abs/2505.22517v1
Date: Wed, 28 May 2025 16:03:41 GMT
Title: Multi-MLLM Knowledge Distillation for Out-of-Context News Detection
Authors: Yimeng Gu, Zhao Tong, Ignacio Castro, Shu Wu, Gareth Tyson,
Abstract summary: Multimodal out-of-context news is a type of misinformation in which the image is used outside of its original context.<n>We introduce a two-stage knowledge distillation framework to transfer this knowledge to a student MLLM.<n>In Stage 1, we apply LoRA fine-tuning to the student model using all training data.<n>In Stage 2, we further fine-tune the student model using both LoRA fine-tuning and DPO on the data points where teachers' predictions conflict.
Score: 17.41734069411864
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal out-of-context news is a type of misinformation in which the image is used outside of its original context. Many existing works have leveraged multimodal large language models (MLLMs) for detecting out-of-context news. However, observing the limited zero-shot performance of smaller MLLMs, they generally require label-rich fine-tuning and/or expensive API calls to GPT models to improve the performance, which is impractical in low-resource scenarios. In contrast, we aim to improve the performance of small MLLMs in a more label-efficient and cost-effective manner. To this end, we first prompt multiple teacher MLLMs to generate both label predictions and corresponding rationales, which collectively serve as the teachers' knowledge. We then introduce a two-stage knowledge distillation framework to transfer this knowledge to a student MLLM. In Stage 1, we apply LoRA fine-tuning to the student model using all training data. In Stage 2, we further fine-tune the student model using both LoRA fine-tuning and DPO on the data points where teachers' predictions conflict. This two-stage strategy reduces annotation costs and helps the student model uncover subtle patterns in more challenging cases. Experimental results demonstrate that our approach achieves state-of-the-art performance using less than 10% labeled data.

Related papers

Faster, Cheaper, More Accurate: Specialised Knowledge Tracing Models Outperform LLMs [3.8834950760134657]
Knowledge tracing (KT) models are small, domain-specific, temporal models trained on student question-response data.<n>We show that KT models outperform Large Language Models (LLMs) with respect to accuracy and F1 scores on this domain-specific task.<n>This highlights the importance of domain-specific models for education prediction tasks.
arXiv Detail & Related papers (2026-03-03T10:25:52Z)
Can LLM Annotations Replace User Clicks for Learning to Rank? [112.2254432364736]
Large-scale supervised data is essential for training modern ranking models, but obtaining high-quality human annotations is costly.<n>Click data has been widely used as a low-cost alternative, and with recent advances in large language models (LLMs), LLM-based relevance annotation has emerged as another promising annotation.<n> Experiments on both a public dataset, TianGong-ST, and an industrial dataset, Baidu-Click, show that click-supervised models perform better on high-frequency queries.<n>We explore two training strategies -- data scheduling and frequency-aware multi-objective learning -- that integrate both supervision signals.
arXiv Detail & Related papers (2025-11-10T02:26:14Z)
Online In-Context Distillation for Low-Resource Vision Language Models [16.3054668860198]
Small vision-language models (VLMs) are efficient but typically require costly fine-tuning to close the performance gap with larger models in the deployment domain.<n>We propose an online In-Context Distillation (ICD) method, in which a small VLM collaborates with a stronger teacher model at inference time.<n>Our method significantly boosts the performance of small models (up to 33%) using scarce teacher annotations.
arXiv Detail & Related papers (2025-10-20T21:35:17Z)
Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models [19.361686225381447]
Vision-language models (VLMs) are widely assumed to exhibit in-context learning (ICL)<n>We propose a new MM-ICL with Reasoning pipeline that augments each demonstration with a generated rationale alongside the answer.
arXiv Detail & Related papers (2025-06-09T16:55:32Z)
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.<n>LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.<n>Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation [41.05687297326706]
LLaVA-MoD is a framework designed to enable the efficient training of small-scale Multimodal Language Models. We optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts architecture into the language model. We also propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration.
arXiv Detail & Related papers (2024-08-28T15:52:23Z)
Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models [7.632217365130212]
Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks. These models can produce hallucinations, particularly in domains with incomplete knowledge. We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
arXiv Detail & Related papers (2024-08-22T12:04:04Z)
LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch. Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process. By evaluating different benchmarks and proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.<n>Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.<n>We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
Identifying Factual Inconsistencies in Summaries: Grounding LLM Inference via Task Taxonomy [48.29181662640212]
Factual inconsistencies pose a significant hurdle for the faithful summarization by generative models. We consolidate key error types of inconsistent facts in summaries, and incorporate them to facilitate both the zero-shot and supervised paradigms of LLMs.
arXiv Detail & Related papers (2024-02-20T08:41:23Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation [60.657065293413716]
We study semi-supervised sequence generation tasks, where the few labeled examples are too scarce to finetune a model. We present the discovery that a student model distilled from a few-shot prompted large language models can commonly generalize better than its teacher.
arXiv Detail & Related papers (2023-11-15T01:28:28Z)
LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners. We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting. Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models. We present three findings across 4 NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.