Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation
- URL: http://arxiv.org/abs/2412.15255v2
- Date: Wed, 04 Jun 2025 12:38:38 GMT
- Title: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation
- Authors: Jonibek Mansurov, Akhmed Sakip, Alham Fikri Aji
- Abstract summary: We show that knowledge distillation can be subverted to manipulate language model benchmark scores. We introduce "Data Laundering," a process that enables the covert transfer of benchmark-specific knowledge. We show how this approach can achieve substantial improvements in benchmark accuracy without developing genuine reasoning capabilities.
- Score: 11.215746700797618
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we show that knowledge distillation can be subverted to manipulate language model benchmark scores, revealing a critical vulnerability in current evaluation practices. We introduce "Data Laundering," a process that enables the covert transfer of benchmark-specific knowledge through seemingly legitimate intermediate training steps. Through extensive experiments with a 2-layer BERT student model, we show how this approach can achieve substantial improvements in benchmark accuracy (up to 75% on GPQA) without developing genuine reasoning capabilities. Notably, this method can be exploited intentionally or even unintentionally, as researchers may inadvertently adopt it and inflate scores without realising the implications. While our findings demonstrate the effectiveness of this technique, we present them as a cautionary tale highlighting the urgent need for more robust evaluation methods in AI. This work aims to contribute to the ongoing discussion about evaluation integrity in AI development and the need for benchmarks that more accurately reflect true model capabilities. The code is available at https://github.com/mbzuai-nlp/data_laundering.
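To make the mechanism concrete, the distillation step that Data Laundering relies on can be illustrated with a standard teacher-student objective. The sketch below is a minimal, generic knowledge-distillation loop in PyTorch; the model, loader, and hyperparameter names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the distillation step: a student trained on an
# "intermediate" dataset imitates a teacher's soft labels, so knowledge the
# teacher absorbed from a benchmark can leak into the student even though the
# benchmark never appears in the student's own training data.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.7):
    """Standard KD objective: alpha * soft-label KL + (1 - alpha) * hard-label CE."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def distill_epoch(student, teacher, loader, optimizer, device="cpu"):
    """One pass over the intermediate dataset; the teacher stays frozen."""
    teacher.eval()
    student.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        with torch.no_grad():
            teacher_logits = teacher(inputs)   # teacher previously exposed to the benchmark
        student_logits = student(inputs)       # e.g. a small 2-layer BERT-style encoder
        loss = distillation_loss(student_logits, teacher_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```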
Related papers
- Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement [19.883973457999282]
Imitation Learning from Observation (IfO) offers a powerful way to learn behaviors at large scale. This paper investigates idealized scenarios with mostly bimodal-quality data distributions and introduces a method to learn from such data. Our method adapts RL-based imitation learning to action-free demonstrations, using a value function to transfer information between expert and non-expert data.
arXiv Detail & Related papers (2025-07-09T09:55:23Z)
- Query-Level Uncertainty in Large Language Models [13.195074492564332]
We introduce a novel and training-free method called Internal Confidence, which leverages self-evaluations across layers and tokens. Empirical results on both factual QA and mathematical reasoning tasks demonstrate that our internal confidence can outperform several baselines. Our proposed method can be used for efficient RAG and model cascading, and is able to reduce inference costs while maintaining performance.
arXiv Detail & Related papers (2025-06-11T12:39:48Z)
- Active Learning Methods for Efficient Data Utilization and Model Performance Enhancement [5.4044723481768235]
This paper gives a detailed overview of Active Learning (AL), which is a strategy in machine learning that helps models achieve better performance using fewer labeled examples. It introduces the basic concepts of AL and discusses how it is used in various fields such as computer vision, natural language processing, transfer learning, and real-world applications.
arXiv Detail & Related papers (2025-04-21T20:42:13Z)
- OAL: Enhancing OOD Detection Using Latent Diffusion [5.357756138014614]
The Outlier Aware Learning (OAL) framework synthesizes OOD training data directly in the latent space.
We introduce a mutual information-based contrastive learning approach that amplifies the distinction between In-Distribution (ID) and collected OOD features.
arXiv Detail & Related papers (2024-06-24T11:01:43Z)
- Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection [47.0507287491627]
We propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection.
By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model.
Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources.
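The Monte Carlo dropout idea in the summary above can be sketched as follows: run the teacher several times with dropout active, treat the variance of its features as knowledge uncertainty, and down-weight uncertain feature dimensions in a feature-distillation loss. This is a generic, hedged illustration under assumed interfaces, not the paper's exact formulation.

```python
# Hedged sketch: Monte Carlo dropout yields a mean teacher feature and a
# per-dimension variance; high-variance (uncertain) dimensions are
# down-weighted in the feature-distillation loss.
import torch

def mc_dropout_features(teacher, x, n_samples=8):
    """Run the teacher with dropout kept active to get feature mean and variance."""
    teacher.train()  # keep dropout layers stochastic during sampling
    with torch.no_grad():
        samples = torch.stack([teacher(x) for _ in range(n_samples)])
    teacher.eval()
    return samples.mean(dim=0), samples.var(dim=0)

def uncertainty_weighted_feature_loss(student_feat, teacher_mean, teacher_var, eps=1e-6):
    """Weighted MSE that trusts low-variance teacher features more."""
    weight = 1.0 / (teacher_var + eps)
    weight = weight / weight.mean()  # keep the overall loss scale comparable
    return (weight * (student_feat - teacher_mean) ** 2).mean()
```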
arXiv Detail & Related papers (2024-06-11T06:51:02Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
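For reference, the two ERASER faithfulness metrics named above are typically computed as probability drops under token removal. The sketch below follows the standard definitions, assuming a hypothetical `predict_proba` function and a binary rationale mask; it is not the paper's own code.

```python
# Standard ERASER-style metrics: how much the predicted-class probability
# drops when the rationale is removed (comprehensiveness) or when only the
# rationale is kept (sufficiency).
def comprehensiveness(predict_proba, tokens, rationale_mask, label):
    full = predict_proba(tokens)[label]
    without_rationale = [t for t, keep in zip(tokens, rationale_mask) if not keep]
    return full - predict_proba(without_rationale)[label]

def sufficiency(predict_proba, tokens, rationale_mask, label):
    full = predict_proba(tokens)[label]
    only_rationale = [t for t, keep in zip(tokens, rationale_mask) if keep]
    return full - predict_proba(only_rationale)[label]
```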
arXiv Detail & Related papers (2023-08-28T03:03:03Z)
- Precise Benchmarking of Explainable AI Attribution Methods [0.0]
We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods.
Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations.
Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods.
arXiv Detail & Related papers (2023-08-06T17:03:32Z)
- Knowledge Distillation via Token-level Relationship Graph [12.356770685214498]
We propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG).
By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model.
We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches.
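The summary does not spell out how the relationship graph is built; a common way to realize token-level relational distillation (e.g., similarity-preserving KD) is to match pairwise token-similarity matrices between teacher and student. The sketch below is purely an illustration of that generic idea, not TRG's exact formulation.

```python
# Illustrative relational distillation: build a token-token cosine-similarity
# (Gram) matrix for teacher and student and match them with an MSE loss.
import torch
import torch.nn.functional as F

def token_relationship_graph(token_feats):
    """Pairwise cosine similarities between token representations: (T, T)."""
    normed = F.normalize(token_feats, dim=-1)      # (T, D)
    return normed @ normed.transpose(-1, -2)       # (T, T)

def relation_distillation_loss(student_tokens, teacher_tokens):
    """MSE between the student's and teacher's token relationship graphs."""
    return F.mse_loss(token_relationship_graph(student_tokens),
                      token_relationship_graph(teacher_tokens))
```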
arXiv Detail & Related papers (2023-06-20T08:16:37Z)
- Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization [114.43504951058796]
Outlier detection plays a critical role in AI safety.
Deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence.
We propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for the OOD detection tasks.
arXiv Detail & Related papers (2022-09-26T15:59:55Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
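As a rough illustration of the ATC recipe described above: fit the threshold on labeled source data so that the share of confident predictions matches the source accuracy, then report the share of confident predictions on unlabeled target data. The confidence score (max softmax here) and function names are assumptions; other confidence scores can be substituted.

```python
# Sketch of the Average Thresholded Confidence (ATC) idea.
import numpy as np

def learn_threshold(source_confidences, source_correct):
    """Choose t so that the share of source points with confidence >= t
    equals the observed source accuracy."""
    acc = source_correct.mean()
    return np.quantile(source_confidences, 1.0 - acc)

def predict_target_accuracy(target_confidences, threshold):
    """Estimated target accuracy = fraction of unlabeled target examples
    whose confidence exceeds the learned threshold."""
    return float((target_confidences >= threshold).mean())

# Hypothetical usage with softmax outputs:
# src_conf = probs_src.max(axis=1); src_correct = (preds_src == labels_src)
# t = learn_threshold(src_conf, src_correct)
# estimated_target_acc = predict_target_accuracy(tgt_conf, t)
```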
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- What Stops Learning-based 3D Registration from Working in the Real World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize a k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
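For intuition, a generic k-NN density estimator over feature vectors looks like the sketch below (the standard k / (n * V_d * r_k^d) estimator). How OSAKD then uses these densities is not detailed in the summary, so this is an assumption-laden illustration rather than the paper's method.

```python
# Generic k-NN non-parametric density estimate in a d-dimensional feature space.
import numpy as np
from math import gamma, pi

def knn_density(query, features, k=10):
    """p(query) ~= k / (n * V_d * r_k^d), with r_k the distance to the
    k-th nearest stored feature vector and V_d the unit-ball volume."""
    n, d = features.shape
    dists = np.linalg.norm(features - query, axis=1)
    r_k = np.partition(dists, k - 1)[k - 1]
    unit_ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * unit_ball_volume * max(r_k, 1e-12) ** d)
```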
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Developing a Fidelity Evaluation Approach for Interpretable Machine Learning [2.2448567386846916]
Explainable AI (XAI) methods are used to improve the interpretability of complex models.
In particular, methods to evaluate the fidelity of the explanation to the underlying black box require further development.
Our evaluations suggest that the internal mechanism of the underlying predictive model, the internal mechanism of the explainability method used, and the complexity of the model and data all affect explanation fidelity.
arXiv Detail & Related papers (2021-06-16T00:21:16Z)
- Self-Supervised Relational Reasoning for Representation Learning [5.076419064097733]
In self-supervised learning, a system is tasked with achieving a surrogate objective by defining alternative targets on unlabeled data.
We propose a novel self-supervised formulation of relational reasoning that allows a learner to bootstrap a signal from information implicit in unlabeled data.
We evaluate the proposed method following a rigorous experimental procedure, using standard datasets, protocols, and backbones.
arXiv Detail & Related papers (2020-06-10T14:24:25Z)
- Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning [5.406386303264086]
In either case, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is efficient and does not suffer from hindsight bias.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
arXiv Detail & Related papers (2020-02-15T23:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented (including all information) and is not responsible for any consequences of its use.