Debias the Black-box: A Fair Ranking Framework via Knowledge
Distillation
- URL: http://arxiv.org/abs/2208.11628v1
- Date: Wed, 24 Aug 2022 15:59:58 GMT
- Title: Debias the Black-box: A Fair Ranking Framework via Knowledge
Distillation
- Authors: Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao
- Abstract summary: We propose a fair information retrieval framework based on knowledge distillation.
This framework can improve the exposure-based fairness of models while considerably decreasing model size.
It also improves fairness performance by 15%~46% while keeping a high level of recommendation effectiveness.
- Score: 26.60241524303918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks, with their many nonlinear units, can capture
the intricate interaction history between queries and documents, allowing them
to provide accurate search recommendations.
However, service providers frequently face more complex obstacles in real-world
circumstances, such as deployment cost constraints and fairness requirements.
Knowledge distillation, which transfers the knowledge of a well-trained complex
model (teacher) to a simple model (student), has been proposed to alleviate the
former concern, but the best current distillation methods focus only on how to
make the student model imitate the predictions of the teacher model. To better
facilitate the application of deep models, we propose a fair information
retrieval framework based on knowledge distillation. This framework can improve
the exposure-based fairness of models while considerably decreasing model size.
Our extensive experiments on three huge datasets show that our proposed
framework can reduce the model size to a minimum of 1% of its original size
while maintaining its black-box state. It also improves fairness performance by
15%~46% while keeping a high level of recommendation effectiveness.
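As a rough illustration of the idea in the abstract, the sketch below combines a listwise distillation term (the student matches the black-box teacher's score distribution) with an exposure-based fairness penalty (the gap in mean exposure between two item groups under a DCG-style position-bias model). The softmax formulation, the group labels, and the weight alpha are illustrative assumptions rather than the paper's actual objective, and a real training setup would need a differentiable or stochastic ranking model in place of the hard sort used here.

```python
# Minimal sketch (not the authors' code) of distilling a black-box ranking
# teacher into a small student while penalizing unequal exposure between
# two item groups. All names and parameters here are illustrative.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def expected_exposure(scores):
    # Rank documents by score; exposure decays logarithmically with rank
    # position, as in DCG-style position-bias models.
    order = np.argsort(-scores)
    exposure = np.zeros_like(scores, dtype=float)
    for rank, doc in enumerate(order):
        exposure[doc] = 1.0 / np.log2(rank + 2)
    return exposure

def fair_distill_loss(student_scores, teacher_scores, groups, alpha=0.1):
    # Listwise distillation: KL divergence from the teacher's score
    # distribution to the student's score distribution over the candidates.
    p_t = softmax(teacher_scores)
    p_s = softmax(student_scores)
    kl = float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))
    # Exposure-based fairness penalty: gap in mean exposure between groups
    # under the ranking induced by the student's scores.
    exp_s = expected_exposure(student_scores)
    gap = abs(exp_s[groups == 0].mean() - exp_s[groups == 1].mean())
    return kl + alpha * gap

# Toy example: five candidate documents for one query, two item groups.
teacher = np.array([2.1, 1.3, 0.4, -0.2, -1.0])  # black-box teacher scores
student = np.array([1.8, 1.5, 0.1, 0.0, -0.8])   # small student's scores
groups = np.array([0, 0, 1, 1, 1])               # hypothetical group labels
print(fair_distill_loss(student, teacher, groups))
```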
Related papers
- Faithful Label-free Knowledge Distillation [8.572967695281054]
This paper presents a label-free knowledge distillation approach called Teacher in the Middle (TinTeM)
It produces a more faithful student, which better replicates the behavior of the teacher network across a range of benchmarks testing model robustness, generalisability and out-of-distribution detection.
arXiv Detail & Related papers (2024-11-22T01:48:44Z)
- On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models [7.062887337934677]
We propose that small models may not need to absorb the cost of pre-training to reap its benefits.
We observe that, when distilled on a task from a pre-trained model, a small model can achieve or surpass the performance it would achieve if it were pre-trained and then fine-tuned on that task.
arXiv Detail & Related papers (2024-04-04T07:38:11Z)
- Knowledge Distillation for Road Detection based on cross-model Semi-Supervised Learning [17.690698736544626]
We propose an integrated approach that combines knowledge distillation and semi-supervised learning methods.
This hybrid approach leverages the robust capabilities of large models to effectively utilise large unlabelled data.
The proposed semi-supervised learning-based knowledge distillation (SSLKD) approach demonstrates a notable improvement in the performance of the student model.
arXiv Detail & Related papers (2024-02-07T22:50:47Z)
- Efficient Knowledge Distillation from Model Checkpoints [36.329429655242535]
We show that a weak snapshot ensemble of several intermediate models from the same training trajectory can outperform a strong ensemble of independently trained and fully converged models.
We propose an optimal intermediate teacher selection algorithm based on maximizing the total task-related mutual information.
arXiv Detail & Related papers (2022-10-12T17:55:30Z)
- When in Doubt, Summon the Titans: Efficient Inference with Large Models [80.2673230098021]
We propose a two-stage framework based on distillation that realizes the modelling benefits of large models.
We use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples.
Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference (a minimal routing sketch appears after this list).
arXiv Detail & Related papers (2021-10-19T22:56:49Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
- Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-L2 loss to match local features and a many-to-one approach to distill more intensively in the channel dimension (a minimal feature-matching sketch appears after this list).
arXiv Detail & Related papers (2021-03-12T15:29:00Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach (a minimal mixup-distillation sketch appears after this list).
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
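For the "When in Doubt, Summon the Titans" entry above, the routing idea can be sketched as follows: a cheap distilled student answers the examples it is confident about, and the rest are deferred to the large teacher. The confidence threshold and the callable model interfaces below are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of two-stage inference in the spirit of the entry above:
# the student handles "easy" examples it is confident about and defers the
# rest to the teacher. The threshold and model interfaces are assumptions.
from typing import Callable, List, Sequence, Tuple

def routed_predict(examples: Sequence[str],
                   student: Callable[[str], Tuple[int, float]],
                   teacher: Callable[[str], int],
                   confidence_threshold: float = 0.9) -> List[int]:
    predictions = []
    for x in examples:
        label, confidence = student(x)           # cheap first pass
        if confidence >= confidence_threshold:   # "easy": keep student output
            predictions.append(label)
        else:                                    # hard: summon the teacher
            predictions.append(teacher(x))
    return predictions

# Toy usage with stand-in models.
cheap_student = lambda x: (len(x) % 2, 0.95 if len(x) < 10 else 0.5)
large_teacher = lambda x: len(x) % 2
print(routed_predict(["short", "a much longer example"], cheap_student, large_teacher))
```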
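For the Self-Feature Regularization entry above, a minimal sketch of the feature-matching idea follows: detached deep-layer features supervise shallow-layer features of the same network via an L2 loss, with groups of deep channels aggregated toward the shallow channel count. The spatial alignment and channel-grouping choices are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of self-feature distillation: deep features (detached)
# supervise shallow features of the same network via an L2 matching loss.
# Alignment and grouping choices here are illustrative assumptions.
import torch
import torch.nn.functional as F

def self_feature_loss(shallow_feat: torch.Tensor,
                      deep_feat: torch.Tensor) -> torch.Tensor:
    # Spatially align the deep map to the shallow map's resolution.
    deep = F.interpolate(deep_feat, size=shallow_feat.shape[-2:], mode="nearest")
    n, c_deep, h, w = deep.shape
    c_shallow = shallow_feat.shape[1]
    # Many-to-one channel aggregation (assumes c_deep is a multiple of c_shallow).
    deep = deep.reshape(n, c_shallow, c_deep // c_shallow, h, w).mean(dim=2)
    # Detach the deep features so only the shallow layers receive gradients.
    return F.mse_loss(shallow_feat, deep.detach())

# Toy shapes: a 64-channel 32x32 shallow map supervised by a 256-channel 8x8 deep map.
print(self_feature_loss(torch.randn(2, 64, 32, 32), torch.randn(2, 256, 8, 8)).item())
```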
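For the MixKD entry above, a minimal sketch of mixup-style distillation follows: pairs of examples and the teacher's outputs on them are linearly interpolated, and the student is trained to match the mixed targets. MixKD itself mixes text representations; the generic feature vectors and the Beta parameter here are illustrative assumptions.

```python
# Minimal sketch of mixup-style distillation in the spirit of MixKD: pairs of
# examples and the teacher's outputs on them are linearly interpolated, and
# the student is trained to match the mixed targets on the mixed inputs.
import numpy as np

def mixkd_batch(x, teacher_logits, beta=0.4, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(beta, beta)              # mixup coefficient ~ Beta(beta, beta)
    perm = rng.permutation(len(x))          # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    t_mix = lam * teacher_logits + (1.0 - lam) * teacher_logits[perm]
    return x_mix, t_mix                     # train the student on (x_mix, t_mix)

# Toy batch: 4 examples with 3-dim features, teacher logits over 2 classes.
x = np.random.default_rng(1).normal(size=(4, 3))
t = np.random.default_rng(2).normal(size=(4, 2))
x_mix, t_mix = mixkd_batch(x, t)
print(x_mix.shape, t_mix.shape)
```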