Text Representation Distillation via Information Bottleneck Principle
- URL: http://arxiv.org/abs/2311.05472v1
- Date: Thu, 9 Nov 2023 16:04:17 GMT
- Title: Text Representation Distillation via Information Bottleneck Principle
- Authors: Yanzhao Zhang, Dingkun Long, Zehan Li, Pengjun Xie
- Abstract summary: We propose a novel Knowledge Distillation method called IBKD.
It aims to maximize the mutual information between the final representation of the teacher and student model, while simultaneously reducing the mutual information between the student model's representation and the input data.
Empirical studies on two main downstream applications of text representation demonstrate the effectiveness of our proposed approach.
- Score: 22.63996326177594
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained language models (PLMs) have recently shown great success in the
text representation field. However, the high computational cost and high-dimensional
representation of PLMs pose significant challenges for practical applications.
To make models more accessible, an effective method is to distill large models
into smaller representation models. To alleviate the performance
degradation that follows distillation, we propose a novel Knowledge
Distillation method called IBKD. This approach is motivated by the Information
Bottleneck principle and aims to maximize the mutual information between the
final representation of the teacher and student model, while simultaneously
reducing the mutual information between the student model's representation and
the input data. This enables the student model to preserve important learned
information while avoiding unnecessary information, thus reducing the risk of
over-fitting. Empirical studies on two main downstream applications of text
representation (Semantic Textual Similarity and Dense Retrieval tasks)
demonstrate the effectiveness of our proposed approach.
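To make the two-term objective concrete, here is a minimal PyTorch sketch of an IB-style distillation loss. It is an illustration under common assumptions, not the authors' released implementation: an in-batch InfoNCE term acts as a tractable lower bound on I(teacher; student), and an RBF-kernel HSIC term stands in for the intractable I(student; input). All names (`student_emb`, `teacher_emb`, `input_feats`, `beta`) are illustrative.

```python
# Hypothetical IB-style distillation objective; a sketch, not the paper's code.
# student_emb, teacher_emb, input_feats: [batch, dim] tensors.
import torch
import torch.nn.functional as F

def info_nce(student, teacher, temperature=0.05):
    """InfoNCE lower bound on I(teacher; student): the i-th teacher/student
    pair in the batch is the positive; all other rows serve as negatives."""
    s = F.normalize(student, dim=-1)
    t = F.normalize(teacher, dim=-1)
    logits = s @ t.t() / temperature                     # [B, B] similarities
    labels = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, labels)

def hsic(x, y, sigma=1.0):
    """RBF-kernel HSIC, an independence measure commonly used as a tractable
    proxy when mutual information cannot be estimated directly."""
    def rbf(a):
        d = torch.cdist(a, a).pow(2)
        return torch.exp(-d / (2 * sigma ** 2))
    n = x.size(0)
    h = torch.eye(n, device=x.device) - 1.0 / n          # centering matrix
    return torch.trace(rbf(x) @ h @ rbf(y) @ h) / (n - 1) ** 2

def ib_distill_loss(student_emb, teacher_emb, input_feats, beta=0.1):
    align = info_nce(student_emb, teacher_emb)           # maximize I(T; S)
    compress = hsic(input_feats, student_emb)            # minimize I(S; X)
    return align + beta * compress
```

The `beta` weight trades alignment against compression; in practice it would be tuned per downstream task.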
Related papers
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299]
The goal is to ensure that Large Language Models (LLMs) generate text representative of diverse sub-populations.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
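As a rough illustration of the clustering-based selection idea (a hedged sketch; the embedding source, cluster count, and selection rule below are assumptions, not the paper's pipeline):

```python
# Hypothetical diversity-driven sample selection for active learning.
import numpy as np
from sklearn.cluster import KMeans

def select_diverse_samples(embeddings, k=10, seed=0):
    """Cluster the unlabeled pool and pick the point nearest each centroid,
    so the selected set covers diverse sub-populations."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings)
    chosen = []
    for i in range(k):
        idx = np.where(km.labels_ == i)[0]
        dists = np.linalg.norm(embeddings[idx] - km.cluster_centers_[i], axis=1)
        chosen.append(int(idx[np.argmin(dists)]))
    return chosen
```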
arXiv Detail & Related papers (2024-10-17T15:09:35Z)
- Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification [49.41632476658246]
We discuss extending Data-Free Knowledge Distillation (DFKD) to Vision-Language Foundation Models without access to billion-scale image-text datasets.
The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts.
We propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles.
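One plausible reading of prompt diversification, sketched below (the style pool and templates are invented for illustration; the paper's three specific methods are not reproduced here):

```python
# Hypothetical prompt-diversification sketch for data-free synthesis.
import itertools
import random

STYLES = ["photo", "sketch", "oil painting", "cartoon"]      # assumed pool
TEMPLATES = ["a {style} of a {cat}", "a {style} showing a {cat}"]

def diversify_prompts(categories, n_per_cat=4, seed=0):
    """Combine each category concept with varied style templates so a
    text-to-image generator synthesizes stylistically diverse images."""
    rng = random.Random(seed)
    prompts = []
    for cat in categories:
        combos = list(itertools.product(TEMPLATES, STYLES))
        for tpl, style in rng.sample(combos, n_per_cat):
            prompts.append(tpl.format(style=style, cat=cat))
    return prompts
```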
arXiv Detail & Related papers (2024-07-21T13:26:30Z)
- Factual Dialogue Summarization via Learning from Large Language Models [35.63037083806503]
Large language model (LLM)-based automatic text summarization models generate more factually consistent summaries.
We employ zero-shot learning to extract symbolic knowledge from LLMs, generating factually consistent (positive) and inconsistent (negative) summaries.
Our approach achieves better factual consistency while maintaining coherence, fluency, and relevance, as confirmed by various automatic evaluation metrics.
arXiv Detail & Related papers (2024-06-20T20:03:37Z)
- Representation Learning with Conditional Information Flow Maximization [29.36409607847339]
This paper proposes an information-theoretic representation learning framework named Conditional Information Flow Maximization.
It encourages learned representations to have good feature uniformity and sufficient predictive ability.
Experiments show that the learned representations are more sufficient, robust, and transferable.
arXiv Detail & Related papers (2024-06-08T16:19:18Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
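A minimal sketch of the learnable-prompt idea (the dimensions and injection point are assumptions; the paper's meta prompts may be wired into the diffusion backbone differently):

```python
# Hypothetical "meta prompt" adapter for a frozen diffusion backbone.
import torch
import torch.nn as nn

class MetaPromptAdapter(nn.Module):
    """Learnable prompt embeddings appended to the conditioning context so a
    frozen diffusion model yields features suited to a perception task."""
    def __init__(self, n_prompts=16, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)

    def forward(self, context):                  # context: [B, T, dim]
        b = context.size(0)
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([context, p], dim=1)    # [B, T + n_prompts, dim]
```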
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability [43.984177729641615]
This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models.
We propose several metrics and conduct extensive experiments to investigate these distillation techniques.
The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification.
arXiv Detail & Related papers (2023-07-06T17:05:26Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
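One way such relative geometry can be transferred, as a hedged sketch (the paper's actual objective and any additional embedding-alignment terms may differ):

```python
# Hypothetical geometry-matching distillation loss for retrieval.
import torch
import torch.nn.functional as F

def score_distill_loss(q_s, d_s, q_t, d_t, tau=1.0):
    """KL-match the student's in-batch query->document score distribution to
    the teacher's, preserving the teacher's relative query/document geometry.
    q_*, d_*: [B, dim] query and document embeddings."""
    s_scores = (q_s @ d_s.t()) / tau             # student [B, B] score matrix
    t_scores = (q_t @ d_t.t()) / tau             # teacher [B, B] score matrix
    return F.kl_div(F.log_softmax(s_scores, dim=-1),
                    F.log_softmax(t_scores, dim=-1),
                    reduction="batchmean", log_target=True)
```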
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z)
- MOOCRep: A Unified Pre-trained Embedding of MOOC Entities [4.0963355240233446]
We propose to learn pre-trained representations of MOOC entities using abundant unlabeled data from the structure of MOOCs.
Our experiments reveal that MOOCRep's embeddings outperform state-of-the-art representation learning methods on two tasks important to the education community.
arXiv Detail & Related papers (2021-07-12T00:11:25Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- High-Fidelity Synthesis with Disentangled Representation [60.19657080953252]
We propose an Information Distillation Generative Adversarial Network (ID-GAN) for disentanglement learning and high-fidelity synthesis.
Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis.
Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods while using the disentangled representation.
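The two-stage recipe might look as follows in outline (a hedged sketch; the encoder/generator interfaces and dimensions are assumptions, not the authors' architecture):

```python
# Hypothetical ID-GAN sampling step: distill a disentangled VAE code into
# a GAN generator that adds a nuisance variable for high-fidelity detail.
import torch

def idgan_generate(vae_encoder, generator, x, nuisance_dim=64):
    with torch.no_grad():                 # stage 1: frozen, pre-trained VAE
        c = vae_encoder(x)                # disentangled code, [B, c_dim]
    z = torch.randn(x.size(0), nuisance_dim, device=x.device)   # nuisance
    return generator(torch.cat([c, z], dim=-1))    # stage 2: GAN synthesis
```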
arXiv Detail & Related papers (2020-01-13T14:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.