Representation Learning with Conditional Information Flow Maximization
- URL: http://arxiv.org/abs/2406.05510v2
- Date: Mon, 12 Aug 2024 09:31:30 GMT
- Title: Representation Learning with Conditional Information Flow Maximization
- Authors: Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu,
- Abstract summary: This paper proposes an information-theoretic representation learning framework, named conditional information flow.
It promotes learned representations have good feature uniformity and sufficient predictive ability.
Experiments show that the learned representations are more sufficient, robust and transferable.
- Score: 29.36409607847339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It promotes the learned representations have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) for the target task. Firstly, an information flow maximization principle is proposed to learn more sufficient representations for the input and target by simultaneously maximizing both input-representation and representation-label mutual information. Unlike the information bottleneck, we handle the input-representation information in an opposite way to avoid the over-compression issue of latent representations. Besides, to mitigate the negative effect of potential redundant features from the input, we design a conditional information minimization principle to eliminate negative redundant features while preserve noise-invariant features. Experiments on 13 language understanding benchmarks demonstrate that our method effectively improves the performance of PLMs for classification and regression. Extensive experiments show that the learned representations are more sufficient, robust and transferable.
Related papers
- Text Representation Distillation via Information Bottleneck Principle [22.63996326177594]
We propose a novel Knowledge Distillation method called IBKD.
It aims to maximize the mutual information between the final representation of the teacher and student model, while simultaneously reducing the mutual information between the student model's representation and the input data.
Empirical studies on two main downstream applications of text representation demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-11-09T16:04:17Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present textbfDisTIB (textbfTransmitted textbfInformation textbfBottleneck for textbfDisd representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimize on the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z) - Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - Robust Representation Learning via Perceptual Similarity Metrics [18.842322467828502]
Contrastive Input Morphing (CIM) is a representation learning framework that learns input-space transformations of the data.
We show that CIM is complementary to other mutual information-based representation learning techniques.
arXiv Detail & Related papers (2021-06-11T21:45:44Z) - Conditional Contrastive Learning: Removing Undesirable Information in
Self-Supervised Representations [108.29288034509305]
We develop conditional contrastive learning to remove undesirable information in self-supervised representations.
We demonstrate empirically that our methods can successfully learn self-supervised representations for downstream tasks.
arXiv Detail & Related papers (2021-06-05T10:51:26Z) - Representation Learning for Sequence Data with Deep Autoencoding
Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z) - Fairness by Learning Orthogonal Disentangled Representations [50.82638766862974]
We propose a novel disentanglement approach to invariant representation problem.
We enforce the meaningful representation to be agnostic to sensitive information by entropy.
The proposed approach is evaluated on five publicly available datasets.
arXiv Detail & Related papers (2020-03-12T11:09:15Z) - Learning Adversarially Robust Representations via Worst-Case Mutual
Information Maximization [15.087280646796527]
Training machine learning models that are robust against adversarial inputs poses seemingly insurmountable challenges.
We develop a notion of representation vulnerability that captures the maximum change of mutual information between the input and output distributions.
We propose an unsupervised learning method for obtaining intrinsically robust representations by maximizing the worst-case mutual information.
arXiv Detail & Related papers (2020-02-26T21:20:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.