Learning Diverse and Discriminative Representations via the Principle of
Maximal Coding Rate Reduction
- URL: http://arxiv.org/abs/2006.08558v1
- Date: Mon, 15 Jun 2020 17:23:55 GMT
- Title: Learning Diverse and Discriminative Representations via the Principle of
Maximal Coding Rate Reduction
- Authors: Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, Yi Ma
- Abstract summary: We propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.
We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features.
- Score: 32.21975128854042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To learn intrinsic low-dimensional structures from high-dimensional data that
most discriminate between classes, we propose the principle of Maximal Coding
Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that
maximizes the coding rate difference between the whole dataset and the sum of
each individual class. We clarify its relationships with most existing
frameworks such as cross-entropy, information bottleneck, information gain,
contractive and contrastive learning, and provide theoretical guarantees for
learning diverse and discriminative features. The coding rate can be accurately
computed from finite samples of degenerate subspace-like distributions and can
learn intrinsic representations in supervised, self-supervised, and
unsupervised settings in a unified manner. Empirically, the representations
learned using this principle alone are significantly more robust to label
corruptions in classification than those using cross-entropy, and can lead to
state-of-the-art results in clustering mixed data from self-learned invariant
features.
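A concrete reading of the objective: the coding rate of the whole feature set is compared against the average rate of the class-conditional feature sets, and the gap $\Delta R = R(Z, \epsilon) - R_c(Z, \epsilon \mid \Pi)$ is maximized. The snippet below is a minimal NumPy sketch of that quantity, assuming features are stacked as a $d \times n$ matrix Z with integer class labels and a distortion parameter eps; the function names are illustrative, and the full method's additional feature-norm constraint is omitted here.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T) for d x n features Z."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def coding_rate_per_class(Z, labels, eps=0.5):
    """Sum of class-conditional coding rates, each weighted by its class proportion."""
    d, n = Z.shape
    total = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        nc = Zc.shape[1]
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (nc * eps**2)) * Zc @ Zc.T)
        total += (nc / (2.0 * n)) * logdet
    return total

def coding_rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - R_c(Z | labels); the MCR^2 principle maximizes this gap."""
    return coding_rate(Z, eps) - coding_rate_per_class(Z, labels, eps)

# Toy usage: two classes lying in different low-dimensional subspaces of R^10.
rng = np.random.default_rng(0)
Z1 = rng.normal(size=(10, 50)) * np.r_[np.ones(2), 0.01 * np.ones(8)][:, None]
Z2 = np.roll(Z1, 4, axis=0)
Z = np.concatenate([Z1, Z2], axis=1)
labels = np.array([0] * 50 + [1] * 50)
print(coding_rate_reduction(Z, labels))  # large gap: diverse overall, compact per class
```

Maximizing this gap expands the volume spanned by the features as a whole while compressing each class toward a low-dimensional subspace, which is the sense in which the learned representations are simultaneously diverse and discriminative.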
Related papers
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches that learn from data of a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- On Interpretable Approaches to Cluster, Classify and Represent Multi-Subspace Data via Minimum Lossy Coding Length based on Rate-Distortion Theory [0.0]
Clustering, classification, and representation are three fundamental objectives of learning from high-dimensional data with intrinsic structure.
This paper introduces three interpretable approaches, i.e., segmentation (clustering) via the Minimum Lossy Coding Length criterion, classification via the Minimum Incremental Coding Length criterion and representation via the Maximal Coding Rate Reduction criterion.
arXiv Detail & Related papers (2023-02-21T01:15:08Z)
- Improved Representation Learning Through Tensorized Autoencoders [7.056005298953332]
Autoencoders (AE) are widely used in practice for unsupervised representation learning.
We propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE).
We prove that TAE can recover the principal components of the different clusters, in contrast to the principal components of the entire data recovered by a standard AE.
arXiv Detail & Related papers (2022-12-02T09:29:48Z)
- Federated Representation Learning via Maximal Coding Rate Reduction [109.26332878050374]
We propose a methodology to learn low-dimensional representations from a dataset that is distributed among several clients.
Our proposed method, which we refer to as FLOW, utilizes MCR2 as the objective of choice, hence resulting in representations that are both between-class discriminative and within-class compressible.
arXiv Detail & Related papers (2022-10-01T15:43:51Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we derive a theoretical guarantee that the causality-inspired learning has reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning [35.76792527025377]
We propose a novel information-theoretic approach, namely Total Correlation Gain Maximization (TCGM), for semi-supervised multi-modal learning.
We apply our method to various tasks and achieve state-of-the-art results, including news classification, emotion recognition and disease prediction.
arXiv Detail & Related papers (2020-07-14T03:32:03Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information [39.87273353895564]
We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable.
Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy calculation.
We apply our model on document hashing and show that it outperforms current best baselines based on discrete and vector quantized variational autoencoders.
arXiv Detail & Related papers (2020-04-08T13:31:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.