Efficient Maximal Coding Rate Reduction by Variational Forms
- URL: http://arxiv.org/abs/2204.00077v1
- Date: Thu, 31 Mar 2022 20:39:53 GMT
- Title: Efficient Maximal Coding Rate Reduction by Variational Forms
- Authors: Christina Baek, Ziyang Wu, Kwan Ho Ryan Chan, Tianjiao Ding, Yi Ma,
Benjamin D. Haeffele
- Abstract summary: We reformulate the principle of Maximal Coding Rate Reduction to a form that can scale significantly without compromising training accuracy.
Experiments in image classification demonstrate that our proposed formulation results in a significant speed up over optimizing the original MCR$^2$ objective.
- Score: 25.1177997721719
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The principle of Maximal Coding Rate Reduction (MCR$^2$) has recently been
proposed as a training objective for learning discriminative low-dimensional
structures intrinsic to high-dimensional data to allow for more robust training
than standard approaches, such as cross-entropy minimization. However, despite
the advantages that have been shown for MCR$^2$ training, MCR$^2$ suffers from
a significant computational cost due to the need to evaluate and differentiate
a large number of log-determinant terms that grows linearly with the
number of classes. By taking advantage of variational forms of spectral
functions of a matrix, we reformulate the MCR$^2$ objective to a form that can
scale significantly without compromising training accuracy. Experiments in
image classification demonstrate that our proposed formulation results in a
significant speed up over optimizing the original MCR$^2$ objective directly
and often results in higher quality learned representations. Further, our
approach may be of independent interest in other models that require
computation of log-determinant forms, such as in system identification or
normalizing flow models.
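To make the computational issue concrete, here is a brief sketch not taken from this paper's text: the objective below is the MCR$^2$ objective as defined in the original MCR$^2$ work, and the identity that follows is the standard variational characterization of the log-determinant, shown only to illustrate the kind of reformulation the abstract refers to, not necessarily the paper's exact form.

```latex
% MCR^2 objective for features Z = [z_1, ..., z_n] in R^{d x n} with diagonal
% class-membership matrices Pi_k, k = 1..K.  Evaluating it requires K + 1
% log-determinants of d x d matrices, so the cost grows with K.
\Delta R(Z, \Pi, \epsilon)
  = \underbrace{\tfrac{1}{2}\log\det\!\Big(I + \tfrac{d}{n\epsilon^{2}} Z Z^{\top}\Big)}_{R(Z,\epsilon)}
  \;-\;
  \underbrace{\sum_{k=1}^{K} \tfrac{\operatorname{tr}(\Pi_k)}{2n}
      \log\det\!\Big(I + \tfrac{d}{\operatorname{tr}(\Pi_k)\,\epsilon^{2}} Z \Pi_k Z^{\top}\Big)}_{R_c(Z,\epsilon\mid\Pi)}

% Standard variational (Fenchel-type) characterization of log det over d x d
% positive-definite matrices -- one way a log-det can be traded for an
% optimization over an auxiliary variable S:
\log\det\Sigma \;=\; \min_{S \succ 0}\; \operatorname{tr}(S\Sigma) - \log\det S - d,
\qquad \text{with the minimum attained at } S = \Sigma^{-1}.
```

A minimal sketch of the direct (unaccelerated) objective in code, assuming features are stored as columns of `Z` and labels are an integer tensor; the function name, arguments, and defaults are illustrative and this is not the paper's implementation:

```python
import torch

def mcr2_loss(Z, labels, num_classes, eps=0.5):
    """Negative MCR^2 objective (negated so it can be minimized).

    Z: (d, n) feature matrix; labels: (n,) LongTensor of class indices.
    Evaluating it takes one d x d log-det for the expansion term plus one per
    class, so the cost grows linearly with num_classes.
    """
    d, n = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    # Expansion term R(Z): coding rate of all features pooled together.
    R = 0.5 * torch.logdet(I + (d / (n * eps ** 2)) * Z @ Z.T)
    # Compression term R_c(Z | Pi): one log-det per class.
    Rc = Z.new_zeros(())
    for k in range(num_classes):
        Zk = Z[:, labels == k]          # features belonging to class k
        nk = Zk.shape[1]
        if nk == 0:
            continue
        Rc = Rc + (nk / (2 * n)) * torch.logdet(I + (d / (nk * eps ** 2)) * Zk @ Zk.T)
    return -(R - Rc)                    # maximize Delta R == minimize its negative
```

The paper's contribution, per the abstract, is to replace these log-determinant evaluations with variational forms of the underlying spectral functions so that the objective can be optimized at scale; the precise reformulation is given in the paper itself.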
Related papers
- Transfer Learning on Multi-Dimensional Data: A Novel Approach to Neural Network-Based Surrogate Modeling [0.0]
Convolutional neural networks (CNNs) have gained popularity as the basis for such surrogate models.
We propose training a CNN surrogate model on a mixture of numerical solutions to both the $d$-dimensional problem and its ($d-1$)-dimensional approximation.
We demonstrate our approach on a multiphase flow test problem, using transfer learning to train a dense fully-convolutional encoder-decoder CNN on the two classes of data.
arXiv Detail & Related papers (2024-10-16T05:07:48Z) - Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis [16.288866201806382]
We develop a model-free RLHF best policy identification algorithm, called $mathsfBSAD$, without explicit reward model inference.
The algorithm identifies the optimal policy directly from human preference information in a backward manner.
arXiv Detail & Related papers (2024-06-11T17:01:41Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement
Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
arXiv Detail & Related papers (2023-10-17T18:27:27Z) - Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Beyond the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
arXiv Detail & Related papers (2023-08-26T06:12:33Z) - MILO: Model-Agnostic Subset Selection Framework for Efficient Model
Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$-$10\times$ faster and tune hyperparameters $20\times$-$75\times$ faster than full-dataset training or tuning, without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language
Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - A Minimax Probability Machine for Non-Decomposable Performance Measures [15.288802707471792]
Imbalanced classification tasks are widespread in many real-world applications.
The minimax probability machine is a popular method for binary classification problems.
This paper develops a new minimax probability machine for the $F_\beta$ measure, called MPMF, which can be used to deal with imbalanced classification tasks.
arXiv Detail & Related papers (2021-02-28T04:58:46Z) - Revisiting minimum description length complexity in overparameterized
models [38.21167656112762]
We provide an extensive theoretical characterization of MDL-COMP for linear models and kernel methods.
For kernel methods, we show that MDL-COMP informs minimax in-sample error, and can decrease as the dimensionality of the input increases.
We also prove that MDL-COMP bounds the in-sample mean squared error (MSE).
arXiv Detail & Related papers (2020-06-17T22:45:14Z) - An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)