Understanding Dynamics of Nonlinear Representation Learning and Its
Application
- URL: http://arxiv.org/abs/2106.14836v1
- Date: Mon, 28 Jun 2021 16:31:30 GMT
- Title: Understanding Dynamics of Nonlinear Representation Learning and Its
Application
- Authors: Kenji Kawaguchi, Linjun Zhang, Zhun Deng
- Abstract summary: We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for the global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
- Score: 12.697842097171119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representations of the world environment play a crucial role in machine
intelligence. It is often inefficient to conduct reasoning and inference
directly in the space of raw sensory representations, such as pixel values of
images. Representation learning allows us to automatically discover suitable
representations from raw sensory data. For example, given raw sensory data, a
multilayer perceptron learns nonlinear representations at its hidden layers,
which are subsequently used for classification (or regression) at its output
layer. This happens implicitly during training through minimizing a supervised
or unsupervised loss. In this paper, we study the dynamics of such implicit
nonlinear representation learning. We identify a new assumption and a novel
condition, called the common model structure assumption and the
data-architecture alignment condition. Under the common model structure
assumption, the data-architecture alignment condition is shown to be sufficient
for the global convergence and necessary for the global optimality. Our results
provide practical guidance for designing a model structure: e.g., the common
model structure assumption can be used as a justification for using a
particular model structure instead of others. As an application, we then derive
a new training framework, which satisfies the data-architecture alignment
condition without assuming it by automatically modifying any given training
algorithm dependently on each data and architecture. Given a standard training
algorithm, the framework running its modified version is empirically shown to
maintain competitive (practical) test performances while providing global
convergence guarantees for ResNet-18 with convolutions, skip connections, and
batch normalization with standard benchmark datasets, including MNIST,
CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.
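To make the setting in the abstract concrete, the short PyTorch sketch below trains a small multilayer perceptron on synthetic stand-in data: only a supervised loss at the output layer is specified, and the hidden layers learn nonlinear representations implicitly as a by-product of minimizing that loss. It illustrates the general phenomenon studied in the paper, not the paper's proposed training framework; all names and sizes (MLP, hidden_dim, the synthetic data) are illustrative assumptions.

```python
# A minimal sketch (not the paper's proposed framework): an MLP whose hidden
# layers learn nonlinear representations implicitly while only a supervised
# loss at the output layer is minimized. All names and sizes are illustrative.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=128, num_classes=10):
        super().__init__()
        # Hidden layers: produce the learned nonlinear representation h(x).
        self.features = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Output layer: classification (or regression) on top of h(x).
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        h = self.features(x)   # implicit nonlinear representation
        return self.head(h)    # prediction fed to the supervised loss

# Synthetic stand-in for raw sensory data (e.g., flattened image pixels).
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

model = MLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)  # only the supervised loss is specified;
    loss.backward()              # the representations emerge as a by-product
    opt.step()
```

After training, model.features(x) is the learned representation that the output layer uses for classification.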
Related papers
- Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an equiangular tight frame (ETF) and kept fixed during training (a minimal sketch of this idea appears after this list).
Our experimental results show that our method achieves similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in real-world federated systems is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Learning Curves for SGD on Structured Features [23.40229188549055]
We show that modeling the geometry of the data in the induced feature space is crucial to accurately predict the test error throughout learning.
arXiv Detail & Related papers (2021-06-04T20:48:20Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - On the Transfer of Disentangled Representations in Realistic Settings [44.367245337475445]
We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images.
We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings.
arXiv Detail & Related papers (2020-10-27T16:15:24Z) - Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNNs) conduct image classification by activating dominant features that correlate with labels.
We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNNs to out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate the remaining features that correlate with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
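One entry above, "Do We Really Need a Learnable Classifier at the End of Deep Neural Network?", fixes the classifier as a randomly initialized equiangular tight frame (ETF) and trains only the feature extractor. The sketch below illustrates that idea under stated assumptions: it uses the standard simplex-ETF construction M = sqrt(K/(K-1)) U (I - (1/K) 11^T) with a random orthonormal U and a tiny stand-in backbone, and is not the authors' implementation.

```python
# Hedged sketch of a fixed simplex-ETF classifier head (not the authors' code).
# The classifier weight matrix is set to a random simplex ETF and frozen, so
# only the feature extractor (here a tiny stand-in MLP backbone) is trained.
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (num_classes, feat_dim) simplex ETF with a random rotation."""
    assert feat_dim >= num_classes
    # Random matrix U with orthonormal columns in R^{feat_dim x num_classes}.
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    k = num_classes
    m = (k / (k - 1)) ** 0.5 * (torch.eye(k) - torch.ones(k, k) / k)
    return (u @ m).T  # rows are the fixed class vectors

feat_dim, num_classes = 64, 10
backbone = nn.Sequential(nn.Linear(784, feat_dim), nn.ReLU())
classifier = nn.Linear(feat_dim, num_classes, bias=False)
classifier.weight.data = simplex_etf(num_classes, feat_dim)
classifier.weight.requires_grad_(False)      # classifier stays fixed

opt = torch.optim.SGD(backbone.parameters(), lr=0.1)  # train backbone only
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 784), torch.randint(0, num_classes, (32,))
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(classifier(backbone(x)), y)
    loss.backward()
    opt.step()
```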
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.