Improving Neural ODEs via Knowledge Distillation
- URL: http://arxiv.org/abs/2203.05103v1
- Date: Thu, 10 Mar 2022 01:23:41 GMT
- Title: Improving Neural ODEs via Knowledge Distillation
- Authors: Haoyu Chu, Shikui Wei, Qiming Lu, Yao Zhao
- Abstract summary: We propose a new training method based on knowledge distillation to construct more powerful and robust Neural ODEs suited to image recognition tasks.
The experimental results show that the new training method can improve the classification accuracy of Neural ODEs by 24% on CIFAR10 and 5% on SVHN.
- Score: 35.92851907503015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Ordinary Differential Equations (Neural ODEs) construct the continuous
dynamics of hidden units using ordinary differential equations specified by a
neural network, demonstrating promising results on many tasks. However, Neural
ODEs still do not perform well on image recognition tasks. A possible reason
is that the one-hot encoding vectors commonly used in Neural ODEs cannot
provide enough supervised information. We propose a new training method based
on knowledge distillation to construct more powerful and robust Neural ODEs
suited to image recognition tasks. Specifically, we model the training of
Neural ODEs as a teacher-student learning process, in which we use ResNets as
the teacher model to provide richer supervised information. The experimental
results show that the new training method can improve the classification
accuracy of Neural ODEs by 24% on CIFAR10 and 5% on SVHN. In addition, we also
quantitatively discuss the effect of both knowledge distillation and time
horizon in Neural ODEs on robustness against adversarial examples. The
experimental analysis concludes that introducing knowledge distillation and
increasing the time horizon can improve the robustness of Neural ODEs against
adversarial examples.
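As a concrete illustration of the teacher-student scheme described above, the PyTorch sketch below pairs a simple Euler-integrated Neural ODE block (with an explicit time horizon T) with a standard soft-target distillation loss against a ResNet teacher. The solver, architecture, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (assumed setup): Neural ODE student trained with knowledge
# distillation from a pretrained ResNet teacher. Hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ODEFunc(nn.Module):
    """Vector field f(h, t) parameterizing the hidden-state dynamics."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(dim, dim, 3, padding=1))

    def forward(self, t, h):
        return self.net(h)


class ODEBlock(nn.Module):
    """Integrates dh/dt = f(h, t) from t=0 to t=T with a fixed-step Euler solver."""
    def __init__(self, func, T=1.0, steps=10):
        super().__init__()
        self.func, self.T, self.steps = func, T, steps

    def forward(self, h):
        dt = self.T / self.steps
        t = 0.0
        for _ in range(self.steps):  # a larger T means a longer time horizon
            h = h + dt * self.func(t, h)
            t += dt
        return h


def distillation_loss(student_logits, teacher_logits, labels, T_kd=4.0, alpha=0.9):
    """Soft-target KL term (teacher supervision) plus hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T_kd, dim=1),
                    F.softmax(teacher_logits / T_kd, dim=1),
                    reduction="batchmean") * (T_kd ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def train_step(student, teacher, optimizer, images, labels):
    """One training step: the ResNet teacher provides richer targets than one-hot labels."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full student would typically stack a downsampling convolutional stem, one or more ODEBlock modules, and a linear classifier; the robustness discussion in the abstract corresponds to varying the time horizon T while keeping the rest of the setup fixed.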
Related papers
- Adaptive Feedforward Gradient Estimation in Neural ODEs [0.0]
We propose a novel approach that leverages adaptive feedforward gradient estimation to improve the efficiency, consistency, and interpretability of Neural ODEs.
Our method eliminates the need for backpropagation and the adjoint method, reducing computational overhead and memory usage while maintaining accuracy.
arXiv Detail & Related papers (2024-09-22T18:21:01Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Standalone Neural ODEs with Sensitivity Analysis [5.565364597145569]
This paper presents a continuous-depth neural ODE model capable of describing a full deep neural network.
We present a general formulation of the neural sensitivity problem and show how it is used in the NCG training.
Our evaluations demonstrate that our novel formulations lead to increased robustness and performance as compared to ResNet models.
arXiv Detail & Related papers (2022-05-27T12:16:53Z) - EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z) - Self-Denoising Neural Networks for Few Shot Learning [66.38505903102373]
We present a new training scheme that adds noise at multiple stages of an existing neural architecture while simultaneously learning to be robust to this added noise.
This architecture, which we call a Self-Denoising Neural Network (SDNN), can be applied easily to most modern convolutional neural architectures.
arXiv Detail & Related papers (2021-10-26T03:28:36Z) - Heavy Ball Neural Ordinary Differential Equations [12.861233366398162]
We propose heavy ball neural ordinary differential equations (HBNODEs) to improve the training and inference of neural ODEs (NODEs).
HBNODEs have two properties that imply practical advantages over NODEs.
We verify the advantages of HBNODEs over NODEs on benchmark tasks, including image classification, learning complex dynamics, and sequential modeling.
arXiv Detail & Related papers (2021-10-10T16:11:11Z) - Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting [135.0863818867184]
Artificial neural variability (ANV) helps artificial neural networks learn some advantages from "natural" neural networks.
ANV acts as an implicit regularizer of the mutual information between the training data and the learned model.
It can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible costs.
arXiv Detail & Related papers (2020-11-12T06:06:33Z) - Neural Ordinary Differential Equation based Recurrent Neural Network Model [0.7233897166339269]
Neural ordinary differential equations are a promising new member of the neural network family.
This paper explores the strength of the ordinary differential equation (ODE) with a new extension.
Two new ODE-based RNN models (GRU-ODE and LSTM-ODE) can compute the hidden state and cell state at any point in time using an ODE solver.
Experiments show that these new ODE based RNN models require less training time than Latent ODEs and conventional Neural ODEs.
arXiv Detail & Related papers (2020-05-20T01:02:29Z) - Time Dependence in Non-Autonomous Neural ODEs [74.78386661760662]
We propose a novel family of Neural ODEs with time-varying weights.
We outperform previous Neural ODE variants in both speed and representational capacity.
arXiv Detail & Related papers (2020-05-05T01:41:46Z) - Stochasticity in Neural ODEs: An Empirical Study [68.8204255655161]
Regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization.
We show that data augmentation during training improves the performance of both the deterministic and stochastic versions of the same model.
However, the improvements obtained by data augmentation completely eliminate the empirical regularization gains, making the performance difference between the neural ODE and the neural SDE negligible.
arXiv Detail & Related papers (2020-02-22T22:12:56Z) - Transfer Learning using Neural Ordinary Differential Equations [0.32228025627337864]
We use EfficientNets to explore transfer learning on the CIFAR-10 dataset.
Using NODEs for fine-tuning provides more stability during training and validation.
arXiv Detail & Related papers (2020-01-21T04:59:08Z)