Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search
- URL: http://arxiv.org/abs/2206.13130v1
- Date: Mon, 27 Jun 2022 09:22:23 GMT
- Title: Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search
- Authors: Taehyeon Kim, Heesoo Myeong, Se-Young Yun
- Abstract summary: This paper introduces a new algorithm, coined Trust Region Aware architecture search to Distill knowledge Effectively (TRADE). Experimental results show the proposed TRADE algorithm consistently outperforms both the conventional NAS approach and pre-defined architectures under KD training.
- Score: 14.487264853431878
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Knowledge Distillation (KD) has recently emerged as a popular method for compressing neural networks. Recent studies have proposed generalized distillation methods that search for the parameters and the architecture of the student model at the same time. However, such searches require substantial computation and consider only convolutional blocks in their search space. This paper introduces a new algorithm, coined Trust Region Aware architecture search to Distill knowledge Effectively (TRADE), that rapidly finds effective student architectures from several state-of-the-art architectures using a trust-region Bayesian optimization approach. Experimental results show that the proposed TRADE algorithm consistently outperforms both the conventional NAS approach and pre-defined architectures under KD training.
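The abstract does not detail the search loop, so the following is only a minimal sketch of trust-region Bayesian optimization over a continuous architecture encoding, in the TuRBO style. The 6-dimensional encoding, the `kd_validation_accuracy` stub, the UCB acquisition, and the expand/shrink rule are all illustrative assumptions, not the authors' TRADE implementation.

```python
# Hedged sketch: trust-region BO over an assumed continuous encoding of
# student architectures (e.g., per-stage depth/width multipliers).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
DIM = 6

def kd_validation_accuracy(x):
    # Stand-in for decoding x into a student, training it with KD, and
    # returning validation accuracy; a smooth synthetic function here.
    return -np.sum((x - 0.6) ** 2)

X = rng.uniform(0, 1, size=(8, DIM))  # initial design
y = np.array([kd_validation_accuracy(x) for x in X])
length, succ, fail = 0.4, 0, 0        # trust-region side length and counters

for _ in range(40):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    center = X[np.argmax(y)]
    lo = np.clip(center - length / 2, 0, 1)
    hi = np.clip(center + length / 2, 0, 1)
    cand = rng.uniform(lo, hi, size=(256, DIM))   # sample only inside the region
    mu, sd = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 1.96 * sd)]      # UCB acquisition
    y_next = kd_validation_accuracy(x_next)
    succ, fail = (succ + 1, 0) if y_next > y.max() else (0, fail + 1)
    if succ >= 3:
        length, succ = min(2 * length, 1.0), 0    # expand on repeated success
    elif fail >= 3:
        length, fail = length / 2, 0              # shrink on repeated failure
    X, y = np.vstack([X, x_next]), np.append(y, y_next)

print("best encoding:", X[np.argmax(y)])
```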
Related papers
- Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose applying neural architecture search (NAS) techniques to automatically configure SER models.
We show that NAS can improve SER performance (from 54.89% to 56.28%) while maintaining the model parameter size.
arXiv Detail & Related papers (2022-03-31T10:16:10Z)
- An Approach for Combining Multimodal Fusion and Neural Architecture Search Applied to Knowledge Tracing [6.540879944736641]
We propose a sequential model-based optimization approach that combines multimodal fusion and neural architecture search within one framework.
We evaluate the method on two public real-world datasets, showing that the discovered model achieves superior performance.
arXiv Detail & Related papers (2021-11-08T13:43:46Z)
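As a rough illustration of sequential model-based optimization over a joint fusion-plus-architecture space, the sketch below fits a surrogate to evaluated configurations and proposes the candidate with the best predicted score. The search space, the random-forest surrogate, and the `evaluate` stub are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch of SMBO over a joint fusion + architecture space.
import random
from sklearn.ensemble import RandomForestRegressor

SPACE = {
    "fusion": ["early", "late", "attention"],  # multimodal fusion operator
    "hidden": [64, 128, 256],                  # model width
    "layers": [1, 2, 3],                       # model depth
}

def sample_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def encode(cfg):
    return [SPACE["fusion"].index(cfg["fusion"]), cfg["hidden"], cfg["layers"]]

def evaluate(cfg):
    return random.random()  # stand-in for training a knowledge-tracing model

configs = [sample_cfg() for _ in range(5)]
scores = [evaluate(c) for c in configs]
for _ in range(20):
    surrogate = RandomForestRegressor().fit([encode(c) for c in configs], scores)
    cands = [sample_cfg() for _ in range(64)]
    best = max(cands, key=lambda c: surrogate.predict([encode(c)])[0])
    configs.append(best)
    scores.append(evaluate(best))

best_i = max(range(len(scores)), key=scores.__getitem__)
print(scores[best_i], configs[best_i])
```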
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS).
It combines a sample-efficient, computational-creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search by transferring existing weights to approximate the parameters of the new model.
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
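The weight-transfer idea, copying a trained parent layer's overlapping slice into a resized child layer so the child starts near the parent, can be sketched as below. `transfer_linear` is a hypothetical helper; CENAS's conceptual-expansion operator combines weights more generally.

```python
# Hedged sketch: warm-start a resized layer from a trained parent layer by
# copying the overlapping weight block instead of training from scratch.
import torch
import torch.nn as nn

def transfer_linear(parent: nn.Linear, out_features: int, in_features: int) -> nn.Linear:
    child = nn.Linear(in_features, out_features)
    o = min(out_features, parent.out_features)
    i = min(in_features, parent.in_features)
    with torch.no_grad():
        child.weight[:o, :i] = parent.weight[:o, :i]  # copy overlapping block
        child.bias[:o] = parent.bias[:o]
    return child

parent = nn.Linear(128, 64)               # trained layer from an existing model
child = transfer_linear(parent, 96, 128)  # wider child, warm-started
```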
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various search spaces and datasets.
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
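A minimal sketch of the two ingredients named above, under toy assumptions: a pairwise logistic ranking loss for the predictor, and a halving loop that doubles the budget each round and terminates the bottom half early. The `eval_fn` signature and the schedule are illustrative, not the paper's exact non-uniform scheduler.

```python
# Hedged sketch: learning-to-rank via pairwise comparisons + successive halving.
import torch
import torch.nn.functional as F

def pairwise_rank_loss(s_i, s_j, i_better):
    # Logistic loss on score differences: push s_i above s_j when arch i won.
    sign = i_better.float() * 2 - 1
    return F.softplus(-sign * (s_i - s_j)).mean()

def successive_halving(archs, eval_fn, base_epochs=4, rounds=3):
    pool = list(archs)
    for r in range(rounds):
        budget = base_epochs * (2 ** r)  # survivors get ever-larger budgets
        scores = [eval_fn(a, budget) for a in pool]
        order = sorted(range(len(pool)), key=lambda k: scores[k], reverse=True)
        pool = [pool[k] for k in order[: max(1, len(pool) // 2)]]  # keep top half
    return pool[0]

# toy usage: architecture 9 is best regardless of budget
best = successive_halving(range(16), lambda a, budget: -abs(a - 9) + 0.01 * budget)
```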
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
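A hypergradient based on the implicit function theorem can be approximated by replacing the inverse train-loss Hessian with a truncated Neumann series, as the toy sketch below shows on a two-variable bilevel problem. The losses, learning rate, and truncation depth are assumptions; iDARTS's exact approximation differs in its details.

```python
# Toy sketch of an implicit-function-theorem hypergradient with a Neumann
# approximation of the inverse Hessian; not the iDARTS implementation.
import torch

w = torch.randn(3, requires_grad=True)      # "network weights"
alpha = torch.randn(3, requires_grad=True)  # "architecture parameters"
train_loss = ((w - alpha) ** 2).sum()       # inner objective
val_loss = ((w - 1.0) ** 2).sum()           # outer objective

v = torch.autograd.grad(val_loss, w)[0]                         # dL_val/dw
g_w = torch.autograd.grad(train_loss, w, create_graph=True)[0]  # dL_train/dw

lr, K = 0.1, 5
p, cur = v.clone(), v.clone()
for _ in range(K):  # Neumann series: H^{-1} v ~= lr * sum_k (I - lr*H)^k v
    Hv = torch.autograd.grad(g_w, w, grad_outputs=cur, retain_graph=True)[0]
    cur = cur - lr * Hv
    p = p + cur

# hypergradient: -(d^2 L_train / dalpha dw) contracted with H^{-1} dL_val/dw
hypergrad = -torch.autograd.grad(g_w, alpha, grad_outputs=lr * p)[0]
print(hypergrad)
```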
- Neighborhood-Aware Neural Architecture Search [43.87465987957761]
We propose a novel neural architecture search (NAS) method to identify flat-minima architectures in the search space.
Our formulation takes the "flatness" of an architecture into account by aggregating the performance over the neighborhood of this architecture.
Based on our formulation, we propose neighborhood-aware random search (NA-RS) and neighborhood-aware differentiable architecture search (NA-DARTS).
arXiv Detail & Related papers (2021-05-13T15:56:52Z)
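The neighborhood aggregation can be sketched as follows, assuming an architecture is a tuple of operation indices and its neighbors differ in exactly one position; the `accuracy` stub stands in for training and evaluating each candidate.

```python
# Hedged sketch: score an architecture by averaging over its neighborhood,
# so flat minima are preferred. Op vocabulary and accuracy are assumptions.
import itertools
import random

OPS = range(4)  # assumed operation choices per edge

def neighbors(arch):
    for pos, op in itertools.product(range(len(arch)), OPS):
        if op != arch[pos]:
            yield arch[:pos] + (op,) + arch[pos + 1:]

def accuracy(arch):
    return random.random()  # stand-in for training/evaluating a candidate

def neighborhood_score(arch):
    hood = [arch] + list(neighbors(arch))
    return sum(accuracy(a) for a in hood) / len(hood)  # rewards flat minima

# neighborhood-aware random search: rank sampled candidates by the score of
# their whole neighborhood rather than by their point estimate alone
cands = [tuple(random.choice(OPS) for _ in range(3)) for _ in range(10)]
best = max(cands, key=neighborhood_score)
```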
arXiv Detail & Related papers (2021-05-13T15:56:52Z) - Off-Policy Reinforcement Learning for Efficient and Effective GAN
Architecture Search [50.40004966087121]
We introduce a new reinforcement learning based neural architecture search (NAS) methodology for generative adversarial network (GAN) architecture search.
The key idea is to formulate the GAN architecture search problem as a Markov decision process (MDP) for smoother architecture sampling.
We exploit an off-policy GAN architecture search algorithm that makes efficient use of the samples generated by previous policies.
arXiv Detail & Related papers (2020-07-17T18:29:17Z)
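A toy rendering of the MDP framing: states are partial architectures, actions pick the next cell's operation, and transitions collected under earlier policies stay in a replay buffer for off-policy updates. The operation vocabulary, reward stub, and tabular Q-update are illustrative assumptions, not the paper's agent.

```python
# Hedged sketch: architecture construction as an MDP with off-policy replay.
import random
from collections import defaultdict

ACTIONS = ["conv3", "conv5", "deconv", "attention"]
replay, Q = [], defaultdict(float)

def reward(arch):
    return random.random()  # stand-in for the trained generator's score

for episode in range(50):
    state, traj = (), []
    for _ in range(3):  # build a 3-cell generator
        if random.random() < 0.2:
            action = random.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit
        traj.append((state, action))
        state = state + (action,)
    r = reward(state)
    replay.extend((s, a, r) for s, a in traj)  # keep samples from old policies
    for s, a, rb in random.sample(replay, min(32, len(replay))):
        Q[(s, a)] += 0.1 * (rb - Q[(s, a)])    # off-policy value update

print(max(Q, key=Q.get))
```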
arXiv Detail & Related papers (2020-07-17T18:29:17Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based methods.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
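PyTorch's Dirichlet distribution supports reparameterized sampling (`rsample`), so pathwise gradients reach the concentration parameters directly, as in the toy two-operation mixture below; the "network" and its target are assumptions for illustration.

```python
# Hedged sketch: per-edge mixing weights sampled from a learned Dirichlet,
# trained end-to-end through rsample()'s pathwise derivative.
import torch
from torch.distributions import Dirichlet

concentration = torch.ones(2, requires_grad=True)  # learnable Dirichlet params
x = torch.randn(8, 4)
op_outs = torch.stack([torch.relu(x), torch.tanh(x)])  # two candidate ops

opt = torch.optim.Adam([concentration], lr=0.05)
for _ in range(100):
    weights = Dirichlet(torch.nn.functional.softplus(concentration)).rsample()
    mixed = (weights[:, None, None] * op_outs).sum(0)  # sampled mixture of ops
    loss = ((mixed - torch.relu(x)) ** 2).mean()       # toy target favors op 0
    opt.zero_grad()
    loss.backward()  # pathwise derivative flows into `concentration`
    opt.step()

print(concentration.detach())  # mass should shift toward the first op
```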
arXiv Detail & Related papers (2020-06-18T08:23:02Z) - Does Unsupervised Architecture Representation Learning Help Neural
Architecture Search? [22.63641173256389]
Existing Neural Architecture Search (NAS) methods either encode neural architectures with discrete encodings that do not scale well, or adopt supervised learning to jointly learn architecture representations and optimize the search over those representations, which incurs search bias.
We observe that the structural properties of neural architectures are hard to preserve in the latent space if architecture representation learning and search are coupled, resulting in less effective search performance.
arXiv Detail & Related papers (2020-06-12T04:15:34Z)
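Decoupling can be sketched as pretraining an autoencoder on architecture encodings with no accuracy labels, then running the search in the learned latent space; the encoding size and the synthetic architectures below are stand-in assumptions.

```python
# Hedged sketch: unsupervised architecture representation learning
# (reconstruction only), decoupled from any downstream search signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(12, 16), nn.ReLU(), nn.Linear(16, 4))
dec = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 12))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

archs = torch.randint(0, 2, (256, 12)).float()  # stand-in binary encodings
for _ in range(200):  # pretraining uses no accuracy labels at all
    loss = F.binary_cross_entropy_with_logits(dec(enc(archs)), archs)
    opt.zero_grad()
    loss.backward()
    opt.step()

# a downstream predictor or BO search would consume these embeddings, which
# were learned without coupling to any accuracy signal
z = enc(archs).detach()
```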
- RC-DARTS: Resource Constrained Differentiable Architecture Search [162.7199952019152]
We propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster.
We show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity.
arXiv Detail & Related papers (2019-12-30T05:02:38Z)
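One common way to make such a constraint differentiable, which may or may not match RC-DARTS's exact formulation, is to penalize the expected cost of the mixed operation, i.e. per-op FLOPs weighted by the softmax over architecture parameters; the sketch below shows the penalty steering the softmax toward the cheapest op. The op costs and the zero task loss are toy assumptions.

```python
# Hedged sketch of a differentiable resource penalty in the DARTS style.
import torch

alpha = torch.zeros(3, requires_grad=True)   # arch params for 3 candidate ops
op_flops = torch.tensor([10.0, 40.0, 90.0])  # assumed per-op cost

def loss_with_budget(task_loss, lam):
    expected_flops = (torch.softmax(alpha, dim=0) * op_flops).sum()
    return task_loss + lam * expected_flops  # differentiable cost penalty

opt = torch.optim.SGD([alpha], lr=0.5)
for _ in range(100):
    task_loss = torch.tensor(0.0)  # stand-in for the supernet training loss
    loss = loss_with_budget(task_loss, lam=0.1)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(alpha, 0))  # probability mass shifts to the cheapest op
```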