Efficient Neural Architecture Search for End-to-end Speech Recognition
via Straight-Through Gradients
- URL: http://arxiv.org/abs/2011.05649v1
- Date: Wed, 11 Nov 2020 09:18:58 GMT
- Title: Efficient Neural Architecture Search for End-to-end Speech Recognition
via Straight-Through Gradients
- Authors: Huahuan Zheng, Keyu An, Zhijian Ou
- Abstract summary: We develop an efficient Neural Architecture Search (NAS) method via Straight-Through (ST) gradients, called ST-NAS.
Experiments over the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that ST-NAS induced architectures significantly outperform the human-designed architecture across the two datasets.
Strengths of ST-NAS such as architecture transferability and low computation cost in memory and time are also reported.
- Score: 17.501966450686282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Architecture Search (NAS), the process of automating architecture
engineering, is an appealing next step to advancing end-to-end Automatic Speech
Recognition (ASR), replacing expert-designed networks with learned,
task-specific architectures. In contrast to early computationally demanding NAS
methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable
ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly
improve the NAS efficiency. In this paper, we make two contributions. First, we
rigorously develop an efficient NAS method via Straight-Through (ST) gradients,
called ST-NAS. Basically, ST-NAS uses the loss from SNAS but uses ST to
back-propagate gradients through discrete variables to optimize the loss, which
is not revealed in ProxylessNAS. Using ST gradients to support sub-graph
sampling is a core element to achieve efficient NAS beyond DARTS and SNAS.
Second, we successfully apply ST-NAS to end-to-end ASR. Experiments over the
widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that the
ST-NAS induced architectures significantly outperform the human-designed
architecture across the two datasets. Strengths of ST-NAS such as architecture
transferability and low computation cost in memory and time are also reported.
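To make the role of the straight-through gradient concrete, the sketch below illustrates (in PyTorch) how a single search cell can sample one candidate operation per forward pass while still passing gradients back to the architecture logits. This is a minimal illustration under assumed placeholders: the class name `STSearchCell`, the 1-D convolution candidates, and the channel sizes are invented for the example and are not the paper's ASR search space or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STSearchCell(nn.Module):
    """Minimal sketch of straight-through (ST) sub-graph sampling.

    Not the authors' implementation: candidate operations and sizes are
    illustrative placeholders, not the ASR search space used in the paper.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical candidate operations (a real ASR search space would
        # contain TDNN / convolution blocks of different configurations).
        self.ops = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        # Architecture logits, one per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(self.alpha, dim=-1)               # relaxed distribution
        idx = int(torch.multinomial(probs, num_samples=1))  # sample one sub-graph
        hard = F.one_hot(torch.tensor(idx, device=x.device),
                         num_classes=probs.numel()).float()
        # Straight-through trick: the forward pass uses the hard one-hot
        # sample, while the backward pass behaves as if the soft
        # probabilities had been used, so gradients reach self.alpha.
        st_weights = hard + probs - probs.detach()
        # Only the sampled operation is executed; the other candidates are
        # never evaluated, which keeps memory and compute low compared with
        # DARTS-style weighted sums over all candidates.
        return st_weights[idx] * self.ops[idx](x)
```

In a complete search loop, cells like this would be trained end-to-end: the network weights and the architecture logits `alpha` are updated from the task loss, and the final architecture is read off by keeping the highest-probability operation in each cell.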
Related papers
- SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss
Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up".
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
arXiv Detail & Related papers (2023-11-22T05:25:24Z)
- Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets [55.2118691522524]
Distillation-aware Neural Architecture Search (DaNAS) aims to search for an optimal student architecture.
We propose a distillation-aware meta accuracy prediction model, DaSS (Distillation-aware Student Search), which can predict a given architecture's final performances on a dataset.
arXiv Detail & Related papers (2023-05-26T14:00:35Z)
- DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models [56.584561770857306]
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG.
Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them.
We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS.
When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset.
arXiv Detail & Related papers (2023-05-26T13:58:18Z)
- BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost.
This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions.
We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
arXiv Detail & Related papers (2021-11-25T18:13:42Z)
- TND-NAS: Towards Non-differentiable Objectives in Progressive Differentiable NAS Framework [6.895590095853327]
Differentiable architecture search has gradually become a mainstream research topic in the field of Neural Architecture Search (NAS).
Recent differentiable NAS also aims at further improving search performance and reducing GPU memory consumption.
We propose TND-NAS, which combines the high efficiency of the differentiable NAS framework with support for non-differentiable metrics in multi-objective NAS.
arXiv Detail & Related papers (2021-11-06T14:19:36Z)
- Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration [68.6505473346005]
We propose HiNAS, a memory-efficient hierarchical NAS for image denoising and image super-resolution tasks.
With a single GTX 1080Ti GPU, it takes only about 1 hour to search for the denoising network on BSD500 and 3.5 hours to search for the super-resolution structure on DIV2K.
arXiv Detail & Related papers (2020-12-24T12:06:17Z)
- AdvantageNAS: Efficient Neural Architecture Search with Credit Assignment [23.988393741948485]
We propose a novel search strategy for one-shot and sparse propagation NAS, namely AdvantageNAS.
AdvantageNAS is a gradient-based approach that improves the search efficiency by introducing credit assignment in gradient estimation for architecture updates.
Experiments on NAS-Bench-201 and the PTB dataset show that AdvantageNAS discovers an architecture with higher performance under a limited time budget.
arXiv Detail & Related papers (2020-12-11T05:45:03Z)
- Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new problem definition for NAS: task-specific, end-to-end architecture search.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.