Learning Architectures from an Extended Search Space for Language
Modeling
- URL: http://arxiv.org/abs/2005.02593v2
- Date: Fri, 5 Jun 2020 06:23:49 GMT
- Title: Learning Architectures from an Extended Search Space for Language
Modeling
- Authors: Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao,
Jingbo Zhu, Tongran Liu, Changliang Li
- Abstract summary: We present a general approach that uses neural architecture search (NAS) to learn both intra-cell and inter-cell architectures.
For recurrent neural language modeling, it outperforms a strong baseline significantly on the PTB and WikiText data, with a new state-of-the-art on PTB.
The learned architectures show good transferability to other systems.
- Score: 37.79977691127229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural architecture search (NAS) has advanced significantly in recent years
but most NAS systems restrict search to learning architectures of a recurrent
or convolutional cell. In this paper, we extend the search space of NAS. In
particular, we present a general approach to learn both intra-cell and
inter-cell architectures (we call it ESS). To obtain better search results, we design a
joint learning method to perform intra-cell and inter-cell NAS simultaneously.
We implement our model in a differentiable architecture search system. For
recurrent neural language modeling, it outperforms a strong baseline
significantly on the PTB and WikiText data, with a new state-of-the-art on PTB.
Moreover, the learned architectures show good transferability to other systems.
For example, they improve state-of-the-art systems on the CoNLL and WNUT named entity
recognition (NER) tasks and CoNLL chunking task, indicating a promising line of
research on large-scale pre-learned architectures.
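The abstract describes the two-level search only at a high level. As a rough illustration, a minimal DARTS-style sketch of jointly relaxing intra-cell and inter-cell choices might look as follows; the operator pool, cell wiring, and all names here are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate operations (placeholder set; the paper's operator pool differs).
OPS = {
    "identity":    lambda dim: nn.Identity(),
    "linear_tanh": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),
    "linear_relu": lambda dim: nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
}

class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate ops (continuous relaxation)."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList(make(dim) for make in OPS.values())

    def forward(self, x, alpha):
        w = F.softmax(alpha, dim=-1)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

class ESSSketch(nn.Module):
    """Toy two-level search: ops inside cells plus connections between cells."""
    def __init__(self, dim, n_cells=3, n_edges=2):
        super().__init__()
        self.cells = nn.ModuleList(
            nn.ModuleList(MixedOp(dim) for _ in range(n_edges))
            for _ in range(n_cells)
        )
        # Intra-cell architecture parameters: which op each edge uses.
        self.alpha_intra = nn.Parameter(1e-3 * torch.randn(n_cells, n_edges, len(OPS)))
        # Inter-cell architecture parameters: how a cell weights earlier outputs.
        self.alpha_inter = nn.Parameter(1e-3 * torch.randn(n_cells, n_cells))

    def forward(self, x):
        outputs = [x]
        for i, cell in enumerate(self.cells):
            # Inter-cell: softmax-weighted sum over all earlier cell outputs.
            w = F.softmax(self.alpha_inter[i, : i + 1], dim=-1)
            h = sum(wi * o for wi, o in zip(w, outputs))
            # Intra-cell: a chain of mixed ops.
            for j, edge in enumerate(cell):
                h = edge(h, self.alpha_intra[i, j])
            outputs.append(h)
        return outputs[-1]

model = ESSSketch(dim=16)
out = model(torch.randn(4, 16))  # one pass goes through both levels jointly
```

Because both sets of architecture parameters sit in the same computation graph, a single backward pass updates them together, which is the essence of the joint intra-cell/inter-cell learning the abstract refers to.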
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [54.99121380536659]
Eye movement biometrics have received increasing attention thanks to their highly secure identification.
Deep learning (DL) models have recently been applied successfully to eye movement recognition.
However, the DL architecture is still determined by human prior knowledge.
We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.
arXiv Detail & Related papers (2024-09-22T13:11:08Z) - einspace: Searching for Neural Architectures from Fundamental Operations [28.346238250052455]
We introduce einspace, a search space based on a parameterised probabilistic context-free grammar.
We show that competitive architectures can be obtained by searching from scratch, and we consistently find large improvements when initialising the search with strong baselines.
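The abstract does not spell the grammar out; a minimal sketch of sampling architecture descriptions from a probabilistic context-free grammar, with a made-up toy grammar rather than einspace's actual one, could look like this.

```python
import random

# Toy PCFG over network descriptions: each non-terminal maps to a list of
# (probability, production) pairs. The rules below are illustrative only.
GRAMMAR = {
    "NET":   [(0.7, ["BLOCK", "NET"]), (0.3, ["BLOCK"])],
    "BLOCK": [(0.5, ["conv3x3"]), (0.3, ["attention"]), (0.2, ["mlp"])],
}

def sample(symbol, rng):
    """Expand a symbol until only terminal operations remain."""
    if symbol not in GRAMMAR:  # terminal: a concrete operation
        return [symbol]
    probs, rules = zip(*GRAMMAR[symbol])
    rule = rng.choices(rules, weights=probs)[0]
    return [op for s in rule for op in sample(s, rng)]

rng = random.Random(0)
print(sample("NET", rng))  # e.g. ['conv3x3', 'attention']
```

Raising the recursion probability in "NET" biases sampling toward deeper architectures, which is how a parameterised grammar can shape the search distribution.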
arXiv Detail & Related papers (2024-05-31T14:25:45Z) - DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models using distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
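A hedged sketch of the block-wise idea: a candidate is scored by how well its blocks reproduce a teacher's intermediate features, so each block can be supervised and rated independently. The block shapes and names below are placeholders, not the DNA papers' actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def blockwise_rating(student_blocks, teacher_blocks, x):
    """Rate a candidate by per-block distillation error against a teacher."""
    # Record the teacher's intermediate features.
    with torch.no_grad():
        feats, h = [], x
        for tb in teacher_blocks:
            h = tb(h)
            feats.append(h)
    # Each student block receives the teacher's input feature for that block
    # and is penalised for deviating from the teacher's output feature.
    err, inp = 0.0, x
    for sb, target in zip(student_blocks, feats):
        err = err + F.mse_loss(sb(inp), target)
        inp = target  # next block starts from the teacher's feature
    return -float(err)  # higher is better

# Toy usage with linear "blocks" standing in for real network stages.
teacher = [nn.Linear(8, 8), nn.Linear(8, 8)]
candidate = [nn.Linear(8, 8), nn.Linear(8, 8)]
print(blockwise_rating(candidate, teacher, torch.randn(4, 8)))
```

Because each block is judged in isolation, many block combinations can be rated from a handful of block-level supernets, which is what lets the method cover the whole search space rather than a sub-search space.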
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - NASiam: Efficient Representation Learning using Neural Architecture
Search for Siamese Networks [76.8112416450677]
Siamese networks are among the most popular approaches to self-supervised visual representation learning (SSL).
NASiam is a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair).
NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours.
arXiv Detail & Related papers (2023-01-31T19:48:37Z) - Towards Less Constrained Macro-Neural Architecture Search [2.685668802278155]
Neural Architecture Search (NAS) networks achieve state-of-the-art performance in a variety of tasks.
Most NAS methods rely heavily on human-defined assumptions that constrain the search.
We present experiments showing that LCMNAS generates state-of-the-art architectures from scratch with minimal GPU computation.
arXiv Detail & Related papers (2022-03-10T17:53:03Z) - Neural Architecture Search for Dense Prediction Tasks in Computer Vision [74.9839082859151]
Deep learning has led to a rising demand for neural network architecture engineering.
Neural architecture search (NAS) aims to design neural network architectures automatically, in a data-driven manner rather than manually.
NAS has become applicable to a much wider range of problems in computer vision.
arXiv Detail & Related papers (2022-02-15T08:06:50Z) - Pretraining Neural Architecture Search Controllers with Locality-based
Self-Supervised Learning [0.0]
We propose a pretraining scheme that can be applied to controller-based NAS.
Our method, a locality-based self-supervised classification task, leverages the structural similarity of network architectures to obtain good architecture representations.
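Very roughly, the locality idea is that architectures separated by few edit operations should receive similar representations. A toy sketch of constructing such self-supervised labels follows (the flat operation-sequence encoding is an assumption; the controller and its training loop are omitted).

```python
import random

OPS = ("conv", "pool", "skip")

def random_arch(rng, n_layers=6):
    """Toy architecture encoding: a flat sequence of operation names."""
    return [rng.choice(OPS) for _ in range(n_layers)]

def mutate(arch, n_edits, rng):
    """Apply n_edits random substitutions; fewer edits = a closer neighbour."""
    arch = list(arch)
    for _ in range(n_edits):
        arch[rng.randrange(len(arch))] = rng.choice(OPS)
    return arch

# Self-supervised label: the number of edits separating two architectures.
# Pretraining a controller to classify this locality gives it structure-aware
# representations before any accuracy signal is available.
rng = random.Random(0)
dataset = []
for _ in range(4):
    a = random_arch(rng)
    k = rng.randint(1, 3)
    dataset.append((a, mutate(a, k, rng), k))
print(dataset[0])
```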
arXiv Detail & Related papers (2021-03-15T06:30:36Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
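A small sketch of the stage/resolution structure and of why adding layers per stage is costly; the plain conv block below is a generic placeholder, not the paper's building block.

```python
import torch.nn as nn

def make_stage(channels, n_layers):
    """A stage: n_layers conv blocks that keep the spatial resolution fixed."""
    return nn.Sequential(*[
        nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                      nn.BatchNorm2d(channels), nn.ReLU())
        for _ in range(n_layers)
    ])

def make_net(stage_depths=(2, 2, 2), width=16):
    """Stages separated by stride-2 convs, so resolution halves between stages."""
    layers, ch = [nn.Conv2d(3, width, 3, padding=1)], width
    for depth in stage_depths:
        layers.append(make_stage(ch, depth))
        layers.append(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1))  # downsample
        ch *= 2
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

shallow, deep = make_net((1, 1, 1)), make_net((4, 4, 4))
print(n_params(shallow), n_params(deep))  # deeper stages cost far more parameters
```

Searching the per-stage depth directly is what stage-wise NAS automates, trading this cost off against accuracy.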
arXiv Detail & Related papers (2020-04-23T14:16:39Z) - ModuleNet: Knowledge-inherited Neural Architecture Search [7.769061374951596]
We discuss what kind of knowledge in a model can and should be used for new architecture design.
We propose a new NAS algorithm, namely ModuleNet, which can fully inherit knowledge from existing convolutional neural networks.
Our strategy can efficiently evaluate the performance of a new architecture even without tuning the weights of its convolutional layers.
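A minimal sketch of that evaluation idea under stated assumptions: freeze the inherited convolutional modules and fit only a small head, so each candidate assembly is scored cheaply. The module list and head below are illustrative, not ModuleNet's actual procedure.

```python
import torch
import torch.nn as nn

def evaluate_inherited(modules, head, x, y, steps=10, lr=1e-2):
    """Score an architecture assembled from pretrained modules.

    Inherited modules stay frozen (their knowledge is not tuned); only the
    lightweight head is trained, keeping candidate evaluation cheap.
    """
    body = nn.Sequential(*modules)
    for p in body.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        with torch.no_grad():
            feats = body(x)
        loss = loss_fn(head(feats.flatten(1)), y)
        loss.backward()
        opt.step()
    return -loss.item()  # higher is better

# Toy usage with randomly initialised "pretrained" modules and data.
mods = [nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4)]
head = nn.Linear(8 * 4 * 4, 10)
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(evaluate_inherited(mods, head, x, y))
```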
arXiv Detail & Related papers (2020-04-10T13:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.