AdaXpert: Adapting Neural Architecture for Growing Data
- URL: http://arxiv.org/abs/2107.00254v1
- Date: Thu, 1 Jul 2021 07:22:05 GMT
- Title: AdaXpert: Adapting Neural Architecture for Growing Data
- Authors: Shuaicheng Niu, Jiaxiang Wu, Guanghui Xu, Yifan Zhang, Yong Guo,
Peilin Zhao, Peng Wang, Mingkui Tan
- Abstract summary: In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically.
Given increasing data volume or a growing number of classes, one has to promptly adjust the neural model capacity to obtain promising performance.
Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset.
- Score: 63.30393509048505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real-world applications, data often come in a growing manner, where the
data volume and the number of classes may increase dynamically. This will bring
a critical challenge for learning: given the increasing data volume or the
number of classes, one has to instantaneously adjust the neural model capacity
to obtain promising performance. Existing methods either ignore the growing
nature of data or seek to independently search an optimal architecture for a
given dataset, and thus are incapable of promptly adjusting the architectures
for the changed data. To address this, we present a neural architecture
adaptation method, namely Adaptation eXpert (AdaXpert), to efficiently adjust
previous architectures on the growing data. Specifically, we introduce an
architecture adjuster to generate a suitable architecture for each data
snapshot, based on the previous architecture and the extent of the difference
between the current and previous data distributions. Furthermore, we propose an adaptation
condition to determine the necessity of adjustment, thereby avoiding
unnecessary and time-consuming adjustments. Extensive experiments on two growth
scenarios (increasing data volume and number of classes) demonstrate the
effectiveness of the proposed method.
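To make the described workflow concrete, here is a minimal Python sketch of the adaptation loop, written from the abstract alone: the gap measure, the threshold `tau`, and the `adjuster` callable are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of an AdaXpert-style adaptation loop (illustrative only;
# the gap measure, threshold, and adjuster interface are assumptions).
import numpy as np


def distribution_gap(prev_feats: np.ndarray, curr_feats: np.ndarray) -> float:
    """Crude proxy for how much the data distribution has shifted.

    Here: Euclidean distance between mean feature vectors. The paper measures
    the difference between data distributions; any distribution distance
    (e.g. Wasserstein) could be substituted.
    """
    return float(np.linalg.norm(prev_feats.mean(axis=0) - curr_feats.mean(axis=0)))


def adapt_architecture(prev_arch, prev_feats, curr_feats, adjuster, tau=0.1):
    """Adjust the previous architecture only if the data has changed enough.

    `adjuster` stands in for the architecture adjuster: a callable mapping
    (previous architecture, gap) to a new architecture. `tau` is a
    hypothetical threshold implementing the adaptation condition.
    """
    gap = distribution_gap(prev_feats, curr_feats)
    if gap < tau:
        # Adaptation condition not met: keep the old architecture and skip
        # the costly adjustment step entirely.
        return prev_arch
    # Otherwise generate a suitable architecture for the new data snapshot.
    return adjuster(prev_arch, gap)
```

In practice the adjuster would itself be learned (e.g., a policy over widening or deepening operations), but the control flow is the point: measure the gap, test the adaptation condition, and adjust only when necessary.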
Related papers
- Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws [59.03420759554073]
We introduce Adaptive Data Optimization (ADO), an algorithm that optimizes data distributions in an online fashion, concurrent with model training.
ADO does not require external knowledge, proxy models, or modifications to the model update.
ADO uses per-domain scaling laws to estimate the learning potential of each domain during training and adjusts the data mixture accordingly.
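As a rough, hypothetical illustration of that idea (not ADO's actual algorithm), one could fit a per-domain power-law loss curve and upweight domains whose fitted curve predicts the largest imminent improvement:

```python
# Hypothetical sketch of scaling-law-guided data mixing; not ADO's published code.
import numpy as np


def fit_power_law(steps, losses):
    """Fit loss ~ a * steps**(-b) via linear regression in log-log space."""
    slope, log_a = np.polyfit(np.log(steps), np.log(losses), 1)
    return np.exp(log_a), -slope  # a, b


def mixture_weights(history, next_step, temperature=1.0):
    """history: {domain: (steps, losses)} arrays of that domain's loss curve.

    Returns sampling probabilities favouring domains whose fitted scaling law
    predicts the fastest loss decrease at next_step (their learning potential).
    """
    domains, potential = [], []
    for domain, (steps, losses) in history.items():
        a, b = fit_power_law(np.asarray(steps, float), np.asarray(losses, float))
        # Magnitude of d(loss)/d(step) at next_step under the fitted law.
        potential.append(a * b * next_step ** (-b - 1.0))
        domains.append(domain)
    scores = np.array(potential) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return dict(zip(domains, probs))
```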
arXiv Detail & Related papers (2024-10-15T17:47:44Z)
- Exploring the design space of deep-learning-based weather forecasting systems [56.129148006412855]
This paper systematically analyzes the impact of different design choices on deep-learning-based weather forecasting systems.
We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models.
We propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures.
arXiv Detail & Related papers (2024-10-09T22:25:50Z)
- Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by 'matching', which enables implicit guidance without training a discriminator.
We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
- Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [4.190836962132713]
This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms.
At the core of this architecture lies a new data-dependent global convolution layer, whose kernel is contextually adapted, conditioned on the input sequence.
We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality.
arXiv Detail & Related papers (2024-02-28T17:36:45Z)
- MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification [0.41185655356953593]
We propose a novel multi-scale search space and a framework for neural architecture search (NAS).
We show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights.
Our search space reaches the state-of-the-art performance on four datasets on four different domains.
arXiv Detail & Related papers (2024-02-21T13:59:55Z)
- Temporal Convolution Domain Adaptation Learning for Crops Growth Prediction [5.966652553573454]
We construct an innovative network architecture based on domain adaptation learning to predict crops growth curves with limited available crop data.
We are the first to use the temporal convolution filters as the backbone to construct a domain adaptation network architecture.
Results show that the proposed temporal convolution-based network architecture outperforms all benchmarks not only in accuracy but also in model size and convergence rate.
arXiv Detail & Related papers (2022-02-24T14:22:36Z)
- Data Scaling Laws in NMT: The Effect of Noise and Architecture [59.767899982937756]
We study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
We find that the data scaling exponents are minimally impacted, suggesting that marginally worse architectures or training data can be compensated for by adding more data.
arXiv Detail & Related papers (2022-02-04T06:53:49Z)
- AutoAdapt: Automated Segmentation Network Search for Unsupervised Domain Adaptation [4.793219747021116]
We perform neural architecture search (NAS) to provide an architecture-level perspective and analysis for domain adaptation.
We propose bridging this gap by using maximum mean discrepancy and regional weighted entropy to estimate the accuracy metric.
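For reference, the maximum mean discrepancy part can be estimated as below; this is a generic RBF-kernel MMD sketch between source and target features, not AutoAdapt's implementation, and it omits the regional weighted entropy term.

```python
# Generic (biased) RBF-kernel MMD^2 estimate between two feature sets;
# illustrative only, not AutoAdapt's implementation.
import numpy as np


def rbf_mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased V-statistic estimate of MMD^2 with k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    def gram(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    return float(gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean())
```

A lower value between adapted target features and source features indicates better feature alignment, which is presumably the kind of label-free signal used to rank candidate architectures when target labels are unavailable.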
arXiv Detail & Related papers (2021-06-24T17:59:02Z)
- Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of data from previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.