DASViT: Differentiable Architecture Search for Vision Transformer
- URL: http://arxiv.org/abs/2507.13079v1
- Date: Thu, 17 Jul 2025 12:48:00 GMT
- Title: DASViT: Differentiable Architecture Search for Vision Transformer
- Authors: Pengjin Wu, Ferrante Neri, Zhenhua Feng
- Abstract summary: We introduce Differentiable Architecture Search for Vision Transformer (DASViT). DASViT bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
- Score: 8.839801565444775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
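DASViT's central move is bringing DARTS-style differentiable search to Transformer encoders, so a minimal sketch of the continuous relaxation at the heart of DARTS helps make the abstract concrete: every candidate operation on an edge is weighted by a softmax over learnable architecture parameters, letting the operation choice be optimized by gradient descent alongside the network weights. The candidate operations, dimensions, and class names below are illustrative assumptions, not the paper's actual search space or code.

```python
# Sketch of the DARTS-style continuous relaxation underlying differentiable
# ViT search. Candidate ops are illustrative placeholders, not DASViT's
# actual search space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate operations on one edge."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                  # skip connection
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()),  # MLP-style op
            SelfAttention(dim, num_heads),                  # attention op
        ])
        # One architecture parameter per candidate; in DARTS these are
        # optimized on validation data, interleaved with weight updates.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):  # x: (batch, tokens, dim)
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# After search, the relaxation is discretized, e.g. by keeping the op with
# the largest architecture weight: best = mixed_op.ops[mixed_op.alpha.argmax()]
```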
Related papers
- DANCE: Resource-Efficient Neural Architecture Search with Data-Aware and Continuous Adaptation [33.08911251924756]
We propose DANCE (Dynamic Architectures with Neural Continuous Evolution), which reformulates architecture search as a continuous evolution problem. DANCE introduces three key innovations: a continuous architecture distribution enabling smooth adaptation, a unified architecture space with learned selection gates for efficient sampling, and a multi-stage training strategy for effective deployment optimization. Our method consistently outperforms state-of-the-art NAS approaches in terms of accuracy while significantly reducing search costs.
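One common reading of "learned selection gates" is a learnable scalar per candidate path whose sigmoid output decides whether that path contributes; the sketch below is a hypothetical illustration of that pattern (with a straight-through estimator for a hard decision), not DANCE's actual implementation.

```python
# Hypothetical learned selection gate over a candidate path; one possible
# reading of DANCE's "learned selection gates", not the paper's code.
import torch
import torch.nn as nn

class GatedPath(nn.Module):
    def __init__(self, op: nn.Module):
        super().__init__()
        self.op = op
        self.gate_logit = nn.Parameter(torch.zeros(1))  # learnable gate

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logit)  # soft gate in (0, 1)
        # Straight-through estimator: the forward pass uses a hard 0/1
        # decision, while gradients flow through the soft gate.
        hard = (gate > 0.5).float()
        gate = hard + gate - gate.detach()
        return gate * self.op(x)
```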
arXiv Detail & Related papers (2025-07-07T05:22:55Z)
- Learning Novel Transformer Architecture for Time-series Forecasting [9.412920379798928]
AutoFormer-TS is a novel framework that leverages a comprehensive search space for Transformer architectures tailored to time-series prediction tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches.
arXiv Detail & Related papers (2025-02-19T13:49:20Z)
- TART: Token-based Architecture Transformer for Neural Network Performance Prediction [0.0]
Token-based Architecture Transformer (TART) predicts neural network performance without the need to train candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information.
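In the spirit of token-based performance prediction, an architecture can be serialized into a token sequence and fed to a small Transformer that regresses its accuracy, so no candidate network ever needs to be trained. The sketch below is an assumed setup; the vocabulary, model sizes, and mean pooling are illustrative, not TART's actual design.

```python
# Assumed sketch of a token-based architecture performance predictor.
import torch
import torch.nn as nn

class ArchPerformancePredictor(nn.Module):
    def __init__(self, vocab_size: int = 64, dim: int = 128,
                 depth: int = 4, heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)  # regress predicted accuracy

    def forward(self, arch_tokens):  # (batch, seq) of integer op tokens
        h = self.encoder(self.embed(arch_tokens))
        return self.head(h.mean(dim=1)).squeeze(-1)

# Trained with e.g. an MSE loss against measured accuracies of reference
# architectures, then used to rank unseen candidates cheaply.
```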
arXiv Detail & Related papers (2025-01-02T05:22:17Z)
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [20.209756662832365]
Differentiable Neural Architecture Search (DARTS) automates the manual process of architecture design with high search efficiency. We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the deep learning architecture for eye movement recognition. We show that EM-DARTS is capable of producing an optimal architecture that leads to state-of-the-art recognition performance.
arXiv Detail & Related papers (2024-09-22T13:11:08Z)
- TurboViT: Generating Fast Vision Transformers via Generative Architecture Search [74.24393546346974]
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant recent research on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z)
- Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in a broad range of vision tasks such as recognition and detection.
We propose to use neural architecture search to automate the design process, by searching not only the architecture but also the search space.
We provide design guidelines for general vision transformers, based on extensive analysis of the space-searching process.
arXiv Detail & Related papers (2021-11-29T17:26:07Z)
- Searching for Efficient Multi-Stage Vision Transformers [42.0565109812926]
Vision Transformer (ViT) demonstrates that Transformers designed for natural language processing can be applied to computer vision tasks.
ViT-ResNAS is an efficient multi-stage ViT architecture designed with neural architecture search (NAS).
arXiv Detail & Related papers (2021-09-01T22:37:56Z)
- Multi-Exit Vision Transformer for Dynamic Inference [88.17413955380262]
We propose seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones.
We show that each one of our proposed architectures could prove useful in the trade-off between accuracy and speed.
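The general mechanism behind such multi-exit backbones: attach a classifier head after intermediate blocks and stop as soon as the prediction is confident enough, trading accuracy for speed per input. The sketch below illustrates that mechanism with an assumed max-softmax confidence rule and CLS-token classification; it does not reproduce the paper's seven branch designs.

```python
# Minimal sketch of early-exit dynamic inference; branch placement and the
# confidence rule are assumptions, not the paper's specific architectures.
import torch
import torch.nn as nn

class MultiExitViT(nn.Module):
    def __init__(self, blocks: nn.ModuleList, exit_heads: nn.ModuleList,
                 threshold: float = 0.9):
        super().__init__()
        self.blocks, self.exit_heads = blocks, exit_heads
        self.threshold = threshold  # confidence required to stop early

    @torch.no_grad()
    def forward(self, tokens):  # tokens: (batch, seq, dim), CLS at index 0
        for block, head in zip(self.blocks, self.exit_heads):
            tokens = block(tokens)
            logits = head(tokens[:, 0])  # classify from the CLS token
            conf = logits.softmax(-1).max(-1).values
            if conf.min() >= self.threshold:  # whole batch is confident
                return logits  # exit early, skipping the remaining blocks
        return logits  # fall through to the final exit
```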
arXiv Detail & Related papers (2021-06-29T09:01:13Z)
- Vision Transformer Architecture Search [64.73920718915282]
Current vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks.
We propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets.
Our searched architecture achieves 74.7% top-1 accuracy on ImageNet, which is 2.5% higher than the current baseline ViT architecture.
arXiv Detail & Related papers (2021-06-25T15:39:08Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
- NAS-Count: Counting-by-Density with Neural Architecture Search [74.92941571724525]
We automate the design of counting models with Neural Architecture Search (NAS).
We introduce an end-to-end searched encoder-decoder architecture, Automatic Multi-Scale Network (AMSNet).
arXiv Detail & Related papers (2020-02-29T09:18:17Z)