Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers
- URL: http://arxiv.org/abs/2504.02494v1
- Date: Thu, 03 Apr 2025 11:18:00 GMT
- Title: Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers
- Authors: Faisal Mohammad, Duksan Ryu
- Abstract summary: We propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models, such as MSF-Trans and CNN-based architectures. It achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semiconductor wafer defect classification is critical for ensuring high precision and yield in manufacturing. Traditional CNN-based models often struggle with class imbalances and recognition of the multiple overlapping defect types in wafer maps. To address these challenges, we propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. Trained on the WM-38k dataset, ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models, such as MSF-Trans and CNN-based architectures. Through extensive ablation studies, we determine that a patch size of 16 provides optimal performance. ViT-Tiny achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification. Additionally, it demonstrates enhanced robustness under limited labeled data conditions, making it a computationally efficient and reliable solution for real-world semiconductor defect detection.
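Since the abstract singles out patch size 16 as the optimal ablation result, a minimal sketch of instantiating such a model is shown below, assuming the timm implementation of ViT-Tiny; the class count and the multi-label readout are placeholders, not the paper's setup:

```python
# Minimal sketch (not the paper's code): a ViT-Tiny classifier with 16x16
# patches via timm. The 8-class head and single-channel input are assumptions.
import timm
import torch

NUM_DEFECT_CLASSES = 8  # hypothetical; the WM-38k class count is not given here

model = timm.create_model(
    "vit_tiny_patch16_224",      # ViT-Tiny backbone, patch size 16
    pretrained=True,
    in_chans=1,                  # wafer maps are single-channel
    num_classes=NUM_DEFECT_CLASSES,
)

wafer_maps = torch.rand(4, 1, 224, 224)   # a toy batch of wafer maps
logits = model(wafer_maps)                # shape: (4, NUM_DEFECT_CLASSES)
probs = logits.sigmoid()                  # multi-label view: defects can overlap
```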
Related papers
- Hybrid Knowledge Transfer through Attention and Logit Distillation for On-Device Vision Systems in Agricultural IoT [0.0]
This work advances real-time, energy-efficient crop monitoring in precision agriculture.
It demonstrates how we can attain ViT-level diagnostic precision on edge devices.
arXiv Detail & Related papers (2025-04-21T06:56:41Z)
- Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection [4.272401529389713]
We propose Fab-ME, an advanced framework based on YOLOv8s for accurate detection of 20 fabric defect types. Our contributions include the introduction of the cross-stage partial bottleneck with two convolutions (C2F) vision state-space (C2F-VMamba) module. Experimental results on the Tianchi fabric defect detection dataset demonstrate that Fab-ME achieves a 3.5% improvement in mAP@0.5 compared to the original YOLOv8s.
arXiv Detail & Related papers (2024-12-04T10:40:17Z)
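Since Fab-ME builds on YOLOv8s, a minimal baseline using the ultralytics API might look as follows; the dataset config path is a placeholder, and Fab-ME's C2F-VMamba and attention modules are not shown:

```python
# Hypothetical YOLOv8s baseline for fabric defect detection; Fab-ME's
# added modules would replace parts of this stock architecture.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")           # pretrained YOLOv8s checkpoint
model.train(
    data="fabric_defects.yaml",      # placeholder dataset config (20 classes)
    epochs=100,
    imgsz=640,
)
metrics = model.val()                # reports detection metrics on the val split
print(metrics.box.map50)             # mAP@0.5, the figure cited above
```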
- Utilizing Generative Adversarial Networks for Image Data Augmentation and Classification of Semiconductor Wafer Dicing Induced Defects [0.21990652930491852]
In semiconductor manufacturing, the wafer dicing process is central yet vulnerable to defects that significantly impair yield.
Deep neural networks are the current state of the art in (semi-)automated visual inspection.
We explore the application of generative adversarial networks (GANs) for image data augmentation and classification of semiconductor wafer dicing induced defects.
arXiv Detail & Related papers (2024-07-24T20:44:16Z)
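A minimal sketch of the augmentation idea above: draw synthetic defect images from a trained generator and append them to the real training set. The generator and the class id below are stand-ins; the paper's GAN architecture is not specified here:

```python
# Hypothetical GAN-style augmentation: sample synthetic wafer images from a
# (stand-in) trained generator and mix them into the real training data.
import torch
import torch.nn as nn

LATENT_DIM = 128

generator = nn.Sequential(            # stand-in for a trained DCGAN-style G
    nn.Linear(LATENT_DIM, 64 * 64),
    nn.Tanh(),
)
generator.eval()

with torch.no_grad():
    z = torch.randn(256, LATENT_DIM)                # latent samples
    synthetic = generator(z).view(256, 1, 64, 64)   # fake defect images

real_x = torch.rand(1024, 1, 64, 64)
real_y = torch.zeros(1024, dtype=torch.long)
fake_y = torch.full((256,), 3, dtype=torch.long)    # assumed defect class id
train_x = torch.cat([real_x, synthetic])
train_y = torch.cat([real_y, fake_y])
print(train_x.shape, train_y.shape)                 # (1280, 1, 64, 64), (1280,)
```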
- Sub-token ViT Embedding via Stochastic Resonance Transformers [51.12001699637727]
Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch.
We propose a training-free method inspired by "stochastic resonance".
The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization.
arXiv Detail & Related papers (2023-10-06T01:53:27Z)
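A rough sketch of a training-free, stochastic-resonance-style ensemble in the spirit of SRT: perturb the input by shifts smaller than the patch size and aggregate the resulting features. SRT itself aligns and aggregates spatial feature maps; averaging pooled embeddings here is a simplification:

```python
# Illustrative sub-token ensembling: shift the input by a few pixels (below
# the 16-pixel patch size), extract ViT features per shift, and average.
import timm
import torch

vit = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=0)
vit.eval()

def srt_features(x, shifts=((0, 0), (4, 0), (0, 4), (4, 4))):
    feats = []
    for dy, dx in shifts:
        shifted = torch.roll(x, shifts=(dy, dx), dims=(2, 3))  # sub-patch shift
        with torch.no_grad():
            feats.append(vit(shifted))       # (B, embed_dim) pooled features
    return torch.stack(feats).mean(dim=0)    # average over perturbations

x = torch.rand(2, 3, 224, 224)
print(srt_features(x).shape)                 # torch.Size([2, 192])
```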
- PotatoPestNet: A CTInceptionV3-RS-Based Neural Network for Accurate Identification of Potato Pests [0.0]
We propose an efficient PotatoPestNet AI-based automatic potato pest identification system.
We leveraged the power of transfer learning by employing five customized, pre-trained transfer learning models.
Among the models, the Customized Tuned Inception V3 model, optimized through random search, demonstrated outstanding performance.
arXiv Detail & Related papers (2023-05-27T17:38:16Z)
- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design [84.34416126115732]
Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration.
We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers.
Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute.
arXiv Detail & Related papers (2023-05-22T13:39:28Z)
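To make the scaling-law machinery concrete, here is a minimal fit of a saturating power law, loss(C) = a * C^(-b) + c, to made-up (compute, loss) pairs. The functional form is a common choice in scaling-law work and is an assumption here, not the paper's exact parameterization:

```python
# Fit loss(C) = a * C^(-b) + c to hypothetical (compute, loss) observations
# and extrapolate to a larger budget.
import numpy as np
from scipy.optimize import curve_fit

def power_law(c, a, b, offset):
    return a * np.power(c, -b) + offset

compute = np.array([1.0, 10.0, 100.0, 1000.0])   # budget, in units of 1e18 FLOPs
loss = np.array([3.10, 2.60, 2.25, 2.02])        # made-up evaluation losses

params, _ = curve_fit(power_law, compute, loss, p0=(1.5, 0.3, 1.5), maxfev=10000)
a, b, offset = params
print(f"loss(C) ~= {a:.3g} * C^(-{b:.3g}) + {offset:.3g}")
print("predicted loss at 1e22 FLOPs:", power_law(1e4, *params))
```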
- LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers [18.76039338977432]
Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks.
We introduce a novel Label-aware Contrastive Training framework, LaCViT, which significantly enhances the quality of embeddings in ViTs.
LaCViT statistically significantly enhances the performance of three evaluated ViTs by up to 10.78% under Top-1 Accuracy.
arXiv Detail & Related papers (2023-03-31T12:38:08Z)
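A compact sketch of a label-aware contrastive objective in the spirit of supervised contrastive learning (SupCon); LaCViT's exact loss may differ, so treat this as illustrative:

```python
# Label-aware contrastive loss: same-label embeddings are pulled together,
# different labels pushed apart. Illustrative, not LaCViT's exact objective.
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1):
    z = F.normalize(z, dim=1)                        # unit-norm embeddings
    sim = z @ z.T / temperature                      # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return -per_anchor[pos_mask.any(dim=1)].mean()   # anchors with positives

z = torch.randn(8, 192)                              # e.g. ViT-Tiny embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supcon_loss(z, labels))
```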
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves a much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
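As a toy illustration of the low-bit setting (not Q-ViT's IRM or distillation scheme), uniform symmetric fake quantization of a weight tensor looks roughly like this:

```python
# Uniform symmetric fake quantization: round weights to a few discrete
# levels, then dequantize. Fully quantized ViTs also quantize activations
# and attention, which this toy sketch omits.
import torch

def fake_quantize(w, bits=2):
    qmax = 2 ** (bits - 1) - 1                   # positive levels, e.g. 1 for 2-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(192, 192)                        # a ViT-Tiny-sized weight matrix
w_q = fake_quantize(w, bits=2)
print((w - w_q).abs().mean())                    # mean quantization error
```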
- Semi-supervised Vision Transformers at Scale [93.0621675558895]
We study semi-supervised learning (SSL) for vision transformers (ViT).
We propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning.
Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting.
arXiv Detail & Related papers (2022-08-11T08:11:54Z)
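The pipeline above ends with semi-supervised fine-tuning; a bare-bones pseudo-labeling step, one common realization assumed here for illustration, looks like:

```python
# Bare-bones pseudo-labeling: label unlabeled data with the current model,
# keep only confident predictions, and train on them alongside real labels.
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, labeled, unlabeled, threshold=0.9):
    (x_l, y_l), x_u = labeled, unlabeled
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, y_hat = probs.max(dim=1)
        keep = conf >= threshold                 # confidence filtering

    loss = F.cross_entropy(model(x_l), y_l)      # supervised term
    if keep.any():                               # pseudo-labeled term
        loss = loss + F.cross_entropy(model(x_u[keep]), y_hat[keep])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```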
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
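For reference, a DWT-based time-frequency representation of an audio clip can be sketched with PyWavelets; the wavelet choice and scalogram construction are assumptions, and the papers' exact spectrogram recipes may differ:

```python
# Build a simple wavelet "scalogram" from a 1-D audio signal using a
# multilevel discrete wavelet transform (PyWavelets).
import numpy as np
import pywt

signal = np.random.randn(16000)                  # placeholder: 1 s of 16 kHz audio
coeffs = pywt.wavedec(signal, wavelet="db4", level=6)

# Stack per-level detail magnitudes into a coarse time-frequency image.
rows = []
for c in coeffs[1:]:                             # detail coefficients per level
    e = np.abs(c)
    rows.append(np.interp(np.linspace(0, 1, 128),
                          np.linspace(0, 1, len(e)), e))
scalogram = np.stack(rows)                       # (levels, 128) input for a CNN
print(scalogram.shape)
```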
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using what the authors describe as the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.