Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers
- URL: http://arxiv.org/abs/2504.02494v1
- Date: Thu, 03 Apr 2025 11:18:00 GMT
- Title: Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers
- Authors: Faisal Mohammad, Duksan Ryu
- Abstract summary: We propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models, such as MSF-Trans and CNN-based architectures. It achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semiconductor wafer defect classification is critical for ensuring high precision and yield in manufacturing. Traditional CNN-based models often struggle with class imbalances and recognition of the multiple overlapping defect types in wafer maps. To address these challenges, we propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. Trained on the WM-38k dataset, ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models, such as MSF-Trans and CNN-based architectures. Through extensive ablation studies, we determine that a patch size of 16 provides optimal performance. ViT-Tiny achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification. Additionally, it demonstrates enhanced robustness under limited labeled data conditions, making it a computationally efficient and reliable solution for real-world semiconductor defect detection.
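Since the abstract singles out patch size 16 as the optimal ablation result, a minimal sketch of instantiating such a model is shown below, assuming the timm implementation of ViT-Tiny; the class count and the multi-label readout are placeholders, not the paper's setup:

```python
# Minimal sketch (not the paper's code): a ViT-Tiny classifier with 16x16
# patches via timm. The 8-class head and single-channel input are assumptions.
import timm
import torch

NUM_DEFECT_CLASSES = 8  # hypothetical; the WM-38k class count is not given here

model = timm.create_model(
    "vit_tiny_patch16_224",      # ViT-Tiny backbone, patch size 16
    pretrained=True,
    in_chans=1,                  # wafer maps are single-channel
    num_classes=NUM_DEFECT_CLASSES,
)

wafer_maps = torch.rand(4, 1, 224, 224)   # a toy batch of wafer maps
logits = model(wafer_maps)                # shape: (4, NUM_DEFECT_CLASSES)
probs = logits.sigmoid()                  # multi-label view: defects can overlap
```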
Related papers
- Hybrid Knowledge Transfer through Attention and Logit Distillation for On-Device Vision Systems in Agricultural IoT [0.0]
This work advances real-time, energy-efficient crop monitoring in precision agriculture.
It demonstrates how we can attain ViT-level diagnostic precision on edge devices.
arXiv Detail & Related papers (2025-04-21T06:56:41Z)
- Fab-ME: A Vision State-Space and Attention-Enhanced Framework for Fabric Defect Detection [4.272401529389713]
We propose Fab-ME, an advanced framework based on YOLOv8s for accurate detection of 20 fabric defect types. Our contributions include the introduction of the cross-stage partial bottleneck with two convolutions (C2F) vision state-space (C2F-VMamba) module. Experimental results on the Tianchi fabric defect detection dataset demonstrate that Fab-ME achieves a 3.5% improvement in mAP@0.5 compared to the original YOLOv8s.
arXiv Detail & Related papers (2024-12-04T10:40:17Z)
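Since Fab-ME builds on YOLOv8s, a minimal baseline using the ultralytics API might look as follows; the dataset config path is a placeholder, and Fab-ME's C2F-VMamba and attention modules are not shown:

```python
# Hypothetical YOLOv8s baseline for fabric defect detection; Fab-ME's
# added modules would replace parts of this stock architecture.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")           # pretrained YOLOv8s checkpoint
model.train(
    data="fabric_defects.yaml",      # placeholder dataset config (20 classes)
    epochs=100,
    imgsz=640,
)
metrics = model.val()                # reports detection metrics on the val split
print(metrics.box.map50)             # mAP@0.5, the figure cited above
```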
- Utilizing Generative Adversarial Networks for Image Data Augmentation and Classification of Semiconductor Wafer Dicing Induced Defects [0.21990652930491852]
In semiconductor manufacturing, the wafer dicing process is central yet vulnerable to defects that significantly impair yield.
Deep neural networks are the current state of the art in (semi-)automated visual inspection.
We explore the application of generative adversarial networks (GANs) for image data augmentation and classification of semiconductor wafer dicing induced defects.
arXiv Detail & Related papers (2024-07-24T20:44:16Z)
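A minimal sketch of the augmentation idea above: draw synthetic defect images from a trained generator and append them to the real training set. The generator and the class id below are stand-ins; the paper's GAN architecture is not specified here:

```python
# Hypothetical GAN-style augmentation: sample synthetic wafer images from a
# (stand-in) trained generator and mix them into the real training data.
import torch
import torch.nn as nn

LATENT_DIM = 128

generator = nn.Sequential(            # stand-in for a trained DCGAN-style G
    nn.Linear(LATENT_DIM, 64 * 64),
    nn.Tanh(),
)
generator.eval()

with torch.no_grad():
    z = torch.randn(256, LATENT_DIM)                # latent samples
    synthetic = generator(z).view(256, 1, 64, 64)   # fake defect images

real_x = torch.rand(1024, 1, 64, 64)
real_y = torch.zeros(1024, dtype=torch.long)
fake_y = torch.full((256,), 3, dtype=torch.long)    # assumed defect class id
train_x = torch.cat([real_x, synthetic])
train_y = torch.cat([real_y, fake_y])
print(train_x.shape, train_y.shape)                 # (1280, 1, 64, 64), (1280,)
```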
- Sub-token ViT Embedding via Stochastic Resonance Transformers [51.12001699637727]
Vision Transformer (ViT) architectures represent images as collections of high-dimensional vectorized tokens, each corresponding to a rectangular non-overlapping patch.
We propose a training-free method inspired by "stochastic resonance".
The resulting "Stochastic Resonance Transformer" (SRT) retains the rich semantic information of the original representation, but grounds it on a finer-scale spatial domain, partly mitigating the coarse effect of spatial tokenization.
arXiv Detail & Related papers (2023-10-06T01:53:27Z)
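A rough sketch of a training-free, stochastic-resonance-style ensemble in the spirit of SRT: perturb the input by shifts smaller than the patch size and aggregate the resulting features. SRT itself aligns and aggregates spatial feature maps; averaging pooled embeddings here is a simplification:

```python
# Illustrative sub-token ensembling: shift the input by a few pixels (below
# the 16-pixel patch size), extract ViT features per shift, and average.
import timm
import torch

vit = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=0)
vit.eval()

def srt_features(x, shifts=((0, 0), (4, 0), (0, 4), (4, 4))):
    feats = []
    for dy, dx in shifts:
        shifted = torch.roll(x, shifts=(dy, dx), dims=(2, 3))  # sub-patch shift
        with torch.no_grad():
            feats.append(vit(shifted))       # (B, embed_dim) pooled features
    return torch.stack(feats).mean(dim=0)    # average over perturbations

x = torch.rand(2, 3, 224, 224)
print(srt_features(x).shape)                 # torch.Size([2, 192])
```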
- PotatoPestNet: A CTInceptionV3-RS-Based Neural Network for Accurate Identification of Potato Pests [0.0]
We propose an efficient PotatoPestNet AI-based automatic potato pest identification system.
We leveraged the power of transfer learning by employing five customized, pre-trained transfer learning models.
Among the models, the Customized Tuned Inception V3 model, optimized through random search, demonstrated outstanding performance.
arXiv Detail & Related papers (2023-05-27T17:38:16Z)
- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design [84.34416126115732]
Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration.
We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers.
Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute.
arXiv Detail & Related papers (2023-05-22T13:39:28Z)
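To make the scaling-law machinery concrete, here is a minimal fit of a saturating power law, loss(C) = a * C^(-b) + c, to made-up (compute, loss) pairs. The functional form is a common choice in scaling-law work and is an assumption here, not the paper's exact parameterization:

```python
# Fit loss(C) = a * C^(-b) + c to hypothetical (compute, loss) observations
# and extrapolate to a larger budget.
import numpy as np
from scipy.optimize import curve_fit

def power_law(c, a, b, offset):
    return a * np.power(c, -b) + offset

compute = np.array([1.0, 10.0, 100.0, 1000.0])   # budget, in units of 1e18 FLOPs
loss = np.array([3.10, 2.60, 2.25, 2.02])        # made-up evaluation losses

params, _ = curve_fit(power_law, compute, loss, p0=(1.5, 0.3, 1.5), maxfev=10000)
a, b, offset = params
print(f"loss(C) ~= {a:.3g} * C^(-{b:.3g}) + {offset:.3g}")
print("predicted loss at 1e22 FLOPs:", power_law(1e4, *params))
```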
- LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers [18.76039338977432]
Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks.
We introduce a novel Label-aware Contrastive Training framework, LaCViT, which significantly enhances the quality of embeddings in ViTs.
LaCViT statistically significantly enhances the performance of three evaluated ViTs by up to 10.78% under Top-1 Accuracy.
arXiv Detail & Related papers (2023-03-31T12:38:08Z)
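A compact sketch of a label-aware contrastive objective in the spirit of supervised contrastive learning (SupCon); LaCViT's exact loss may differ, so treat this as illustrative:

```python
# Label-aware contrastive loss: same-label embeddings are pulled together,
# different labels pushed apart. Illustrative, not LaCViT's exact objective.
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1):
    z = F.normalize(z, dim=1)                        # unit-norm embeddings
    sim = z @ z.T / temperature                      # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return -per_anchor[pos_mask.any(dim=1)].mean()   # anchors with positives

z = torch.randn(8, 192)                              # e.g. ViT-Tiny embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supcon_loss(z, labels))
```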
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves a much better performance than the prior arts.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
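As a toy illustration of the low-bit setting (not Q-ViT's IRM or distillation scheme), uniform symmetric fake quantization of a weight tensor looks roughly like this:

```python
# Uniform symmetric fake quantization: round weights to a few discrete
# levels, then dequantize. Fully quantized ViTs also quantize activations
# and attention, which this toy sketch omits.
import torch

def fake_quantize(w, bits=2):
    qmax = 2 ** (bits - 1) - 1                   # positive levels, e.g. 1 for 2-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(192, 192)                        # a ViT-Tiny-sized weight matrix
w_q = fake_quantize(w, bits=2)
print((w - w_q).abs().mean())                    # mean quantization error
```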
- Semi-supervised Vision Transformers at Scale [93.0621675558895]
We study semi-supervised learning (SSL) for vision transformers (ViT).
We propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning.
Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting.
arXiv Detail & Related papers (2022-08-11T08:11:54Z)
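The pipeline above ends with semi-supervised fine-tuning; a bare-bones pseudo-labeling step, one common realization assumed here for illustration, looks like:

```python
# Bare-bones pseudo-labeling: label unlabeled data with the current model,
# keep only confident predictions, and train on them alongside real labels.
import torch
import torch.nn.functional as F

def pseudo_label_step(model, optimizer, labeled, unlabeled, threshold=0.9):
    (x_l, y_l), x_u = labeled, unlabeled
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, y_hat = probs.max(dim=1)
        keep = conf >= threshold                 # confidence filtering

    loss = F.cross_entropy(model(x_l), y_l)      # supervised term
    if keep.any():                               # pseudo-labeled term
        loss = loss + F.cross_entropy(model(x_u[keep]), y_hat[keep])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```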
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
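For reference, a DWT-based time-frequency representation of an audio clip can be sketched with PyWavelets; the wavelet choice and scalogram construction are assumptions, and the papers' exact spectrogram recipes may differ:

```python
# Build a simple wavelet "scalogram" from a 1-D audio signal using a
# multilevel discrete wavelet transform (PyWavelets).
import numpy as np
import pywt

signal = np.random.randn(16000)                  # placeholder: 1 s of 16 kHz audio
coeffs = pywt.wavedec(signal, wavelet="db4", level=6)

# Stack per-level detail magnitudes into a coarse time-frequency image.
rows = []
for c in coeffs[1:]:                             # detail coefficients per level
    e = np.abs(c)
    rows.append(np.interp(np.linspace(0, 1, 128),
                          np.linspace(0, 1, len(e)), e))
scalogram = np.stack(rows)                       # (levels, 128) input for a CNN
print(scalogram.shape)
```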
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using what the authors describe as the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.