Investigating Shift Equivalence of Convolutional Neural Networks in
Industrial Defect Segmentation
- URL: http://arxiv.org/abs/2309.16902v1
- Date: Fri, 29 Sep 2023 00:04:47 GMT
- Title: Investigating Shift Equivalence of Convolutional Neural Networks in
Industrial Defect Segmentation
- Authors: Zhen Qu, Xian Tao, Fei Shen, Zhengtao Zhang, Tao Li
- Abstract summary: In industrial defect segmentation tasks, output consistency (also referred to as equivalence) of the model is often overlooked.
A novel pair of down/upsampling layers called component attention polyphase sampling (CAPS) is proposed as a replacement for the conventional sampling layers in CNNs.
The experimental results on the micro surface defect (MSD) dataset and four real-world industrial defect datasets demonstrate that the proposed method exhibits higher equivalence and segmentation performance.
- Score: 3.843350895842836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In industrial defect segmentation tasks, while pixel accuracy and
Intersection over Union (IoU) are commonly employed metrics to assess
segmentation performance, the output consistency (also referred to as equivalence)
of the model is often overlooked. Even a small shift in the input image can
yield significant fluctuations in the segmentation results. Existing
methodologies primarily focus on data augmentation or anti-aliasing to enhance
the network's robustness against translational transformations, but their shift
equivalence performs poorly on the test set or is susceptible to nonlinear
activation functions. Additionally, the variations in boundaries resulting from
the translation of input images are consistently disregarded, thus imposing
further limitations on the shift equivalence. In response to this particular
challenge, a novel pair of down/upsampling layers called component attention
polyphase sampling (CAPS) is proposed as a replacement for the conventional
sampling layers in CNNs. To mitigate the effect of image boundary variations on
the equivalence, an adaptive windowing module is designed in CAPS to adaptively
filter out the border pixels of the image. Furthermore, a component attention
module is proposed to fuse all downsampled features to improve the segmentation
performance. The experimental results on the micro surface defect (MSD) dataset
and four real-world industrial defect datasets demonstrate that the proposed
method exhibits higher equivalence and segmentation performance compared to
other state-of-the-art methods.Our code will be available at
https://github.com/xiaozhen228/CAPS.
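The core idea behind polyphase sampling can be illustrated with a minimal NumPy sketch. This is not the authors' CAPS implementation (which adds an adaptive windowing module and a component attention module); it only shows why naive stride-2 downsampling breaks shift equivalence and how selecting a polyphase component by a shift-equivariant criterion (here, the largest L2 norm) restores consistency. The function names are illustrative, not from the paper's code.

```python
import numpy as np

def polyphase_components(x):
    """Return the four stride-2 polyphase components of a 2D array.

    Plain stride-2 downsampling keeps only component (0, 0); shifting
    the input by one pixel moves the same samples into a different
    component, which is why naive striding is not shift-equivalent.
    """
    return [x[i::2, j::2] for i in range(2) for j in range(2)]

def norm_select_downsample(x):
    """Downsample by keeping the polyphase component with the largest
    L2 norm -- the shift-equivariant selection idea behind polyphase
    sampling methods; CAPS instead fuses all components with attention
    and windows out border pixels."""
    comps = polyphase_components(x)
    norms = [np.linalg.norm(c) for c in comps]
    return comps[int(np.argmax(norms))]

x = np.zeros((8, 8))
x[2, 2] = 1.0                    # a single "defect" pixel
x_shift = np.roll(x, 1, axis=1)  # shift the input right by one pixel

# Naive striding: the one-pixel shift makes the defect vanish entirely.
assert x[::2, ::2].sum() == 1.0
assert x_shift[::2, ::2].sum() == 0.0

# Norm-based component selection keeps the defect under the shift.
assert norm_select_downsample(x).sum() == 1.0
assert norm_select_downsample(x_shift).sum() == 1.0
```

The selection criterion is itself shift-equivariant (a shift only permutes which component holds the content), so the downsampled output moves with the input instead of changing value.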
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances global feature representation of point cloud mask autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Change-Aware Siamese Network for Surface Defects Segmentation under Complex Background [0.6407952035735353]
We propose a change-aware Siamese network that solves the defect segmentation in a change detection framework.
A novel multi-class balanced contrastive loss is introduced to guide the Transformer-based encoder.
The difference presented by a distance map is then skip-connected to the change-aware decoder to assist in the location of both inter-class and out-of-class pixel-wise defects.
arXiv Detail & Related papers (2024-09-01T02:48:11Z) - Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling [14.731788603429774]
Downsampling operators break the shift invariance of convolutional neural networks (CNNs)
We propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS)
TIPS results in consistent performance gains in terms of accuracy, shift consistency, and shift fidelity.
arXiv Detail & Related papers (2024-04-11T00:49:38Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z) - Adapting the Hypersphere Loss Function from Anomaly Detection to Anomaly
Segmentation [1.9458156037869137]
We propose an incremental improvement to Fully Convolutional Data Description (FCDD).
FCDD is an adaptation of the one-class classification approach from anomaly detection to image anomaly segmentation (a.k.a. anomaly localization).
We analyze its original loss function and propose a substitute that better resembles its predecessor, the Hypersphere Classifier (HSC).
arXiv Detail & Related papers (2023-01-23T18:06:35Z) - Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z) - Dispensed Transformer Network for Unsupervised Domain Adaptation [21.256375606219073]
A novel unsupervised domain adaptation (UDA) method named dispensed Transformer network (DTNet) is introduced in this paper.
Our proposed network achieves the best performance in comparison with several state-of-the-art techniques.
arXiv Detail & Related papers (2021-10-28T08:27:44Z) - TFill: Image Completion via a Transformer-Based Architecture [69.62228639870114]
We propose treating image completion as a directionless sequence-to-sequence prediction task.
We employ a restrictive CNN with small and non-overlapping receptive fields (RFs) for token representation.
In a second phase, to improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced.
arXiv Detail & Related papers (2021-04-02T01:42:01Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.