Exploring Winograd Convolution for Cost-effective Neural Network Fault
Tolerance
- URL: http://arxiv.org/abs/2308.08230v1
- Date: Wed, 16 Aug 2023 09:03:13 GMT
- Title: Exploring Winograd Convolution for Cost-effective Neural Network Fault
Tolerance
- Authors: Xinghua Xue, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Tao Luo, Lei
Zhang, Huawei Li, Xiaowei Li
- Abstract summary: Winograd convolution can reduce the fault-tolerant design overhead by 55.77% on average without any accuracy loss compared to standard convolution.
When it is applied to fault-tolerant neural networks enhanced with fault-aware retraining and constrained activation functions, the resulting model accuracy generally shows significant improvement in the presence of various faults.
- Score: 14.588891723027892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Winograd convolution is widely used to improve convolution performance and
computational efficiency because it reduces the number of multiplication operations, but
the reliability issues it introduces are usually overlooked. In this work, we observe the
great potential of Winograd convolution for improving neural network (NN) fault tolerance.
Based on this observation, we evaluate the fault tolerance of Winograd convolution
comprehensively, for the first time, at granularities ranging from whole models and
individual layers down to operation types. We then explore the use of the inherent fault
tolerance of Winograd convolution for cost-effective NN protection against soft errors.
Specifically, we mainly investigate how Winograd convolution can be effectively
incorporated with classical fault-tolerant design approaches, including triple modular
redundancy (TMR), fault-aware retraining, and constrained activation functions. According
to our experiments, Winograd convolution can reduce the fault-tolerant design overhead by
55.77\% on average without any accuracy loss compared to standard convolution, and can
further reduce the computing overhead by 17.24\% when its inherent fault tolerance is
considered. When it is applied to fault-tolerant neural networks enhanced with fault-aware
retraining and constrained activation functions, the resulting model accuracy generally
shows significant improvement in the presence of various faults.
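For readers unfamiliar with the building blocks named in the abstract, the minimal NumPy sketch below illustrates them under simple assumptions: the standard F(2,3) Winograd transform (two 1D convolution outputs with 4 multiplications instead of 6), a clipped ReLU as one common instantiation of a constrained activation function, and an element-wise majority vote as the idea behind TMR. The exact tile sizes, bounds, and protection scheme used by the authors may differ; none of this code is taken from the paper.

```python
import numpy as np

# Standard transform matrices for Winograd F(2,3): two 1D convolution outputs
# from a 4-element input tile and a 3-tap filter, using 4 multiplications.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """y = A^T [(G g) * (B^T d)]; the elementwise product is the only
    multiplication stage, which is where Winograd saves work."""
    return AT @ ((G @ g) * (BT @ d))

def direct_conv(d, g):
    """Reference result computed with the usual 6 multiplications."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

def clipped_relu(x, bound=6.0):
    """One common form of a constrained activation: clamp activations so that
    a soft-error-induced outlier cannot propagate unchecked."""
    return np.clip(x, 0.0, bound)

def tmr_vote(a, b, c):
    """Element-wise majority vote over three redundant copies, the idea
    behind triple modular redundancy (TMR)."""
    return np.median(np.stack([a, b, c]), axis=0)

d = np.array([1.0, 2.0, 3.0, 4.0])   # example input tile
g = np.array([0.5, -1.0, 2.0])       # example 3-tap filter
assert np.allclose(winograd_f23(d, g), direct_conv(d, g))
print(clipped_relu(np.array([-1.0, 3.0, 1e6])))   # -> [0. 3. 6.]
print(tmr_vote(direct_conv(d, g), winograd_f23(d, g), direct_conv(d, g)))
```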
Related papers
- ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks [71.02216400133858]
Physics-informed neural networks (PINNs) have earned high expectations in solving partial differential equations (PDEs).
Previous research observed the propagation failure phenomenon of PINNs.
This paper provides the first formal and in-depth study of propagation failure and its root cause.
arXiv Detail & Related papers (2025-02-02T13:56:38Z)
- Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales [4.1966303054440655]
Quantization of diffusion models has been explored in recent works to reduce compute costs and memory bandwidth usage.
For the text-to-image generation task, the 8-bit fully-quantized diffusion model with Winograd provides near-lossless quality.
For image classification, our method outperforms the state-of-the-art Winograd PTQ method by 1.62% and 2.56% in top-1 ImageNet accuracy.
arXiv Detail & Related papers (2024-12-27T09:05:48Z)
- Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs.
We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs.
We find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
- Low-Rank Winograd Transformation for 3D Convolutional Neural Networks [25.236436823266203]
This paper focuses on the Winograd transformation in 3D convolutional neural networks (CNNs).
We introduce a low-rank Winograd transformation, a novel training paradigm that decouples the original large tensor into two less storage-required trainable tensors.
We show that our proposed low-rank oriented sparse granularity permits practical Winograd acceleration compared with the vanilla counterpart.
arXiv Detail & Related papers (2023-01-26T15:44:22Z)
- Winograd Convolution: A Perspective from Fault Tolerance [11.626169868836955]
Winograd convolution was originally proposed to reduce computing overhead by replacing multiplications in neural networks (NNs) with additions via linear transformations.
We observe its great potential in improving NN fault tolerance and evaluate its fault tolerance for the first time.
arXiv Detail & Related papers (2022-02-17T14:19:55Z)
- Orthogonal Graph Neural Networks [53.466187667936026]
Graph neural networks (GNNs) have received tremendous attention due to their superiority in learning node representations.
However, stacking more convolutional layers significantly decreases the performance of GNNs.
We propose a novel Ortho-GConv, which could generally augment the existing GNN backbones to stabilize the model training and improve the model's generalization performance.
arXiv Detail & Related papers (2021-09-23T12:39:01Z)
- Winograd Algorithm for AdderNet [54.93995545896655]
Adder neural network (AdderNet) is a new kind of deep model that replaces the massive multiplications in convolutions with additions (a minimal sketch of this adder operation appears after this list).
This paper studies the Winograd algorithm, a widely used fast algorithm for accelerating convolution and reducing computational cost.
arXiv Detail & Related papers (2021-05-12T09:13:34Z)
- LANCE: Efficient Low-Precision Quantized Winograd Convolution for Neural Networks Based on Graphics Processing Units [6.110973485878557]
We propose an efficient low-precision quantized Winograd convolution algorithm, called LANCE, which combines the advantages of fast convolution and quantization techniques.
We show that our 8-bit quantized Winograd convolution improves the performance by up to 2.40x over the full-precision convolution with trivial accuracy loss.
arXiv Detail & Related papers (2020-03-19T09:46:50Z)
- Searching for Winograd-aware Quantized Networks [12.351250944079949]
We propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations.
We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10.
arXiv Detail & Related papers (2020-02-25T07:53:53Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
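As a side note for the AdderNet entry above, the following is a minimal, hypothetical sketch of the adder-style filter response (a negative L1 distance between an input patch and a filter) contrasted with the usual multiply-accumulate response. It illustrates the general idea only and is not taken from any of the listed papers.

```python
import numpy as np

def adder_response(patch, kernel):
    # AdderNet-style response: negative L1 distance, so the inner loop needs
    # only subtractions, absolute values, and additions.
    return -np.abs(patch - kernel).sum()

def conv_response(patch, kernel):
    # Standard convolution response for comparison: multiply-accumulate.
    return (patch * kernel).sum()

patch = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.array([[0.5, -1.0], [2.0, 0.0]])
print(adder_response(patch, kernel), conv_response(patch, kernel))
```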