NOVA: Discovering Well-Conditioned Winograd Transforms through Numerical Optimization of Vandermonde Arithmetic
- URL: http://arxiv.org/abs/2512.18453v1
- Date: Sat, 20 Dec 2025 17:55:26 GMT
- Title: NOVA: Discovering Well-Conditioned Winograd Transforms through Numerical Optimization of Vandermonde Arithmetic
- Authors: Jayant Lohia,
- Abstract summary: Winograd convolution is the standard algorithm for efficient inference.<n>It faces a critical barrier in the modern era of low precision computing: numerical instability.<n>We introduce NOVA, a discovery framework that breaks the decades old convention of integer kappa percent.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Winograd convolution is the standard algorithm for efficient inference, reducing arithmetic complexity by 2.25x for 3x3 kernels. However, it faces a critical barrier in the modern era of low precision computing: numerical instability. As tiles scale to maximize efficiency (e.g., F(6,3), F(8,3)), the condition numbers of standard integer based transforms explode, reaching kappa = 2 x 10^5 for F(8,3), rendering them unusable in FP16 or Int8. We introduce NOVA (Numerical Optimization of Vandermonde Arithmetic), a discovery framework that breaks the decades old convention of integer interpolation. Treating Winograd point selection as a continuous optimization problem, NOVA searches the manifold R^n-1 via Evolution Strategy, snaps candidates to simple rationals, and guarantees correctness via symbolic verification. This process uncovers a hidden landscape of stable, fractional configurations such as {+-5/6, +-7/6, +-3/5} that defy traditional vocabulary constraints. The impact is transformative: NOVA improves the conditioning of F(8,3) by 415x in 1D, which squares to a 172,484x improvement for 2D convolution. In real world FP16 ImageNet inference, where standard transforms collapse to random chance (e.g., 4.7 percent accuracy on VGG16), NOVA's points restore full accuracy (75 to 78 percent), recovering over 70 percentage points without retraining, calibration, or learned parameters. These discovered transforms act as drop in replacements, effectively unlocking the efficiency of large tile Winograd convolution for next generation hardware.
Related papers
- Orthogonal Finetuning Made Scalable [92.34573849209238]
Orthogonal finetuning (OFT) offers highly parameter-efficient adaptation while preventing catastrophic forgetting, but its high runtime and memory demands limit practical deployment.<n>We identify the core computational bottleneck in OFT as its weight-centric implementation, which relies on costly matrix-matrix multiplications with cubic complexity.<n>We propose OFTv2, an input-centric reformulation that instead uses matrix-vector multiplications (i.e., matrix-free computation), reducing the computational cost to quadratic.<n>These modifications allow OFTv2 to achieve up to 10x faster training and 3x lower GPU memory usage without
arXiv Detail & Related papers (2025-06-24T17:59:49Z) - ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation [0.34952465649465553]
This paper proposes ConvBench, a primitive-level benchmark for the evaluation and comparison of convolution algorithms.
It assesses 9243 convolution operations derived from 1097 real-world deep learning models.
The experiments showed results faster than Im2col-GEMM in 93.6% of the convolutions.
arXiv Detail & Related papers (2024-07-15T13:58:24Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-free) gradient based adaption.
The main idea of the method is to adapt the $alpha by situational awareness.
It can be applied to problems of any dimensions n and scales only linearly.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - KrADagrad: Kronecker Approximation-Domination Gradient Preconditioned
Stochastic Optimization [69.47358238222586]
Second orderimations allow parameter update step size and direction to adapt to loss curvature.
Recently, Shampoo introduced a Kronecker factored preconditioner to reduce these requirements.
It takes inverse matrix roots of ill-conditioned matrices.
This requires 64-bit precision, imposing strong hardware constraints.
arXiv Detail & Related papers (2023-05-30T21:15:45Z) - Going Further With Winograd Convolutions: Tap-Wise Quantization for
Efficient Inference on 4x4 Tile [7.705762754955851]
Winograd convolution algorithm computes convolutions with fewer MACs compared to the standard algorithm.
We propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles.
We show how to integrate such custom modules in an industrial-grade, programmable DSA.
arXiv Detail & Related papers (2022-09-26T19:29:51Z) - An Improved Normed-Deformable Convolution for Crowd Counting [70.02434289611566]
Deformable convolution is proposed to exploit the scale-adaptive capabilities for CNN features in the heads.
An improved Normed-Deformable Convolution (textiti.e.,NDConv) is proposed in this paper.
Our method outperforms state-of-the-art methods on ShanghaiTech A, ShanghaiTech B, UCF_QNRF, and UCF_CC_50 dataset.
arXiv Detail & Related papers (2022-06-16T10:56:26Z) - Winograd Convolution for Deep Neural Networks: Efficient Point Selection [0.8043754868448141]
We propose a novel approach to point selection using points of the form -1/c, -c, c, 1/c using the full range of real-valued numbers for c.
We find that the error for different values of c forms a rough curve across the range of real-value numbers helping to localize the values of c that reduce error.
We study a range of sizes for small convolutions and achieve reduction in error ranging from 2% to around 59% for both 1D and 2D convolutions.
arXiv Detail & Related papers (2022-01-25T15:00:54Z) - 8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Statefuls maintain statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past values.
This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters.
In this paper, we develop first gradients that use 8-bit statistics while maintaining the performance levels of using 32-bit gradient states.
arXiv Detail & Related papers (2021-10-06T15:43:20Z) - Accelerating Large Kernel Convolutions with Nested Winograd
Transformation.pdf [2.193040410545991]
This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions.
Experiments show that compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4x4 to 31x31 convolutions.
arXiv Detail & Related papers (2021-02-26T02:42:42Z) - Efficient Residue Number System Based Winograd Convolution [15.210764522845416]
Winograd algorithm can reduce the computational complexity of convolutional neural networks (CNN) with weights and activations represented in floating point.
Our work extends the Winograd algorithm to Residue Number System (RNS)
The minimal complexity convolution is computed precisely over large transformation tile.
arXiv Detail & Related papers (2020-07-23T19:07:06Z) - Region adaptive graph fourier transform for 3d point clouds [51.193111325231165]
We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes.
The RA-GFT achieves better complexity-performance trade-offs than previous approaches.
arXiv Detail & Related papers (2020-03-04T02:47:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.