A Training Framework for Optimal and Stable Training of Polynomial Neural Networks
- URL: http://arxiv.org/abs/2505.11589v1
- Date: Fri, 16 May 2025 18:00:02 GMT
- Title: A Training Framework for Optimal and Stable Training of Polynomial Neural Networks
- Authors: Forsad Al Hossain, Tauhidur Rahman
- Abstract summary: Polynomial Neural Networks (PNNs) are pivotal for applications such as privacy-preserving inference via Homomorphic Encryption (HE). Low-degree polynomials can limit model expressivity, while higher-degree polynomials often suffer from numerical instability and gradient explosion. We introduce a robust and versatile training framework featuring two innovations: 1) a novel Boundary Loss that exponentially penalizes activation inputs outside a predefined stable range, and 2) Selective Gradient Clipping that effectively tames gradient magnitudes while preserving essential Batch Normalization statistics.
- Score: 0.462761393623313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By replacing standard non-linearities with polynomial activations, Polynomial Neural Networks (PNNs) are pivotal for applications such as privacy-preserving inference via Homomorphic Encryption (HE). However, training PNNs effectively presents a significant challenge: low-degree polynomials can limit model expressivity, while higher-degree polynomials, crucial for capturing complex functions, often suffer from numerical instability and gradient explosion. We introduce a robust and versatile training framework featuring two synergistic innovations: 1) a novel Boundary Loss that exponentially penalizes activation inputs outside a predefined stable range, and 2) Selective Gradient Clipping that effectively tames gradient magnitudes while preserving essential Batch Normalization statistics. We demonstrate our framework's broad efficacy by training PNNs within deep architectures composed of HE-compatible layers (e.g., linear layers, average pooling, batch normalization, as used in ResNet variants) across diverse image, audio, and human activity recognition datasets. These models consistently achieve high accuracy with low-degree polynomial activations (such as degree 2) and, critically, exhibit stable training and strong performance with polynomial degrees up to 22, where standard methods typically fail or suffer severe degradation. Furthermore, the performance of these PNNs achieves a remarkable parity, closely approaching that of their original ReLU-based counterparts. Extensive ablation studies validate the contributions of our techniques and guide hyperparameter selection. We confirm the HE-compatibility of the trained models, advancing the practical deployment of accurate, stable, and secure deep learning inference.
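The two training techniques named in the abstract lend themselves to a compact sketch. The snippet below is a hedged illustration only, not the authors' released code: a learnable monomial-basis polynomial activation, an exponential boundary penalty on pre-activations outside an assumed range [-B, B], and a selective gradient-clipping step that skips BatchNorm modules so their statistics and affine parameters are left untouched. The names (PolyAct, boundary_loss, selective_clip_), the initialization, and the hyperparameter values (bound, weight, max_norm) are illustrative assumptions, not values from the paper.

```python
# Hedged sketch (assumptions, not the authors' code): a learnable polynomial
# activation, an exponential Boundary Loss on pre-activations outside [-B, B],
# and Selective Gradient Clipping that leaves BatchNorm parameters untouched.
import torch
import torch.nn as nn


class PolyAct(nn.Module):
    """Polynomial activation sum_k c_k * x^k with learnable coefficients (assumed form)."""

    def __init__(self, degree: int = 2):
        super().__init__()
        coeffs = torch.zeros(degree + 1)
        # Illustrative initialization: roughly a smoothed ReLU for degree >= 2.
        coeffs[:3] = torch.tensor([0.25, 0.5, 0.25])[: min(3, degree + 1)]
        self.coeffs = nn.Parameter(coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        powers = torch.stack([x ** k for k in range(self.coeffs.numel())], dim=-1)
        return (powers * self.coeffs).sum(dim=-1)


def boundary_loss(pre_acts: torch.Tensor, bound: float = 5.0, weight: float = 1e-3) -> torch.Tensor:
    """Exponential penalty for pre-activation values outside [-bound, bound] (assumed form)."""
    excess = (pre_acts.abs() - bound).clamp(min=0.0)
    return weight * torch.expm1(excess).mean()  # zero inside the range, grows exponentially outside


def selective_clip_(model: nn.Module, max_norm: float = 1.0) -> None:
    """Clip the gradient norm of all parameters except those owned by BatchNorm layers."""
    params = [
        p
        for m in model.modules()
        if not isinstance(m, nn.modules.batchnorm._BatchNorm)
        for p in m.parameters(recurse=False)
        if p.grad is not None
    ]
    torch.nn.utils.clip_grad_norm_(params, max_norm)
```

A plausible training step under these assumptions adds boundary_loss on each PolyAct's pre-activation tensor to the task loss, calls backward(), runs selective_clip_(model) before optimizer.step(), and keeps the rest of the ResNet-style pipeline (linear/convolution layers, average pooling, batch normalization) HE-compatible as described in the abstract. The exact penalty shape, bound, and clipping policy used in the paper may differ.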
Related papers
- Fast State-Augmented Learning for Wireless Resource Allocation with Dual Variable Regression [83.27791109672927]
We show how a state-augmented graph neural network (GNN) parametrization for the resource allocation policy circumvents the drawbacks of the ubiquitous dual subgradient methods. Lagrangian-maximizing state-augmented policies are learned during the offline training phase. We prove a convergence result and an exponential probability bound on the excursions of the dual function (iterate) optimality gaps.
arXiv Detail & Related papers (2025-06-23T15:20:58Z) - DeePoly: A High-Order Accuracy Scientific Machine Learning Framework for Function Approximation and Solving PDEs [5.483488375189695]
This work introduces DeePoly, a novel framework that transforms the solution process into a two-stage approach; the strategic combination leverages the strengths of both methods. The framework is also released as an open-source project accompanying the paper.
arXiv Detail & Related papers (2025-06-05T04:10:52Z) - Degree-Optimized Cumulative Polynomial Kolmogorov-Arnold Networks [0.0]
Cumulative polynomial Kolmogorov-Arnold networks (CP-KAN) are a neural architecture combining Chebyshev basis functions and quadratic unconstrained binary optimization (QUBO). Our contribution involves reformulating the degree selection problem as a QUBO task, reducing it to a single optimization step per layer. The architecture performs well in regression tasks with limited data, showing good robustness to input scales and natural regularization properties from its basis.
arXiv Detail & Related papers (2025-05-21T07:59:12Z) - Layer-wise Quantization for Quantized Optimistic Dual Averaging [75.4148236967503]
We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We propose a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs.
arXiv Detail & Related papers (2025-05-20T13:53:58Z) - Dual Cone Gradient Descent for Training Physics-Informed Neural Networks [0.0]
Physics-informed neural networks (PINNs) have emerged as a prominent approach for solving partial differential equations. We propose a novel framework, Dual Cone Gradient Descent (DCGD), which adjusts the direction of the updated gradient to ensure it falls within a cone region.
arXiv Detail & Related papers (2024-09-27T03:27:46Z) - Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization [40.68266398473983]
Training neural networks which are robust to adversarial attacks remains an important problem in deep learning.
We reformulate the training problems for two-layer ReLU and polynomial activation networks as convex programs.
We demonstrate the practical utility of convex adversarial training on large-scale problems.
arXiv Detail & Related papers (2024-05-22T22:08:13Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used in a wide range of machine learning applications.
In this paper, we examine the use of convex neural network recovery models.
We show that all stationary points of the non-convex objective can be characterized as the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task. Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Training Large Scale Polynomial CNNs for E2E Inference over Homomorphic
Encryption [33.35896071292604]
Training large-scale CNNs whose inference can be run under Homomorphic Encryption (HE) is challenging.
We provide a novel training method for large CNNs such as ResNet-152 and ConvNeXt models.
arXiv Detail & Related papers (2023-04-26T20:41:37Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs suffer training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture [0.0]
This work presents a two-stage adaptive framework for developing deep neural network (DNN) architectures that generalize well for a given training data set.
In the first stage, a layerwise training approach is adopted where a new layer is added each time and trained independently by freezing parameters in the previous layers.
We introduce an $\epsilon$-$\delta$ stability-promoting concept as a desirable property of a learning algorithm and show that employing manifold regularization yields an $\epsilon$-$\delta$ stability-promoting algorithm.
arXiv Detail & Related papers (2022-11-13T09:51:16Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Attribute-Guided Adversarial Training for Robustness to Natural
Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.