Efficient training for large-scale optical neural network using an evolutionary strategy and attention pruning
- URL: http://arxiv.org/abs/2505.12906v1
- Date: Mon, 19 May 2025 09:41:11 GMT
- Title: Efficient training for large-scale optical neural network using an evolutionary strategy and attention pruning
- Authors: Zhiwei Yang, Zeyang Fan, Yihang Lai, Qi Chen, Tian Zhang, Jian Dai, Kun Xu
- Abstract summary: MZI-based block optical neural networks (BONNs) can achieve large-scale network models. We propose an on-chip covariance matrix adaptation evolution strategy and attention-based pruning (CAP) algorithm for large-scale BONNs. The proposed CAP algorithm shows excellent potential for larger-scale network models and more complex tasks.
- Score: 14.20309603187239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MZI-based block optical neural networks (BONNs), which can achieve large-scale network models, have increasingly drawn attention. However, current training algorithms are not sufficiently robust. Moreover, large-scale BONNs usually contain numerous trainable parameters, resulting in expensive computation and high power consumption. In this article, by pruning matrix blocks and directly optimizing the individuals in the population, we propose an on-chip covariance matrix adaptation evolution strategy and attention-based pruning (CAP) algorithm for large-scale BONNs. The calculated results demonstrate that the CAP algorithm can prune 60% and 80% of the parameters for the MNIST and Fashion-MNIST datasets, respectively, while degrading performance by only 3.289% and 4.693%. Considering the influence of dynamic noise in the phase shifters, our proposed CAP algorithm (performance degradation of 22.327% for the MNIST dataset and 24.019% for the Fashion-MNIST dataset, using a poorly fabricated chip and electrical control with a standard deviation of 0.5) exhibits the strongest robustness compared with both our previously reported block adjoint training algorithm (43.963% and 41.074%) and the covariance matrix adaptation evolution strategy alone (25.757% and 32.871%). Moreover, when 60% of the parameters are pruned, the CAP algorithm achieves 88.5% accuracy in experiment on the simplified MNIST dataset, close to the noise-free simulation result (92.1%). Additionally, we demonstrate, in both simulation and experiment, that constructing BONNs from MZIs with only internal phase shifters is an efficient way to reduce both the system area and the number of required trainable parameters. Notably, the proposed CAP algorithm shows excellent potential for larger-scale network models and more complex tasks.
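As a rough illustration of the CAP idea (not the authors' implementation), the sketch below couples the pycma CMA-ES optimizer with a block-pruning step: the evolution strategy directly optimizes a flat vector of phase-shifter settings, and a scoring pass masks out roughly 60% of the matrix blocks mid-training. The BONN forward pass, block layout, pruning schedule, and the magnitude heuristic standing in for the attention-based score are all placeholder assumptions.

```python
import numpy as np
import cma  # pycma package: pip install cma

n_blocks, phases_per_block = 8, 16             # hypothetical BONN block layout
dim = n_blocks * phases_per_block

def bonn_loss(phases, mask):
    # Placeholder for the photonic forward pass plus task loss; a real BONN
    # would map phases to MZI-mesh block unitaries and score classification.
    blocks = phases.reshape(n_blocks, phases_per_block)
    return float(np.sum(mask[:, None] * np.cos(blocks) ** 2))

mask = np.ones(n_blocks)                       # 1 = block kept, 0 = block pruned
es = cma.CMAEvolutionStrategy(np.zeros(dim), 0.5, {"verbose": -9})
for gen in range(30):
    candidates = es.ask()                      # sample a population of phase vectors
    es.tell(candidates, [bonn_loss(np.asarray(c), mask) for c in candidates])
    if gen == 14:                              # one-shot pruning step (assumption)
        scores = np.abs(es.result.xbest.reshape(n_blocks, -1)).mean(axis=1)
        n_prune = int(round(0.6 * n_blocks))   # prune ~60% of the blocks
        mask[np.argsort(scores)[:n_prune]] = 0.0
print("final loss:", bonn_loss(np.asarray(es.result.xbest), mask))
```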
Related papers
- Exploring Spiking Neural Networks for Binary Classification in Multivariate Time Series at the Edge [0.9282545044546486]
We present a general framework for training spiking neural networks (SNNs) to perform binary classification on multivariate time series.
We apply it to the task of detecting low signal-to-noise ratio radioactive sources in gamma-ray spectral data.
The resulting SNNs, with as few as 49 neurons and 66 synapses, achieve a 51.8% true positive rate (TPR) at a false alarm rate of 1/hr.
Hardware deployment on the microCaspian neuromorphic platform demonstrates 2 mW power consumption and 20.2 ms latency.
arXiv Detail & Related papers (2025-10-23T20:52:11Z) - SDSNN: A Single-Timestep Spiking Neural Network with Self-Dropping Neuron and Bayesian Optimization [3.939441643960418]
Spiking Neural Networks (SNNs) are an emerging biologically inspired computational model.
SNNs transmit information through discrete spike signals, which substantially reduces computational energy consumption.
We propose a single-timestep SNN, which enhances accuracy and reduces computational energy consumption in a single timestep.
arXiv Detail & Related papers (2025-08-01T03:41:47Z) - A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction [4.501295034557007]
This work proposes a novel Comprehensively Adaptive Architectural Optimization-based Variable Quantum Neural Network (CA-QNN).
The model converts workload data into qubits, processed through qubit neurons with Controlled NOT-gated activation functions for intuitive pattern recognition.
The proposed model demonstrates superior prediction accuracy, reducing prediction errors by up to 93.40% and 91.27% compared to existing deep learning and QNN-based approaches.
arXiv Detail & Related papers (2025-07-11T05:07:21Z) - Efficient Fault Detection in WSN Based on PCA-Optimized Deep Neural Network Slicing Trained with GOA [0.6827423171182154]
Traditional fault detection methods often struggle with optimizing deep neural networks (DNNs) for efficient performance.
This study proposes a novel hybrid method combining Principal Component Analysis (PCA) with a DNN optimized by the Grasshopper Optimization Algorithm (GOA) to address these limitations.
Our approach achieves a remarkable 99.72% classification accuracy, with exceptional precision and recall, outperforming conventional methods.
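To make the pipeline concrete, here is a minimal sketch assuming a synthetic dataset: PCA reduces the feature dimension, and a small population search over the MLP's hidden-layer width stands in for the Grasshopper Optimization Algorithm (GOA itself is not reproduced here). All names, sizes, and the data itself are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical sensor-fault dataset: X (n_samples, n_features), y in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

X_red = PCA(n_components=10).fit_transform(X)  # PCA stage

# Placeholder population search standing in for GOA: sample candidate
# hidden-layer widths and keep the best cross-validated score.
best = None
for width in rng.integers(8, 64, size=10):
    clf = MLPClassifier(hidden_layer_sizes=(int(width),),
                        max_iter=500, random_state=0)
    score = cross_val_score(clf, X_red, y, cv=3).mean()
    if best is None or score > best[0]:
        best = (score, int(width))
print("best (score, width):", best)
```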
arXiv Detail & Related papers (2025-05-11T15:51:56Z) - Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models [6.711674863088882]
We propose ECoGNNs-LLMMHs, a framework that integrates graph neural networks with task-adaptive message passing and large language model-enhanced meta-heuristic algorithms.
Compared with state-of-the-art works, ECoGNN exhibits lower prediction error in the post-HLS prediction task, with the error reduced by 57.27%.
For post-implementation prediction tasks, ECoGNN demonstrates the lowest prediction errors, with average reductions of 17.6% for flip-flop (FF) usage, 33.7% for critical path (CP)
arXiv Detail & Related papers (2025-04-28T10:08:56Z) - Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware [78.17783007774295]
This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval.
A novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed.
The converted SNNs achieve almost five-fold power efficiency at moderate performance loss compared to the original CNNs.
arXiv Detail & Related papers (2024-12-05T09:41:33Z) - An Effective Networks Intrusion Detection Approach Based on Hybrid Harris Hawks and Multi-Layer Perceptron [47.81867479735455]
This paper proposes an Intrusion Detection System (IDS) employing the Harris Hawks Optimization (HHO) to optimize Multilayer Perceptron learning.
HHO-MLP aims to select optimal parameters in its learning process to minimize intrusion detection errors in networks.
HHO-MLP showed superior performance, attaining top scores with an accuracy of 93.17%, a sensitivity of 95.41%, and a specificity of 95.41%.
arXiv Detail & Related papers (2024-02-21T06:25:50Z) - Hybrid-Task Meta-Learning: A GNN Approach for Scalable and Transferable Bandwidth Allocation [50.96751567777229]
We develop a deep learning-based bandwidth allocation policy that is scalable with the number of users and transferable to different communication scenarios.
To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN).
We develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios.
arXiv Detail & Related papers (2023-12-23T04:25:12Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present ECCO-DNN, a first-order optimization method specialized for deep neural networks (DNNs).
It models the optimization-variable trajectory as a dynamical system and uses a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
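The dynamical-system view can be pictured with a toy integrator: treat training as the gradient flow dx/dt = -grad L(x) and shrink the forward-Euler step wherever the trajectory bends sharply. The curvature heuristic below is our own stand-in, not the actual ECCO-DNN step-size rule, and the quadratic objective is a placeholder for a DNN loss.

```python
import numpy as np

def loss(x):          # toy quadratic objective standing in for a DNN loss
    return 0.5 * x @ x

def grad(x):
    return x

x = np.array([3.0, -2.0])
g_prev = grad(x)
eta = 0.1
for _ in range(50):
    g = grad(x)
    # How sharply the trajectory bends: relative change in the gradient.
    bend = np.linalg.norm(g - g_prev) / (np.linalg.norm(g_prev) + 1e-12)
    eta = eta / (1.0 + bend)   # shrink the step where the trajectory curves
    x = x - eta * g            # forward-Euler step along the gradient flow
    g_prev = g
print("final loss:", loss(x))
```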
arXiv Detail & Related papers (2023-10-21T03:45:13Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - WSEBP: A Novel Width-depth Synchronous Extension-based Basis Pursuit Algorithm for Multi-Layer Convolutional Sparse Coding [4.521915878576165]
Multi-layer convolutional sparse coding (ML-CSC) can be used to interpret convolutional neural networks (CNNs).
Many current state-of-the-art (SOTA) pursuit algorithms require multiple iterations to optimize the solution of ML-CSC.
We propose a novel width-depth synchronous extension-based basis pursuit (WSEBP) algorithm which solves the ML-CSC problem without the limitation of the number of iterations.
arXiv Detail & Related papers (2022-03-28T15:53:52Z) - Spike time displacement based error backpropagation in convolutional spiking neural networks [0.6193838300896449]
In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures.
The evaluation results on the image classification task based on two popular benchmarks, MNIST and Fashion-MNIST, confirm that this algorithm has been applicable in deep SNNs.
We consider a convolutional SNN with two sets of weights: real-valued weights that are updated in the backward pass and their signs, binary weights, that are employed in the feedforward process.
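The two-weight-set idea resembles BinaryConnect-style training and can be sketched in PyTorch with a straight-through estimator: the forward pass convolves with sign(W), while gradients update the underlying real-valued W. This is our illustration of the weight scheme described above, not the STiDi-BP temporal-coding algorithm itself.

```python
import torch
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    """Forward: binarize weights to their signs. Backward: straight-through,
    so gradients flow to the real-valued weights unchanged."""
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class BinaryForwardConv2d(torch.nn.Conv2d):
    # The real-valued self.weight is what the optimizer updates; only its
    # sign is used in the feedforward computation.
    def forward(self, x):
        return F.conv2d(x, SignSTE.apply(self.weight), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

layer = BinaryForwardConv2d(1, 8, kernel_size=3, padding=1)
out = layer(torch.randn(4, 1, 28, 28))   # e.g. an MNIST-sized batch
out.mean().backward()                    # gradients reach layer.weight via the STE
```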
arXiv Detail & Related papers (2021-08-31T05:18:59Z) - Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is a critical variant of the standard convolutional neural network, used to control effective receptive fields and handle the large scale variance of objects.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, in which the convolutions have independent dilation among different axes, channels, and layers.
To fit the complex inception convolution to the data, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
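A minimal PyTorch sketch of per-axis, per-channel-group dilation (our reading of the idea; the paper's searched configurations and the EDO algorithm are not reproduced): each branch convolves with its own (height, width) dilation, and the outputs are concatenated along the channel axis.

```python
import torch
import torch.nn as nn

class InceptionDilatedConv2d(nn.Module):
    """Sketch: splits output channels into groups, each with its own
    independent (height, width) dilation."""
    def __init__(self, in_ch, out_ch, kernel_size=3,
                 dilations=((1, 1), (1, 2), (2, 1), (2, 2))):
        super().__init__()
        assert out_ch % len(dilations) == 0
        group_out = out_ch // len(dilations)
        # Padding d * (k // 2) keeps spatial size for odd kernels.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, group_out, kernel_size,
                      padding=(d[0] * (kernel_size // 2),
                               d[1] * (kernel_size // 2)),
                      dilation=d)
            for d in dilations
        ])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

layer = InceptionDilatedConv2d(16, 32)
y = layer(torch.randn(2, 16, 64, 64))   # -> shape (2, 32, 64, 64)
```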
arXiv Detail & Related papers (2020-12-25T14:58:35Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of the method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)