Neuronal Attention Circuit (NAC) for Representation Learning
- URL: http://arxiv.org/abs/2512.10282v2
- Date: Fri, 12 Dec 2025 07:09:14 GMT
- Title: Neuronal Attention Circuit (NAC) for Representation Learning
- Authors: Waleed Razzaq, Izis Kanjaraway, Yun-Bo Zhao
- Abstract summary: We introduce a CT-Attention mechanism that reformulates attention logits computation as the solution to a linear first-order ODE. We implement NAC in diverse domains, including irregular time-series classification, lane-keeping for autonomous vehicles, and industrial prognostics.
- Score: 2.0573301822495553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention improves representation learning over RNNs, but its discrete nature limits continuous-time (CT) modeling. We introduce the Neuronal Attention Circuit (NAC), a novel, biologically plausible CT-Attention mechanism that reformulates attention logit computation as the solution to a linear first-order ODE with nonlinear interlinked gates, derived by repurposing the C. elegans Neuronal Circuit Policies (NCPs) wiring mechanism. NAC replaces dense projections with sparse sensory gates for key-query projections and a sparse backbone network with two heads that compute content-target and learnable time-constant gates, enabling efficient adaptive dynamics. NAC supports three attention logit computation modes: (i) explicit Euler integration, (ii) an exact closed-form solution, and (iii) a steady-state approximation. To reduce memory intensity, we implement a sparse Top-K pairwise concatenation scheme that selectively curates key-query interactions. We provide rigorous theoretical guarantees, including state stability, bounded approximation errors, and universal approximation. Empirically, we apply NAC in diverse domains, including irregular time-series classification, lane-keeping for autonomous vehicles, and industrial prognostics. NAC matches or outperforms competing baselines in accuracy and occupies an intermediate position in runtime and memory efficiency compared with several CT baselines.
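The three logit-computation modes follow directly from the linear first-order ODE the abstract describes. Below is a minimal NumPy sketch, assuming the common gated-leak form dz/dt = (a - z)/tau, where a is the content-target gate and tau the learnable time constant; the sparse gate networks, NCP wiring, and Top-K scheme from the paper are not reproduced, so `ct_logits` and its arguments are illustrative only.

```python
import numpy as np

def ct_logits(z0, a, tau, t=1.0, mode="closed_form", n_steps=32):
    """Evolve attention logits z under dz/dt = (a - z) / tau.

    z0  : initial logits, shape (n_queries, n_keys)
    a   : content-target gate (the logits' attractor), same shape
    tau : learnable positive time constants, same shape or scalar
    """
    if mode == "euler":
        # (i) explicit Euler integration of the linear ODE
        z, dt = z0.copy(), t / n_steps
        for _ in range(n_steps):
            z = z + dt * (a - z) / tau
        return z
    if mode == "closed_form":
        # (ii) exact solution: z(t) = a + (z0 - a) * exp(-t / tau)
        return a + (z0 - a) * np.exp(-t / tau)
    if mode == "steady_state":
        # (iii) t -> infinity limit: logits settle at the gate value
        return np.broadcast_to(a, z0.shape).copy()
    raise ValueError(f"unknown mode: {mode}")

def attention_weights(z):
    # logits are normalized over keys as in standard attention
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Note that the steady-state mode drops the dependence on z0 and the integration loop entirely, so of the three it is the cheapest to evaluate.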
Related papers
- NTK-Guided Implicit Neural Teaching [30.8213717343444]
Implicit Neural Representations (INRs) parameterize continuous signals via multilayer perceptrons (MLPs). We propose NTK-Guided Implicit Neural Teaching (NINT), which scores examples by the norm of their NTK-augmented loss gradients, capturing both fitting errors and heterogeneous leverage; a simplified sketch of such scoring follows.
arXiv Detail & Related papers (2025-11-19T14:43:04Z)
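As a rough illustration of gradient-norm example scoring (the NTK augmentation and INR specifics are omitted; the linear model and `score_examples` helper are illustrative assumptions, not NINT's implementation):

```python
import numpy as np

def score_examples(X, y, w):
    """Score each example by the norm of its per-example loss gradient
    for a linear model f(x) = x @ w under squared loss. For this model
    the score is |residual| * ||x||, coupling fit error with leverage."""
    residuals = X @ w - y              # per-example fitting error
    grads = residuals[:, None] * X     # d/dw of 0.5 * (x @ w - y)^2
    return np.linalg.norm(grads, axis=1)

# Pick the most informative examples for the next teaching round:
rng = np.random.default_rng(0)
X, w = rng.normal(size=(100, 8)), np.zeros(8)
y = X @ rng.normal(size=8)
top10 = np.argsort(-score_examples(X, y, w))[:10]
```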
- PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization [17.316927027489506]
Automated Circuit Discovery (ACDC) has emerged as a pivotal methodology in circuit discovery, but its application to large language models is severely limited by computational inefficiency and prohibitively high memory requirements. Our proposed method for accelerating automated circuit discovery, Per Attention Head Quantization (PAHQ), takes a fundamentally different approach by optimizing the efficiency of each individual patching operation.
arXiv Detail & Related papers (2025-10-27T12:24:14Z)
- RAE: A Neural Network Dimensionality Reduction Method for Nearest Neighbors Preservation in Vector Search [3.5003141048841004]
We propose the Regularized Auto-Encoder (RAE) for k-NN-preserving dimensionality reduction. The regularization establishes an upper bound on the norm distortion rate of the transformed vectors, and RAE achieves superior k-NN recall compared to existing DR approaches; a generic sketch of this style of regularized autoencoder follows.
arXiv Detail & Related papers (2025-09-30T06:25:38Z)
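A minimal PyTorch autoencoder whose loss penalizes distortion of vector norms so that neighborhood structure is better preserved (the paper's exact regularizer and its bound are not reproduced; the architecture and `lam` weight are illustrative assumptions):

```python
import torch
import torch.nn as nn

class RegularizedAE(nn.Module):
    def __init__(self, d_in, d_out, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d_out))
        self.dec = nn.Sequential(nn.Linear(d_out, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d_in))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def rae_loss(x, z, x_hat, lam=0.1):
    recon = ((x - x_hat) ** 2).mean()
    # penalize relative norm distortion of the embedded vectors, a proxy
    # for keeping k-NN distances in the reduced space faithful
    distortion = ((z.norm(dim=1) / x.norm(dim=1).clamp_min(1e-8) - 1) ** 2).mean()
    return recon + lam * distortion
```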
- Learning based convex approximation for constrained parametric optimization [11.379408842026981]
We propose an input convex neural network (ICNN)-based self-supervised learning framework to solve constrained optimization problems. We provide rigorous convergence analysis, showing that the framework converges to a Karush-Kuhn-Tucker (KKT) approximation point of the original problem. Our approach achieves a superior balance among accuracy, feasibility, and computational efficiency; a minimal ICNN is sketched below.
arXiv Detail & Related papers (2025-05-07T00:33:14Z)
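For reference, a minimal input convex neural network: the output is convex in x because the hidden-to-hidden and output weights are constrained non-negative and ReLU is convex and non-decreasing. This is the standard ICNN construction, not the paper's full framework, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, d_in, hidden=64, depth=3):
        super().__init__()
        self.Wx = nn.ModuleList([nn.Linear(d_in, hidden) for _ in range(depth)])
        self.Wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                 for _ in range(depth - 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        z = F.relu(self.Wx[0](x))
        for Wx, Wz in zip(self.Wx[1:], self.Wz):
            # clamping the z-path weights keeps the composition convex in x
            z = F.relu(Wx(x) + F.linear(z, Wz.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0), self.out.bias)
```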
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Efficient kernel surrogates for neural network-based regression [0.8030359871216615]
We study the performance of the Conjugate Kernel (CK), an efficient approximation to the Neural Tangent Kernel (NTK). We show that CK performance is only marginally worse than that of the NTK and, in certain cases, superior. In addition to providing a theoretical grounding for using CKs instead of NTKs, our framework suggests a recipe for improving DNN accuracy inexpensively; a sketch of CK-based prediction follows.
arXiv Detail & Related papers (2023-10-28T06:41:47Z)
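A quick illustration of the conjugate kernel, i.e. the Gram matrix of a network's last-hidden-layer features, used as a drop-in kernel for ridge regression (the `model_features` extractor and ridge strength are illustrative assumptions):

```python
import torch

def conjugate_kernel(model_features, X1, X2):
    # CK(x, x') = <phi(x), phi(x')>, with phi the penultimate-layer features
    return model_features(X1) @ model_features(X2).T

def ck_ridge_predict(model_features, X_tr, y_tr, X_te, ridge=1e-3):
    K = conjugate_kernel(model_features, X_tr, X_tr)
    K_te = conjugate_kernel(model_features, X_te, X_tr)
    alpha = torch.linalg.solve(K + ridge * torch.eye(K.shape[0]), y_tr)
    return K_te @ alpha
```

The NTK additionally sums gradient-feature Gram matrices from every layer, which is what makes the CK the cheaper surrogate.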
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs), a class of implicit learning models that use implicit equations as layers. We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs; the basic propagation step is sketched below.
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games [31.51588071503617]
We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs).
We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards.
We present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs; the finite-state skeleton of such value iteration is sketched below.
arXiv Detail & Related papers (2022-02-13T08:39:00Z)
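The paper targets Borel state spaces; the sketch below only shows the Shapley-style fixed point that value iteration for finite zero-sum concurrent stochastic games rests on, with each state's update solved as a matrix game via an LP (the array shapes and the SciPy solver are illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of a zero-sum matrix game (row player maximizes) via LP."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # maximize the game value v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v <= sum_a x_a * M[a, b]
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0  # mixed strategy sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def value_iteration(r, P, gamma=0.95, iters=200):
    """r[s, a, b]: stage reward; P[s, a, b, s2]: transition probabilities."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)   # one-step lookahead payoff matrices
        V = np.array([matrix_game_value(Q[s]) for s in range(len(V))])
    return V
```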
- SCORE: Approximating Curvature Information under Self-Concordant Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization speed each time it receives a new input.
The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead; the underlying generalized Gauss-Newton step is sketched below.
arXiv Detail & Related papers (2021-12-14T13:03:04Z)
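The generalized Gauss-Newton (GGN) approximation that GGN-SCORE builds on replaces the Hessian with J^T H_L J, needing only output-space Jacobians and the loss curvature (the self-concordant regularization itself is not reproduced; the damping term below is an illustrative stand-in):

```python
import numpy as np

def ggn_step(J, g_out, H_loss=None, damping=1e-3):
    """One generalized Gauss-Newton parameter update.

    J      : Jacobian of model outputs w.r.t. parameters, (n_out, n_params)
    g_out  : gradient of the loss w.r.t. model outputs, (n_out,)
    H_loss : Hessian of the loss w.r.t. outputs (identity for squared loss)
    """
    if H_loss is None:
        H_loss = np.eye(J.shape[0])
    G = J.T @ H_loss @ J                  # GGN curvature approximation
    g = J.T @ g_out                       # chain rule: parameter gradient
    return -np.linalg.solve(G + damping * np.eye(J.shape[1]), g)
```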
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for deep neural networks at scale. Our method requires a much smaller number of communication rounds while retaining its theoretical guarantees. Experiments on several datasets confirm the theory and demonstrate the method's effectiveness.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization; a minimal orthogonalization iteration of this kind is sketched below.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
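For context, the Newton-Schulz iteration commonly used for this style of orthogonalization: it pushes all singular values of W toward 1 using only matrix multiplies (this is the textbook iteration, not necessarily ONI's exact variant):

```python
import torch

def newton_schulz_orthogonalize(W, iters=5):
    """Approximately orthogonalize the rows of W.

    Frobenius normalization bounds the singular values by 1, after which
    each step Y <- 1.5*Y - 0.5*Y*Y^T*Y drives them toward 1."""
    Y = W / W.norm()
    for _ in range(iters):
        Y = 1.5 * Y - 0.5 * (Y @ Y.T) @ Y
    return Y
```

Because the iteration count is a knob, the degree of orthogonality is controllable, matching the trade-off between optimization benefit and representational capacity described above.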
This list is automatically generated from the titles and abstracts of the papers on this site.