JPmHC Dynamical Isometry via Orthogonal Hyper-Connections
- URL: http://arxiv.org/abs/2602.18308v1
- Date: Fri, 20 Feb 2026 16:06:01 GMT
- Title: JPmHC Dynamical Isometry via Orthogonal Hyper-Connections
- Authors: Biswa Sengupta, Jinhua Wang, Leo Brunswic
- Abstract summary: JPmHC is a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams. It prevents gradient pathologies and enhances stability. It achieves faster convergence, higher accuracy, and lower computational cost compared to bistochastic baselines.
- Score: 2.4311915994390403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning, exemplified by Hyper-Connections (HC), have expanded the residual connection paradigm by introducing wider residual streams and diverse connectivity patterns. While these innovations yield significant performance gains, they compromise the identity mapping property of residual connections, leading to training instability, limited scalability, and increased memory overhead. To address these challenges, we propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a framework that replaces identity skips with a trainable linear mixer acting on n parallel streams while explicitly controlling gradient conditioning. By constraining the mixer M on operator-norm-bounded manifolds (e.g., bistochastic, Stiefel, Grassmann), JPmHC prevents gradient pathologies and enhances stability. JPmHC introduces three key contributions: (i) a free-probability analysis that predicts Jacobian spectra for structured skips, providing actionable design rules for mixer selection; (ii) memory-efficient implicit differentiation for fixed-point projections, reducing activation memory and synchronization overhead; and (iii) a Stiefel-constrained mixer via Cayley transforms, ensuring orthogonality without post-hoc normalization. Empirical evaluations on ARC-AGI demonstrate that JPmHC achieves faster convergence, higher accuracy, and lower computational cost compared to bistochastic baselines. As a flexible and scalable extension of HC, JPmHC advances spectrum-aware, stable, and efficient deep learning, offering insights into topological architecture design and foundational model evolution.
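As an illustration of contribution (iii), a Cayley transform turns any unconstrained square parameter into an exactly orthogonal stream mixer, with no post-hoc normalization. The sketch below is our own minimal NumPy rendering of that standard construction, not the paper's implementation; function names are ours.

```python
import numpy as np

def cayley_orthogonal_mixer(W: np.ndarray) -> np.ndarray:
    """Map an unconstrained square parameter W to an orthogonal mixer M.

    A = (W - W^T)/2 is skew-symmetric, so (I + A) is invertible and
    M = (I - A)(I + A)^{-1} is orthogonal by construction: no post-hoc
    normalization step is required.
    """
    n = W.shape[0]
    A = 0.5 * (W - W.T)                          # skew-symmetric part
    I = np.eye(n)
    return (I - A) @ np.linalg.solve(I + A, I)   # (I - A)(I + A)^{-1}

def mix_streams(M: np.ndarray, streams: np.ndarray) -> np.ndarray:
    """Mix n parallel residual streams (rows of `streams`) with M."""
    return M @ streams

rng = np.random.default_rng(0)
M = cayley_orthogonal_mixer(rng.standard_normal((4, 4)))
mixed = mix_streams(M, rng.standard_normal((4, 16)))
print(np.allclose(M.T @ M, np.eye(4)))  # True: the mixer is an isometry
```

Because M is an isometry, backpropagated gradients through the skip path keep their norm, which is the "dynamical isometry" property the title refers to.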
Related papers
- mHC: Manifold-Constrained Hyper-Connections [43.69451283828811]
Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm by expanding the residual stream width and diversifying connectivity patterns. We propose Manifold-Constrained Hyper-Connections (mHC) to restore the identity mapping property intrinsic to the residual connection. mHC is effective for training at scale, offering tangible performance improvements and superior scalability.
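The bistochastic constraint mentioned as a baseline in the main abstract is commonly enforced with Sinkhorn-Knopp alternating normalization. The following is a hedged NumPy sketch of that standard projection, not necessarily the exact procedure used by mHC or JPmHC.

```python
import numpy as np

def sinkhorn_bistochastic(W: np.ndarray, n_iters: int = 100) -> np.ndarray:
    """Project an unconstrained matrix toward the bistochastic set
    (nonnegative entries, rows and columns summing to 1) via
    Sinkhorn-Knopp alternating row/column normalization."""
    M = np.exp(W - W.max())        # strictly positive, numerically stable
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)   # normalize rows
        M = M / M.sum(axis=0, keepdims=True)   # normalize columns
    return M

rng = np.random.default_rng(1)
M = sinkhorn_bistochastic(rng.standard_normal((4, 4)))
print(np.allclose(M.sum(axis=0), 1.0), np.allclose(M.sum(axis=1), 1.0))
```

A bistochastic mixer has operator norm 1 and preserves the all-ones direction, which is one way to recover a soft version of the identity-mapping property.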
arXiv Detail & Related papers (2025-12-31T14:16:26Z)
- HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation [72.69742127579508]
Recent unified models integrate understanding experts (e.g., LLMs) with generative experts (e.g., diffusion models). In this work, we propose HBridge, an asymmetric H-shaped architecture that enables heterogeneous experts to optimally leverage pretrained priors. Extensive experiments across multiple benchmarks demonstrate the effectiveness and superior performance of HBridge.
arXiv Detail & Related papers (2025-11-25T17:23:38Z) - Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting [49.40321003932633]
Adapformer is an advanced Transformer-based framework that merges the benefits of channel-independent (CI) and channel-dependent (CD) methodologies through effective channel management. Adapformer achieves superior performance over existing models, enhancing both predictive accuracy and computational efficiency.
arXiv Detail & Related papers (2025-11-18T16:24:05Z)
- Bifidelity Karhunen-Loève Expansion Surrogate with Active Learning for Random Fields [0.4899818550820576]
We present a bifidelity Karhunen-Loève expansion (KLE) surrogate model for field-valued quantities of interest (QoIs) under uncertain inputs. We form an active learning strategy that adaptively selects new high-fidelity (HF) evaluations based on the surrogate's generalization error. New HF samples are then acquired by maximizing an expected improvement criterion, targeting regions of high surrogate error.
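A truncated Karhunen-Loève expansion can be built from field samples via an eigendecomposition of the sample covariance. The sketch below shows the single-fidelity core of such a surrogate; it is our simplification, not the paper's bifidelity construction.

```python
import numpy as np

def kle_truncate(samples: np.ndarray, n_modes: int):
    """Truncated KLE of a random field from samples of shape
    (n_samples, n_points): returns the mean field, the leading
    eigenvalues/modes of the sample covariance, and a map from KLE
    coefficients xi back to a field."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / (samples.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order
    idx = np.argsort(eigvals)[::-1][:n_modes]        # keep largest modes
    lam, phi = eigvals[idx], eigvecs[:, idx]

    def reconstruct(xi: np.ndarray) -> np.ndarray:
        # field ≈ mean + sum_k sqrt(lam_k) * xi_k * phi_k
        return mean + phi @ (np.sqrt(np.maximum(lam, 0.0)) * xi)

    return mean, lam, phi, reconstruct

rng = np.random.default_rng(2)
mean, lam, phi, reconstruct = kle_truncate(rng.standard_normal((200, 30)), 5)
print(reconstruct(np.zeros(5)).shape)  # (30,)
```

Truncating to the modes with the largest eigenvalues keeps the directions that carry most of the field's variance, which is what makes the surrogate cheap to evaluate.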
arXiv Detail & Related papers (2025-11-05T04:14:44Z)
- Flow-Matching Guided Deep Unfolding for Hyperspectral Image Reconstruction [53.26903617819014]
The Flow-Matching-guided Unfolding network (FMU) is the first to integrate flow matching into HSI reconstruction. To further strengthen the learned dynamics, we introduce a mean velocity loss. Experiments on both simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality.
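Flow matching regresses a velocity field along interpolation paths between a source and a target distribution. Below is a minimal sketch of the standard conditional flow-matching loss on straight paths; it is illustrative of the general technique, not FMU's exact objective.

```python
import numpy as np

def flow_matching_loss(velocity_net, x0, x1, rng):
    """Conditional flow-matching loss on straight paths:
    sample t ~ U(0,1), form x_t = (1-t) x0 + t x1, and regress the
    predicted velocity toward the path velocity x1 - x0.
    velocity_net(x_t, t) is any callable predicting a velocity."""
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    pred = velocity_net(xt, t)
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(3)
x0 = rng.standard_normal((8, 4))   # source samples (e.g., noise)
x1 = rng.standard_normal((8, 4))   # target samples (e.g., clean signals)
oracle = lambda xt, t: x1 - x0     # the ideal velocity on these pairs
print(flow_matching_loss(oracle, x0, x1, rng))  # 0.0
```

At inference, integrating the learned velocity field from t=0 to t=1 transports source samples to the target distribution.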
arXiv Detail & Related papers (2025-10-02T11:32:00Z)
- Advanced Hybrid Transformer LSTM Technique with Attention and TS Mixer for Drilling Rate of Penetration Prediction [0.9282594860064428]
This study presents a new deep learning Hybrid LSTM-Trans-Mixer-Att framework for Rate of Penetration (ROP) prediction. The proposed framework combines sequential memory, static feature interactions, global context learning, and dynamic feature weighting. Experimental validation on real-world drilling datasets demonstrates superior performance, achieving an R-squared of 0.9991 and a MAPE of 1.447%.
arXiv Detail & Related papers (2025-08-07T09:45:56Z)
- MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation [74.34220141721231]
We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models.
arXiv Detail & Related papers (2025-07-06T08:16:50Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- Nonparametric learning of covariate-based Markov jump processes using RKHS techniques [3.3005714301829148]
We propose a novel nonparametric approach for linking covariates to Continuous-Time Markov Chains (CTMCs). CTMCs provide a robust framework for modeling transitions across clinical or behavioral states. We use a generalized Representer Theorem to enable tractable inference in functional space.
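In a CTMC, finite-time transition probabilities follow from the generator matrix Q via P(t) = exp(tQ). The self-contained NumPy sketch below uses illustrative rate values and a Taylor-series matrix exponential; it shows the basic mechanics, not the paper's covariate-dependent model.

```python
import numpy as np

def expm_taylor(A: np.ndarray, n_terms: int = 40) -> np.ndarray:
    """Matrix exponential via truncated Taylor series (adequate for
    the small, well-scaled generators used here)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

# Generator (rate) matrix for a 3-state chain: off-diagonal entries are
# transition rates, and each row sums to zero. Values are illustrative.
Q = np.array([[-0.3,  0.2,  0.1],
              [ 0.1, -0.4,  0.3],
              [ 0.0,  0.2, -0.2]])

def transition_matrix(Q: np.ndarray, t: float) -> np.ndarray:
    """P(t) = exp(tQ): P[i, j] is the probability of being in state j
    at time t given state i at time 0."""
    return expm_taylor(t * Q)

P = transition_matrix(Q, 2.0)
print(np.allclose(P.sum(axis=1), 1.0))  # True: each row is a distribution
```

A covariate-based model, as in the paper, would make the entries of Q functions of the covariates rather than constants.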
arXiv Detail & Related papers (2025-05-06T02:26:02Z)
- Quantized and Asynchronous Federated Learning [22.40154714677385]
We develop a novel scheme, Quantized Asynchronous Federated Learning (QAL), to deal with the communication bottleneck.
We prove that QAL achieves ergodic convergence without requiring uniform client arrivals.
We validate our theoretical findings by using standard benchmarks.
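A QSGD-style unbiased stochastic quantizer is one standard way to attack the communication bottleneck such schemes target. The sketch below illustrates that general technique under our own assumptions; it is not QAL's specific quantizer.

```python
import numpy as np

def stochastic_quantize(x: np.ndarray, n_levels: int = 16, rng=None) -> np.ndarray:
    """Unbiased stochastic quantization of x onto n_levels uniform
    magnitude levels per coordinate. Rounding direction is randomized
    with probability equal to the fractional part, so
    E[quantize(x)] = x and quantized updates stay unbiased."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.abs(x).max()
    if scale == 0.0:
        return x.copy()
    y = np.abs(x) / scale * (n_levels - 1)       # map magnitudes to [0, L-1]
    low = np.floor(y)
    round_up = rng.uniform(size=x.shape) < (y - low)
    q = low + round_up
    return np.sign(x) * q * scale / (n_levels - 1)

rng = np.random.default_rng(4)
x = rng.standard_normal(1000)
draws = np.stack([stochastic_quantize(x, rng=rng) for _ in range(2000)])
print(np.allclose(draws.mean(axis=0), x, atol=0.05))  # unbiased in expectation
```

Each client would then transmit only the level indices and the scale, cutting uplink traffic by roughly a factor of 32/log2(n_levels) versus float32.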
arXiv Detail & Related papers (2024-09-30T21:22:41Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with a zero exemplar buffer and only 1.02x the size of the base model.
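An equiangular embedding of K classes is typically realized as a simplex equiangular tight frame, whose vectors have pairwise cosine similarity exactly -1/(K-1). The construction below is a standard sketch and our assumption, not necessarily this paper's exact embedding.

```python
import numpy as np

def simplex_etf(d: int, K: int, rng) -> np.ndarray:
    """Columns of the returned (d, K) matrix are unit vectors with
    pairwise cosine similarity exactly -1/(K-1): a simplex
    equiangular tight frame. Requires d >= K."""
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))   # orthonormal columns
    center = np.eye(K) - np.ones((K, K)) / K           # remove the common mean
    return np.sqrt(K / (K - 1)) * U @ center

rng = np.random.default_rng(5)
M = simplex_etf(8, 5, rng)
G = M.T @ M
print(np.allclose(np.diag(G), 1.0))  # True: unit-norm class anchors
```

Fixing class anchors to such a frame maximizes the minimum pairwise angle, so old and new classes never crowd each other in embedding space.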
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.