Deep Delta Learning
- URL: http://arxiv.org/abs/2601.00417v1
- Date: Thu, 01 Jan 2026 18:11:38 GMT
- Title: Deep Delta Learning
- Authors: Yifan Zhang, Yifeng Liu, Mengdi Wang, Quanquan Gu
- Abstract summary: We introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics.
- Score: 91.75868893250662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
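To make the mechanism concrete, below is a minimal PyTorch sketch of one plausible reading of the Delta Operator update, assuming a unit-normalized direction head, a gate mapped into $(0, 2)$ via a scaled sigmoid, and a value head supplying the new features; the module name and the heads `to_k`, `to_v`, and `to_beta` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeltaOperatorBlock(nn.Module):
    """Illustrative sketch of a DDL-style residual block (not the paper's code).

    The shortcut is modulated by the Delta Operator
        A(x) = I - beta(x) k(x) k(x)^T,
    a rank-1 perturbation of the identity, and the same gate beta(x)
    scales a synchronous rank-1 write of new content along k(x).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)    # reflection direction head (assumed)
        self.to_v = nn.Linear(dim, dim)    # new-feature head (assumed)
        self.to_beta = nn.Linear(dim, 1)   # gating scalar head (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = F.normalize(self.to_k(x), dim=-1)        # unit-norm direction k(x)
        beta = 2.0 * torch.sigmoid(self.to_beta(x))  # gate in (0, 2)
        # Spectrum of A(x): eigenvalue 1 - beta along k, 1 elsewhere, so
        # beta ~ 0 gives an identity map, beta = 1 an orthogonal projection,
        # and beta = 2 a geometric reflection.
        erase = (x * k).sum(dim=-1, keepdim=True) * k              # old content along k
        write = (self.to_v(x) * k).sum(dim=-1, keepdim=True) * k   # new content along k
        return x - beta * erase + beta * write
```

With `beta` driven toward zero the block reduces to the plain identity shortcut, while the single shared gate ties erasure and writing together as the dynamic step size described in the abstract.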
Related papers
- PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
arXiv Detail & Related papers (2026-02-11T12:39:41Z)
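As a toy illustration of the Rank-$L$ claim (the names and the accumulation form are assumptions for exposition, not PRISM's actual architecture), accumulating $L$ gated rank-1 corrections yields an update of rank up to $L$, whereas a single step is confined to the rank-1 manifold:

```python
import torch

def rank1_step(S, k, v, beta):
    """Single-step update: the correction beta * v k^T has rank 1."""
    return S + beta * torch.outer(v, k)

def rankL_step(S, ks, vs, betas):
    """Accumulate L rank-1 corrections; their sum has rank up to L,
    expanding the update manifold beyond the single-step bottleneck."""
    for k, v, beta in zip(ks, vs, betas):
        S = S + beta * torch.outer(v, k)
    return S
```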
- Hybrid Dual-Path Linear Transformations for Efficient Transformer Architectures [0.0]
We introduce the Hybrid Dual-Path Linear (HDPL) operator, which decomposes the affine transformation into two topologically distinct pathways. Experiments on the FineWeb-Edu dataset demonstrate that the HDPL architecture outperforms a standard Llama-style baseline. We discuss how the explicit materialization of a probabilistic latent space within the Transformer backbone serves as a vital architectural affordance.
arXiv Detail & Related papers (2026-02-05T20:16:10Z)
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, orthogonal transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
- DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products [60.72655477351486]
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling. Existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices.
arXiv Detail & Related papers (2025-02-14T16:59:05Z)
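As the title indicates, DeltaProduct builds state-transition matrices from products of Householder-type factors; the sketch below (an illustrative construction, not the paper's code) shows why a product of such rank-1 perturbations is strictly more expressive than a single one:

```python
import torch

def householder_product(ks, betas):
    """Transition matrix prod_i (I - beta_i k_i k_i^T) from unit-norm
    directions k_i and gates beta_i. A single factor can only rescale one
    direction, but a product of two full reflections (beta = 2) already
    realizes a rotation, which no single rank-1 perturbation can express."""
    d = ks[0].shape[0]
    A = torch.eye(d)
    for k, beta in zip(ks, betas):
        A = A @ (torch.eye(d) - beta * torch.outer(k, k))
    return A
```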
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - EqNIO: Subequivariant Neural Inertial Odometry [33.96552018734359]
We show that IMU data transforms equivariantly when rotated around the gravity vector and reflected with respect to arbitrary planes parallel to gravity.
We then map the IMU data into this frame, thereby achieving an invariant canonicalization that can be directly used with off-the-shelf inertial odometry networks.
arXiv Detail & Related papers (2024-08-12T17:42:46Z)
- Deep Tensor Network [9.910562011343009]
We introduce the Deep Tensor Network, a new architectural framework that reformulates attention by unifying the expressive power of tensor algebra with neural network design. Our approach moves beyond both conventional dot-product attention and subsequent linear-time approximations to capture higher-order statistical dependencies.
arXiv Detail & Related papers (2023-11-18T14:41:33Z)
- Machine Learning for the identification of phase-transitions in interacting agent-based systems: a Desai-Zwanzig example [0.0]
We propose a data-driven framework that pinpoints phase transitions for an agent-based model in its mean-field limit.
To this end, we use the manifold learning algorithm Diffusion Maps to identify a parsimonious set of data-driven latent variables.
We then utilize a deep learning framework to obtain a conformal reparametrization of the data-driven coordinates.
arXiv Detail & Related papers (2023-10-29T15:07:08Z)
- Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
arXiv Detail & Related papers (2022-06-02T19:36:03Z)
- Deep neural networks for inverse problems with pseudodifferential operators: an application to limited-angle tomography [0.4110409960377149]
We propose a novel convolutional neural network (CNN) designed for learning pseudodifferential operators ($\Psi$DOs) in the context of linear inverse problems.
We show that, under rather general assumptions on the forward operator, the unfolded iterations of ISTA can be interpreted as the successive layers of a CNN.
In particular, we prove that, in the case of LA-CT, the operations of upscaling, downscaling and convolution can be exactly determined by combining the convolutional nature of the limited angle X-ray transform and basic properties defining a wavelet system.
arXiv Detail & Related papers (2020-06-02T14:03:41Z)
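For reference, the ISTA-unrolling idea mentioned above is a standard construction: each iteration applies a linear operator followed by soft thresholding, so $T$ iterations stack naturally into a $T$-layer network. A generic sketch (the step size and threshold are assumed hyperparameters, not values from the paper):

```python
import torch

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm; ISTA's pointwise nonlinearity."""
    return torch.sign(x) * torch.clamp(torch.abs(x) - lam, min=0.0)

def ista(A, y, lam=0.1, n_iter=20):
    """Unrolled ISTA for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1.
    Each iteration (linear op + nonlinearity) maps onto one network layer."""
    step = 1.0 / torch.linalg.matrix_norm(A, ord=2) ** 2  # 1/L, L the Lipschitz constant
    x = torch.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x
```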
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all of the above) and is not responsible for any consequences of its use.