Continuous-Depth Transformers with Learned Control Dynamics
- URL: http://arxiv.org/abs/2601.10007v1
- Date: Thu, 15 Jan 2026 02:35:37 GMT
- Title: Continuous-Depth Transformers with Learned Control Dynamics
- Authors: Peter Jemley
- Abstract summary: We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation block. We show that our approach treats depth as a continuous variable governed by a learned vector field $F_\theta(H, \tau, u)$, where $u$ is a low-dimensional control signal injected via explicit concatenation. Our results demonstrate that continuous-depth dynamics with learned control signals provide a viable, efficient mechanism for steerable language generation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a hybrid transformer architecture that replaces discrete middle layers with a continuous-depth Neural Ordinary Differential Equation (ODE) block, enabling inference-time control over generation attributes via a learned steering signal. Unlike standard transformers that process representations through fixed discrete layers, our approach treats depth as a continuous variable governed by a learned vector field $F_\theta(H, \tau, u)$, where $u$ is a low-dimensional control signal injected via explicit concatenation. We validate the architecture through four experiments: (1) gradient flow stability with zero exploding/vanishing gradient events, (2) semantic steering achieving 98%/88% accuracy for positive/negative sentiment control, (3) continuous interpolation validated by a negligible 0.068% trajectory divergence between fixed and adaptive solvers, and (4) efficiency benchmarking demonstrating latency parity with standard discrete baselines. Additionally, we show that adaptive ODE solvers reveal geometric structure in the learned dynamics: the control signal partitions the vector field into distinct dynamical regimes with different curvature characteristics. The adjoint method enables $O(1)$ memory training regardless of integration depth. Our results demonstrate that continuous-depth dynamics with learned control signals provide a viable, efficient mechanism for steerable language generation.
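As a rough, hedged sketch of the abstract's core idea: a hidden state $H$ evolves with a continuous depth variable $\tau$ under a vector field $F_\theta(H, \tau, u)$, with the control $u$ injected by concatenation. The toy linear field, fixed-step Euler solver, and all names below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a continuous-depth block whose vector field takes
# the concatenation [H, tau, u]. The linear field W and Euler solver are
# stand-ins for the paper's learned F_theta and ODE solver.

def F(H, tau, u, W):
    # Vector field over the concatenated input [H, tau, u].
    z = H + [tau] + u
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def integrate(H0, u, W, steps=16, t1=1.0):
    # Fixed-step Euler solve: H(tau + dt) = H(tau) + dt * F(H, tau, u).
    H, dt = list(H0), t1 / steps
    for k in range(steps):
        dH = F(H, k * dt, u, W)
        H = [h + dt * d for h, d in zip(H, dH)]
    return H

# 2-dim hidden state, 1-dim control; each row maps the 4-dim concat to one
# hidden coordinate. The last column is the control's influence.
W = [[-0.5, 0.0, 0.1, 0.3],
     [0.0, -0.5, 0.1, -0.3]]
H_pos = integrate([1.0, 1.0], [+1.0], W)
H_neg = integrate([1.0, 1.0], [-1.0], W)
```

Flipping the sign of $u$ changes the final state while the initial state and weights stay fixed, which mirrors the steering mechanism the experiments describe.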
Related papers
- Internalizing LLM Reasoning via Discovery and Replay of Latent Actions [4.830503861275364]
Internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. We propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem.
arXiv Detail & Related papers (2026-02-04T08:44:57Z)
- Deep Delta Learning [91.75868893250662]
We introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection. We provide a spectral analysis of this operator, demonstrating that the gate, a function of the input $\mathbf{X}$, enables dynamic interpolation between identity mapping, projection, and geometric reflection. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics.
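As a hedged illustration of the identity/projection/reflection spectrum mentioned in the summary: for a unit vector $a$ and scalar gate $\beta$, the operator $I - \beta\, a a^\top$ is the identity at $\beta = 0$, an orthogonal projection at $\beta = 1$, and a Householder reflection at $\beta = 2$. This scalar-gate parametrization is an assumption for illustration; in DDL the gate is computed from the input.

```python
# Illustrative only: sweep a scalar gate beta through 0, 1, 2 to realize
# identity, projection, and reflection with the operator I - beta * a a^T.

def apply_gate(x, a, beta):
    # y = x - beta * (a . x) * a, with a assumed unit-norm.
    dot = sum(ai * xi for ai, xi in zip(a, x))
    return [xi - beta * dot * ai for ai, xi in zip(a, x)]

a = [1.0, 0.0]          # unit direction
x = [3.0, 4.0]
identity   = apply_gate(x, a, 0.0)   # -> [3.0, 4.0]
projection = apply_gate(x, a, 1.0)   # -> [0.0, 4.0] (component along a removed)
reflection = apply_gate(x, a, 2.0)   # -> [-3.0, 4.0] (Householder reflection)
```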
arXiv Detail & Related papers (2026-01-01T18:11:38Z)
- Flexible Gravitational-Wave Parameter Estimation with Transformers [73.44614054040267]
We introduce a flexible transformer-based architecture paired with a training strategy that enables adaptation to diverse analysis settings at inference time. We demonstrate that a single flexible model -- called Dingo-T1 -- can analyze 48 gravitational-wave events from the third LIGO-Virgo-KAGRA Observing Run.
arXiv Detail & Related papers (2025-12-02T17:49:08Z)
- Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks [3.924071936547547]
Gated recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior. This effect arises from the coupling between state-space time scales, parametrized by the gates, and parameter-space dynamics. Empirical simulations corroborate these claims.
arXiv Detail & Related papers (2025-08-16T18:19:34Z)
- DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling [16.33495160112142]
We introduce DDOT, a transformer-based model designed to reconstruct multidimensional ODEs in symbolic form. By incorporating an auxiliary task predicting the ODE's derivative, DDOT effectively captures both structure and dynamic behavior. DDOT outperforms existing symbolic regression methods, achieving absolute improvements of 4.58% and 1.62% in $P(R^2 > 0.9)$ on reconstruction and generalization tasks.
arXiv Detail & Related papers (2025-06-23T11:24:52Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
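For reference, a minimal single-machine sketch of the per-coordinate AdaGrad update whose federated, over-the-air variant the paper analyzes. The quadratic toy objective, step count, and all names are illustrative assumptions, not the paper's setup.

```python
import math

# Plain AdaGrad: each coordinate is scaled by the root of its accumulated
# squared gradients, so frequently large coordinates get smaller steps.

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=500):
    x = list(x0)
    g2 = [0.0] * len(x)  # accumulated squared gradients per coordinate
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            g2[i] += g[i] * g[i]
            x[i] -= lr * g[i] / (math.sqrt(g2[i]) + eps)
    return x

# f(x) = 0.5 * (x1^2 + 10 * x2^2), minimized at the origin.
grad = lambda x: [x[0], 10.0 * x[1]]
x_star = adagrad(grad, [5.0, 5.0])  # approaches [0, 0]
```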
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- KEEC: Koopman Embedded Equivariant Control [29.738391644702947]
An efficient way to control systems with unknown nonlinear dynamics is to find an appropriate embedding or representation. Koopman Embedded Equivariant Control (KEEC) learns an embedding of the states and vector fields such that a Koopman operator is approximated as the latent dynamics. Our algorithm achieves superior performance in experiments conducted on various control domains.
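The Koopman idea behind KEEC can be illustrated with a classic toy system that becomes exactly linear in a hand-chosen lifted space. KEEC itself learns such an embedding, so the system, lifting, and coefficients below are illustrative assumptions only.

```python
# Toy Koopman lifting: the nonlinear map (x, y) -> (A x, B y + C x^2) becomes
# exactly linear in the embedded coordinates z = (x, y, x^2).

A, B, C = 0.9, 0.5, 1.0

def step(x, y):
    # Nonlinear dynamics: y's update depends on x squared.
    return A * x, B * y + C * x * x

def lift(x, y):
    # Embedding in which the dynamics are linear.
    return (x, y, x * x)

# Linear Koopman matrix acting on z = (x, y, x^2).
K = [[A,   0.0, 0.0],
     [0.0, B,   C],
     [0.0, 0.0, A * A]]

x, y = 2.0, -1.0
z_next_true = lift(*step(x, y))                     # lift after stepping
z = lift(x, y)
z_next_lin = tuple(sum(K[i][j] * z[j] for j in range(3)) for i in range(3))
# z_next_lin agrees with z_next_true: the latent dynamics are linear.
```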
arXiv Detail & Related papers (2023-12-04T00:11:27Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x) u_i$, with linear dependence on the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
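A minimal sketch of such a linear-control system under assumed vector fields: $F_1$ a rotation generator and $F_2$ a radial scaling field, with piecewise-constant controls and fixed-step Euler integration of the flow. Both fields and all names are illustrative, not from the paper.

```python
import math

# Flow of dot{x} = u1 * F1(x) + u2 * F2(x) under piecewise-constant controls,
# approximated with Euler steps.

def flow(x, controls, dt):
    for u1, u2 in controls:
        f1 = (-x[1], x[0])   # rotation generator
        f2 = (x[0], x[1])    # radial scaling field
        x = (x[0] + dt * (u1 * f1[0] + u2 * f2[0]),
             x[1] + dt * (u1 * f1[1] + u2 * f2[1]))
    return x

# Holding u = (1, 0) for total time pi/2 approximately rotates the point
# (1, 0) by 90 degrees to (0, 1), up to O(dt) integration error.
n = 1000
x_end = flow((1.0, 0.0), [(1.0, 0.0)] * n, dt=(math.pi / 2) / n)
```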
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
- Control of Stochastic Quantum Dynamics with Differentiable Programming [0.0]
We propose a framework for the automated design of control schemes based on differentiable programming.
We apply this approach to state preparation and stabilization of a qubit subjected to homodyne detection.
Despite the resulting poor signal-to-noise ratio, we can train our controller to prepare and stabilize the qubit to a target state with a mean fidelity around 85%.
arXiv Detail & Related papers (2021-01-04T19:00:03Z)
- Neural Control Variates [71.42768823631918]
We show that a set of neural networks can be trained to find a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.