Comba: Improving Bilinear RNNs with Closed-loop Control
- URL: http://arxiv.org/abs/2506.02475v3
- Date: Sat, 21 Jun 2025 08:52:34 GMT
- Title: Comba: Improving Bilinear RNNs with Closed-loop Control
- Authors: Jiaxi Hu, Yongqi Pan, Jusen Du, Disen Lan, Xiaqiang Tang, Qingsong Wen, Yuxuan Liang, Weigao Sun
- Abstract summary: We introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. We propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition, with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus.
- Score: 19.761486052705017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
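For readers unfamiliar with this family of models, the snippet below is a minimal, sequential sketch of the kind of update the abstract describes: a matrix-valued recurrent state whose transition is a scalar decay plus a low-rank, key-dependent correction (the delta-rule-style state feedback that makes the recurrence bilinear). The gate names (`alpha`, `beta`), the shapes, and the plain PyTorch loop are illustrative assumptions, not the paper's exact Comba parameterization; the actual model additionally applies an output-feedback correction and is implemented as a chunk-wise parallel Triton kernel rather than a token-by-token loop.

```python
# Minimal, sequential sketch of a delta-rule-style bilinear recurrence with a
# scalar-plus-low-rank state transition. Gate names, shapes, and the Python
# loop are illustrative assumptions, not the paper's exact formulation or its
# chunk-wise parallel Triton kernel.
import torch

def splr_bilinear_rnn(q, k, v, alpha, beta):
    """q, k: (T, d_k) queries/keys; v: (T, d_v) values;
    alpha, beta: (T,) scalar gates in (0, 1). Returns (T, d_v) outputs."""
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = torch.zeros(d_k, d_v)               # matrix-valued recurrent state
    outputs = []
    for t in range(k.shape[0]):
        k_t, v_t = k[t:t+1], v[t:t+1]        # (1, d_k), (1, d_v)
        # Scalar-plus-low-rank transition: S <- (alpha_t*I - beta_t*k_t k_t^T) S,
        # followed by a delta-rule write of the new value. The k_t k_t^T term
        # feeds the state back through the key (state feedback), which is what
        # makes the recurrence bilinear rather than linear.
        S = alpha[t] * S - beta[t] * k_t.T @ (k_t @ S) + beta[t] * k_t.T @ v_t
        outputs.append(q[t:t+1] @ S)         # query the state for the output
    return torch.cat(outputs, dim=0)

# Toy usage: one sequence of length 16 with 8-dimensional keys and values.
T, d = 16, 8
out = splr_bilinear_rnn(torch.randn(T, d), torch.randn(T, d), torch.randn(T, d),
                        torch.rand(T), torch.rand(T))
print(out.shape)  # torch.Size([16, 8])
```

A chunk-wise parallel implementation, as used in the paper, would process blocks of tokens at once instead of this token-by-token loop; that hardware-level detail is omitted here.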
Related papers
- Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba. This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z) - MesaNet: Sequence Modeling by Locally Optimal Test-Time Training [67.45211108321203]
We introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer. We show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs.
arXiv Detail & Related papers (2025-06-05T16:50:23Z) - DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products [63.66021758150632]
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling. Existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices. We introduce DeltaProduct, which takes multiple ($n_h$) steps per token and achieves superior state-tracking and language modeling capabilities.
arXiv Detail & Related papers (2025-02-14T16:59:05Z) - Bilinear Convolution Decomposition for Causal RL Interpretability [0.0]
Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed. We show bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side-by-side comparison on ProcGen environments.
arXiv Detail & Related papers (2024-12-01T19:32:04Z) - Behavior-Dependent Linear Recurrent Units for Efficient Sequential Recommendation [18.75561256311228]
RecBLR is an efficient sequential recommendation model based on Behavior-Dependent Linear Recurrent Units.
Our model significantly enhances user behavior modeling and recommendation performance.
arXiv Detail & Related papers (2024-06-18T13:06:58Z) - Parallelizing Linear Transformers with the Delta Rule over Sequence Length [49.88826673324244]
This work describes a hardware-efficient algorithm for training linear transformers with the delta rule. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines.
arXiv Detail & Related papers (2024-06-10T17:24:42Z) - Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures [14.551812310439004]
We introduce an untrained forward model residual block within the model-based architecture to match the data consistency in the measurement domain for each instance.
Our approach offers a unified solution that is less parameter-sensitive, requires no additional data, and enables simultaneous fitting of the forward model and reconstruction in a single pass.
arXiv Detail & Related papers (2024-03-07T19:02:13Z) - Data-driven Nonlinear Model Reduction using Koopman Theory: Integrated Control Form and NMPC Case Study [56.283944756315066]
We propose generic model structures combining delay-coordinate encoding of measurements and full-state decoding to integrate reduced Koopman modeling and state estimation.
A case study demonstrates that our approach provides accurate control models and enables real-time capable nonlinear model predictive control of a high-purity cryogenic distillation column.
arXiv Detail & Related papers (2024-01-09T11:54:54Z) - TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer [34.790081960470964]
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM).
We make advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.
We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
arXiv Detail & Related papers (2023-07-27T16:45:33Z) - Comparative analysis of machine learning methods for active flow control [60.53767050487434]
Genetic Programming (GP) and Reinforcement Learning (RL) are gaining popularity in flow control.
This work presents a comparative analysis of the two, benchmarking some of their most representative algorithms against global optimization techniques.
arXiv Detail & Related papers (2022-02-23T18:11:19Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks [172.18295279061607]
Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities.
In this work, we approach GCNNs from a state-space perspective revealing that the graph convolutional module is a minimalistic linear state-space model.
We show that this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish.
We propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear state-space parametric fashion allowing for a better trade-off.
arXiv Detail & Related papers (2020-10-27T19:48:56Z) - State space models for building control: how deep should you go? [3.1171750528972204]
This work systematically investigates whether using RNNs for building control provides net gains in an MPC framework.
The error on the one-hour forecast of temperature is 69% lower with the RNN model than with the linear one.
In control, the linear state-space model outperforms by 10% on the objective function, shows 2.8 times higher average temperature violations, and needs a third of the time the RNN model requires.
arXiv Detail & Related papers (2020-10-23T09:38:43Z) - Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach [55.44107800525776]
Graph Convolutional Networks (GCNs) are state-of-the-art graph based representation learning models.
In this paper, we revisit GCN-based Collaborative Filtering (CF) Recommender Systems (RS).
We show that removing non-linearities would enhance recommendation performance, consistent with the theories in simple graph convolutional networks.
We propose a residual network structure that is specifically designed for CF with user-item interaction modeling.
arXiv Detail & Related papers (2020-01-28T04:41:25Z)