DREAMSTATE: Diffusing States and Parameters for Recurrent Large Language Models
- URL: http://arxiv.org/abs/2601.19221v1
- Date: Tue, 27 Jan 2026 05:42:25 GMT
- Title: DREAMSTATE: Diffusing States and Parameters for Recurrent Large Language Models
- Authors: Liu Xiao
- Abstract summary: Recurrent Neural Networks (RNNs) are distinguished by their powerful short-range modeling capabilities and efficient fixed-size states. However, there is a significant lack of research into their internal state as an editable knowledge representation. We first explore the representational properties of the RWKV state by proposing the DREAMSTATE framework. We propose a novel hybrid architecture that combines the local advantages of RNNs with global context adaptability.
- Score: 0.7364191922317778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern Recurrent Neural Networks (RNNs), such as RWKV, are distinguished by their powerful short-range modeling capabilities and efficient fixed-size states, which constitute a core advantage over standard Transformers. However, there is a significant lack of research into their internal state as an editable knowledge representation. To fill this gap, we first explore the representational properties of the RWKV state by proposing the DREAMSTATE framework. This framework utilizes a conditional Diffusion Transformer (DiT) to directly model the probability manifold of the state, enabling its generation and editing. The structural nature of this representation is validated through t-SNE visualizations and controlled generation experiments. After successfully uncovering and modeling the state's representational potential, we further propose a novel hybrid architecture that combines the local advantages of RNNs with global context adaptability. This architecture features a parallel DiT that processes a variable-length global context to dynamically generate and adjust the core recurrent module's WKV parameters, transforming the fixed recurrence mechanism into a context-aware dynamic function. Experiments demonstrate that this hybrid model can be trained stably via a multi-objective loss, validating its design feasibility. Our work not only opens a new research direction for RNN state representation but also provides a concrete architectural reference for future model design. The code is publicly available at: https://huggingface.co/2dgx41s/DreamState.
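The hybrid design described in the abstract pairs a recurrent core with a parallel network that turns a variable-length global context into the recurrence's WKV parameters. Below is a minimal sketch of that idea in PyTorch, offered under stated assumptions: a small transformer encoder stands in for the paper's conditional DiT, and `ContextToWKV` together with the simplified, unnormalized WKV recurrence are illustrative inventions of this sketch, not the authors' implementation.

```python
# Illustrative sketch only: a context encoder (standing in for the paper's
# conditional DiT) emits the WKV parameters -- per-channel decay w and
# bonus u -- so the recurrence becomes a function of the global context.
import torch
import torch.nn as nn


class ContextToWKV(nn.Module):
    """Hypothetical module: variable-length context -> dynamic WKV params."""

    def __init__(self, d_model: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_w = nn.Linear(d_model, d_model)  # decay logits per channel
        self.to_u = nn.Linear(d_model, d_model)  # current-token bonus

    def forward(self, ctx: torch.Tensor):
        h = self.encoder(ctx).mean(dim=1)   # pool over the context length
        w = -torch.exp(self.to_w(h))        # negative log-decay, RWKV-style
        u = self.to_u(h)
        return w, u


def wkv_recurrence(k, v, w, u):
    """Simplified (unnormalized-softmax) WKV aggregation.

    k, v: (B, T, C) keys/values; w, u: (B, C) context-generated parameters.
    """
    B, T, C = k.shape
    num = torch.zeros(B, C)                 # running weighted sum of values
    den = torch.zeros(B, C)                 # running sum of weights
    decay = torch.exp(w)                    # in (0, 1) since w < 0
    out = []
    for t in range(T):
        kt, vt = k[:, t], v[:, t]
        bonus = torch.exp(u + kt)           # extra weight for the current token
        out.append((num + bonus * vt) / (den + bonus))
        num = decay * num + torch.exp(kt) * vt
        den = decay * den + torch.exp(kt)
    return torch.stack(out, dim=1)


# Usage: a 12-token global context modulates how a 5-token sequence is mixed.
d = 32
gen = ContextToWKV(d)
w, u = gen(torch.randn(2, 12, d))
y = wkv_recurrence(torch.randn(2, 5, d), torch.randn(2, 5, d), w, u)
print(y.shape)  # torch.Size([2, 5, 32])
```

Because w and u are produced per batch element from the context, the same token sequence is aggregated differently under different global contexts, which is the "context-aware dynamic function" the abstract describes.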
Related papers
- H-Model: Dynamic Neural Architectures for Adaptive Processing [0.0]
This article explores the design and experimentation of a neural network architecture capable of dynamically adjusting its internal structure based on the input data. The proposed model introduces a routing mechanism that allows each layer to influence how its outputs are propagated through the network.
arXiv Detail & Related papers (2025-11-11T14:39:42Z)
- Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows [46.673228292287895]
We propose a novel framework that employs transformer-based autoregressive normalizing flows to model continuous representations. This approach unlocks substantial flexibility, enabling the construction of models that can capture global bi-directional context. We propose new mixture-based coupling transformations designed to capture complex dependencies within the latent space shaped by discrete data.
arXiv Detail & Related papers (2025-07-01T04:51:25Z)
- Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution [88.20464308588889]
We propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. This method is designed through unfolding an SR optimization function constrained by structural similarity. Our model outperforms current state-of-the-art models, boasting lower parameter counts and reduced memory consumption.
arXiv Detail & Related papers (2025-06-13T14:29:40Z)
- VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large, homogeneous but uninformative oceanic backgrounds can dilute the feature responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
- Universal In-Context Approximation By Prompting Fully Recurrent Models [86.61942787684272]
We show that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures can serve as universal in-context approximators.
We introduce a programming language called LSRL that compiles to fully recurrent architectures.
arXiv Detail & Related papers (2024-06-03T15:25:13Z)
- Does Transformer Interpretability Transfer to RNNs? [0.6437284704257459]
Recent advances in recurrent neural network architectures have enabled RNNs to match or exceed the performance of equal-size transformers.
We show that it is possible to improve some of these techniques by taking advantage of RNNs' compressed state.
arXiv Detail & Related papers (2024-04-09T02:59:17Z)
- Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts [88.23732496104667]
Cross-scene generalizable NeRF models have become a new focus of the NeRF field.
We bridge "neuralized" architectures with the powerful Mixture-of-Experts (MoE) idea from large language models.
Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes.
arXiv Detail & Related papers (2023-08-22T21:18:54Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z)
- Decoupling Global and Local Representations via Invertible Generative Flows [47.366299240738094]
Experimental results on standard image benchmarks demonstrate the effectiveness of our model in terms of density estimation, image generation and unsupervised representation learning.
This work demonstrates that a generative model with a likelihood-based objective is capable of learning decoupled representations, requiring no explicit supervision.
arXiv Detail & Related papers (2020-04-12T03:18:13Z)