Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
- URL: http://arxiv.org/abs/2510.12026v2
- Date: Wed, 15 Oct 2025 01:49:35 GMT
- Title: Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
- Authors: Junsoo Oh, Wei Huang, Taiji Suzuki
- Abstract summary: Mamba is a recently proposed linear-time sequence model with strong empirical performance. We study in-context learning of a single-index model $y \approx g_*(\langle \boldsymbol{\beta}, \boldsymbol{x} \rangle)$. We prove that Mamba, pretrained by gradient-based methods, can achieve efficient ICL via test-time feature learning.
- Score: 53.983686308399676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mamba, a recently proposed linear-time sequence model, has attracted significant attention for its computational efficiency and strong empirical performance. However, a rigorous theoretical understanding of its underlying mechanisms remains limited. In this work, we provide a theoretical analysis of Mamba's in-context learning (ICL) capability by focusing on tasks defined by low-dimensional nonlinear target functions. Specifically, we study in-context learning of a single-index model $y \approx g_*(\langle \boldsymbol{\beta}, \boldsymbol{x} \rangle)$, which depends on only a single relevant direction $\boldsymbol{\beta}$, referred to as feature. We prove that Mamba, pretrained by gradient-based methods, can achieve efficient ICL via test-time feature learning, extracting the relevant direction directly from context examples. Consequently, we establish a test-time sample complexity that improves upon linear Transformers -- analyzed to behave like kernel methods -- and is comparable to nonlinear Transformers, which have been shown to surpass the Correlational Statistical Query (CSQ) lower bound and achieve near information-theoretically optimal rate in previous works. Our analysis reveals the crucial role of the nonlinear gating mechanism in Mamba for feature extraction, highlighting it as the fundamental driver behind Mamba's ability to achieve both computational efficiency and high performance.
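To make the setup concrete, the following is a minimal, self-contained sketch of the single-index in-context task $y = g_*(\langle \boldsymbol{\beta}, \boldsymbol{x} \rangle)$ together with a simple label-weighted moment estimator that recovers the relevant direction from context examples. It is not the paper's construction: the label-dependent reweighting is only an illustrative analogy to the role the abstract attributes to Mamba's nonlinear gating, and the link function, sizes, and variable names are assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's Mamba construction): an in-context
# single-index task y_i = g_*(<beta, x_i>) and a label-weighted second-moment
# estimator that extracts the relevant direction beta from the context.

rng = np.random.default_rng(0)

d, n_ctx = 16, 2000                    # input dimension, number of context examples (assumed sizes)
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)           # single relevant direction (the "feature")

def g_star(z):
    """Example nonlinear link: the second Hermite polynomial, He_2(z) = z^2 - 1."""
    return z ** 2 - 1

X = rng.standard_normal((n_ctx, d))    # context inputs x_i ~ N(0, I_d)
y = g_star(X @ beta)                   # context labels y_i = g_*(<beta, x_i>)

# Label-weighted second moment M = (1/n) * sum_i y_i * x_i x_i^T.
# For this even link, E[M] = 2 * beta beta^T, so the top eigenvector of M aligns with beta.
M = (X * y[:, None]).T @ X / n_ctx
eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
beta_hat = eigvecs[:, -1]              # eigenvector of the largest eigenvalue = feature estimate

print(f"|<beta_hat, beta>| = {abs(beta_hat @ beta):.3f}")  # close to 1 => feature recovered
```

Once the direction $\hat{\boldsymbol{\beta}}$ is extracted, the remaining problem is a one-dimensional regression $z \mapsto g_*(z)$ over the same context, which is the sense in which test-time feature learning reduces the effective dimension of the in-context task.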
Related papers
- Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis [88.05636819649804]
The Mamba model has gained significant attention for its computational advantages over Transformer-based models. This paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model. We show that although Mamba may require more training to converge, it maintains accurate predictions even when the proportion of outliers exceeds the threshold that a linear Transformer can tolerate.
arXiv Detail & Related papers (2025-10-01T01:25:01Z)
- Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression [90.93281146423378]
Mamba is an efficient Transformer alternative with linear complexity for long-sequence modeling. Recent empirical works demonstrate that Mamba's in-context learning (ICL) is competitive with Transformers. This paper studies the training dynamics of Mamba on the linear regression ICL task.
arXiv Detail & Related papers (2025-09-28T09:48:49Z)
- Probing In-Context Learning: Impact of Task Complexity and Model Architecture on Generalization and Efficiency [10.942999793311765]
We investigate in-context learning (ICL) through a meticulous experimental framework that systematically varies task complexity and model architecture. We evaluate four distinct models: a GPT2-style Transformer, a Transformer with the FlashAttention mechanism, a convolutional Hyena-based model, and the Mamba state-space model.
arXiv Detail & Related papers (2025-05-10T00:22:40Z)
- Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the large language model (LLM) era is the generalization issue. We propose the Model Utilization Index (MUI), a mechanism-interpretability-enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
- PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis [37.18701051669003]
PoinTramba is a hybrid framework that combines the analytical power of Transformer with the remarkable computational efficiency of Mamba.
Our approach first segments point clouds into groups, where the Transformer meticulously captures intricate intra-group dependencies.
Unlike previous Mamba approaches, we introduce a bi-directional importance-aware ordering (BIO) strategy to tackle the challenges of random ordering effects.
arXiv Detail & Related papers (2024-05-24T11:36:26Z)
- PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
- Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)