Related papers: RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers

RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers

URL: http://arxiv.org/abs/2403.18276v2
Date: Sun, 7 Apr 2024 06:44:28 GMT
Title: RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Authors: Zhichao Xu,
Abstract summary: Transformer architecture's core mechanism -- attention requires $O(n2)$ time complexity in training and $O(n)$ time complexity in inference. A notable model structure -- Mamba, which is based on state space models, has achieved transformer-equivalent performance in sequence modeling tasks. We find that Mamba models achieve competitive performance compared to transformer-based models with the same training recipe.
Score: 2.8554857235549753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer structure has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV) and information retrieval (IR). Transformer architecture's core mechanism -- attention requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the attention mechanism's scalability, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention. Recently, a notable model structure -- Mamba, which is based on state space models, has achieved transformer-equivalent performance in multiple sequence modeling tasks. In this work, we examine \mamba's efficacy through the lens of a classical IR task -- document ranking. A reranker model takes a query and a document as input, and predicts a scalar relevance score. This task demands the language model's ability to comprehend lengthy contextual inputs and to capture the interaction between query and document tokens. We find that (1) Mamba models achieve competitive performance compared to transformer-based models with the same training recipe; (2) but also have a lower training throughput in comparison to efficient transformer implementations such as flash attention. We hope this study can serve as a starting point to explore Mamba models in other classical IR tasks. Our code implementation and trained checkpoints are made public to facilitate reproducibility (https://github.com/zhichaoxu-shufe/RankMamba).

Related papers

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the ready knowledge in existing Transformer models to alternative architecture Mamba, termed TransMamba. Our approach employs a two-stage strategy to expedite training new Mamba models, ensuring effectiveness in across uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
State Space Models are Strong Text Rerankers [33.41687512973575]
State space models (SSMs) like Mamba offer promising advantages. Despite their potential, SSMs' effectiveness at text reranking remains underexplored. Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size.
arXiv Detail & Related papers (2024-12-18T21:42:15Z)
Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula [23.071384759427072]
State space models (SSMs) offer advantages over Transformers but struggle with tasks requiring long-range in-context retrieval-like text copying, associative recall, and question answering over long contexts. We propose a novel training procedure, Birdie, that significantly enhances the in-context retrieval capabilities of SSMs without altering their architecture.
arXiv Detail & Related papers (2024-11-01T21:01:13Z)
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures [49.24097977047392]
We investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity. We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features from these models and show that most features are similar in these two models.
arXiv Detail & Related papers (2024-10-09T08:28:53Z)
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [73.80247057590519]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers, to achieve a balance between efficiency and performance in mobile applications. Our model achieves 83.0%/84.1% top-1 with only 12M/21M parameters on ImageNet-1K.
arXiv Detail & Related papers (2024-08-07T11:33:46Z)
An Empirical Study of Mamba-based Pedestrian Attribute Recognition [15.752464463535178]
This paper designs and adapts Mamba into two typical PAR frameworks, text-image fusion approach and pure vision Mamba multi-label recognition framework. It is found that interacting with attribute tags as additional input does not always lead to an improvement, specifically, Vim can be enhanced, but VMamba cannot. These experimental results indicate that simply enhancing Mamba with a Transformer does not always lead to performance improvements but yields better results under certain settings.
arXiv Detail & Related papers (2024-07-15T00:48:06Z)
How Effective are State Space Models for Machine Translation? [19.509486069758495]
Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers. It remains unclear whether these models are competitive with transformers in machine translation.
arXiv Detail & Related papers (2024-07-07T20:21:49Z)
Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs. Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction. We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity. We show that Mamba shares surprising similarities with linear attention Transformer. We propose a Mamba-Like Linear Attention (MLLA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z)
The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains. We show that such models can be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
arXiv Detail & Related papers (2024-03-03T18:58:21Z)
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks [25.092302463435523]
State-space models (SSMs) have been proposed as alternatives to Transformer networks in language modeling. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks.
arXiv Detail & Related papers (2024-02-06T18:56:35Z)
Is Mamba Capable of In-Context Learning? [63.682741783013306]
State of the art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL) This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)
Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM) ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm. Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.