Related papers: The Surprising Soupability of Documents in State Space Models

URL: http://arxiv.org/abs/2505.24033v1
Date: Thu, 29 May 2025 22:13:21 GMT
Title: The Surprising Soupability of Documents in State Space Models
Authors: Yasaman Jafari, Zixian Wang, Leon Bergen, Taylor Berg-Kirkpatrick
Abstract summary: Inspired by model souping, we propose a strategy where documents are encoded independently and their representations are pooled. We finetune Mamba2 models to produce soupable representations and find that they support multi-hop QA, sparse retrieval, and long-document reasoning with strong accuracy. On HotpotQA, souping ten independently encoded documents nearly matches the performance of a cross-encoder trained on the same inputs.
Score: 28.95633840848728
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post-hoc to support downstream reasoning. Inspired by model souping, we propose a strategy where documents are encoded independently and their representations are pooled -- via simple operations like averaging -- into a single context state. This approach, which we call document souping, enables modular encoding and reuse without reprocessing the full input for each query. We finetune Mamba2 models to produce soupable representations and find that they support multi-hop QA, sparse retrieval, and long-document reasoning with strong accuracy. On HotpotQA, souping ten independently encoded documents nearly matches the performance of a cross-encoder trained on the same inputs.
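The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces Mamba2 with a toy diagonal linear recurrence, and the names `encode_document` and `soup_states`, the dimensions, and the random inputs are all hypothetical. It only shows the shape of the idea: each document is encoded independently from a zero state, and the final states are averaged into a single context state.

```python
import numpy as np

def encode_document(tokens, A, B):
    """Toy stand-in for an SSM encoder (assumption, not Mamba2).

    Runs the diagonal linear recurrence h_t = A * h_{t-1} + B @ x_t
    from a zero state and returns the final hidden state.
    """
    h = np.zeros(A.shape[0])
    for x in tokens:
        h = A * h + B @ x
    return h

def soup_states(states):
    """Pool independently computed document states by averaging."""
    return np.stack(states).mean(axis=0)

# Hypothetical sizes and random "token embeddings" for three documents.
rng = np.random.default_rng(0)
state_dim, embed_dim = 8, 4
A = rng.uniform(0.5, 0.99, size=state_dim)   # per-channel decay
B = rng.normal(size=(state_dim, embed_dim))  # input projection
docs = [rng.normal(size=(n, embed_dim)) for n in (5, 7, 3)]

# Each document is encoded on its own; no document sees another.
doc_states = [encode_document(d, A, B) for d in docs]

# The souped state would then serve as the context for answering a query.
context_state = soup_states(doc_states)
```

Because averaging is order-invariant, the per-document states in this sketch can be cached and recombined in any order or subset without re-encoding the documents.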

Related papers

Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning [12.17399365931]
Existing one-pass MLLMs process entire document images without considering query relevance. Inspired by the human coarse-to-fine reading pattern, we introduce Doc-CoB, a simple-yet-effective mechanism that integrates human-style visual reasoning into MLLM. Our method allows the model to autonomously select the set of regions most relevant to the query, and then focus attention on them for further understanding.
arXiv Detail & Related papers (2025-05-24T08:53:05Z)
Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
This paper introduces the Deliberate Thinking based Dense Retriever (DEBATER). DEBATER enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process. Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
Plug-and-Play Document Modules for Pre-trained Models [92.9897146991974]
We propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks. Experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios.
arXiv Detail & Related papers (2023-05-28T08:01:40Z)
RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models [12.37229805276939]
We propose a novel pre-training method called Duplex Masked Auto-Encoder, a.k.a. DupMAE. It is designed to improve the quality of semantic representation where all contextualized embeddings of the pretrained model can be leveraged.
arXiv Detail & Related papers (2023-05-04T05:37:22Z)
Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval [79.37614949970013]
We propose a new dense retrieval model which learns diverse document representations with deep query interactions. Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
arXiv Detail & Related papers (2022-08-08T16:00:55Z)
UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
Long Document Re-ranking with Modular Re-ranker [15.935423344245363]
Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. We propose to model full query-to-document interaction, leveraging the attention operation and modular Transformer re-ranker framework.
arXiv Detail & Related papers (2022-05-09T13:44:02Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question. We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
arXiv Detail & Related papers (2020-09-18T23:39:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.