FuguReport

A Parametric Memory Head for Continual Generative Retrieval

Authors Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke
Affiliations University of Amsterdam
Categories Method / Memory Networks / Parametric memory head module, Application / Generative Retrieval / Document identifier decoding, Evaluation / Continual Learning / Performance in sequential document addition
License CC BY 4.0

Abstract Overview

This paper addresses continual adaptation for generative information retrieval (GenIR), where a single encoder-decoder model retrieves documents by generating document identifiers (docids) directly from queries. The authors show that when new document slices are added sequentially, standard adaptation methods (full fine-tuning and LoRA) improve retrieval on the newest slice but substantially degrade performance on earlier slices, revealing a pronounced stability-plasticity trade-off. To mitigate this, they propose post-adaptation memory tuning (PAMT), a two-stage framework that first adapts the backbone on the new documents (Stage 1), then freezes it and applies a memory-only stabilization stage (Stage 2) using a modular parametric memory head (PMH). The PMH, implemented as a product-key memory, produces sparse hidden-space corrections during prefix-trie constrained decoding. Stage 2 updates only a fixed budget of memory value rows, selected via access-frequency and inverse-historical-frequency statistics, while the backbone and routing components stay frozen.
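The product-key lookup behind a memory head like the PMH can be sketched in a few lines. This is a minimal pure-Python sketch of a generic product-key memory, assuming a single head, dot-product sub-key scores, a softmax over the top-k slots, and a residual-style addition of the memory output to the hidden state. The names (`product_key_lookup`, `subkeys1`, `subkeys2`) are hypothetical, and the paper's dimensions, head count and normalization may differ. The returned `rows` are the value rows a query touched, which is the kind of access statistic the row selection counts.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk(scores, k):
    # Indices of the k largest scores, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def product_key_lookup(query, subkeys1, subkeys2, values, k):
    """Sparse memory read for one hidden state (hypothetical helper).

    query: vector of even length d, split into two halves.
    subkeys1, subkeys2: n sub-keys each (length d/2); slot (i, j) has row id i * n + j.
    values: n * n value rows.
    Returns the softmax-weighted sum of the k best value rows and their row ids.
    """
    half = len(query) // 2
    s1 = [dot(query[:half], key) for key in subkeys1]
    s2 = [dot(query[half:], key) for key in subkeys2]
    n2 = len(subkeys2)
    top1, top2 = topk(s1, k), topk(s2, k)
    # The global top-k slots must lie inside the k x k product of per-half top-k sets.
    candidates = sorted(
        ((s1[i] + s2[j], i * n2 + j) for i in top1 for j in top2),
        reverse=True,
    )[:k]
    weights = softmax([score for score, _ in candidates])
    rows = [row for _, row in candidates]
    out = [0.0] * len(values[0])
    for w, row in zip(weights, rows):
        for t, v in enumerate(values[row]):
            out[t] += w * v
    return out, rows


random.seed(0)
d, n, k = 8, 4, 2
sub1 = [[random.gauss(0, 1) for _ in range(d // 2)] for _ in range(n)]
sub2 = [[random.gauss(0, 1) for _ in range(d // 2)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n * n)]
h = [random.gauss(0, 1) for _ in range(d)]
correction, rows = product_key_lookup(h, sub1, sub2, values, k)
# Assumed residual-style correction of the decoder hidden state.
h_corrected = [a + b for a, b in zip(h, correction)]
```

Because any slot in the global top-k must pair a top-k sub-key from each half, the lookup scores only 2n sub-keys and k² candidates instead of all n² full keys, which is what makes a large, sparsely read memory affordable.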

Novelty

The paper introduces an adapt-then-stabilize framework for continual GenIR that decouples learning new document slices (Stage 1) from a post-adaptation, memory-only stabilization stage (Stage 2). Its parametric memory head is a decoder-side product-key memory that recalibrates only the scores of trie-valid tokens through sparse, value-only updates. Stabilization therefore comes from updating memory rows, chosen by current-session access frequency and inverse historical frequency, rather than from further backbone optimization.
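Restricting the recalibration to trie-valid tokens can be illustrated with a toy constrained greedy decoder. This sketch is an illustration under assumptions, not the paper's decoder: docids are token tuples stored in a nested-dict trie (assuming no docid is a prefix of another), `toy_step` is a hypothetical stand-in for the backbone plus PMH that returns base logits and a memory correction, and the correction is simply added to the logits of trie-valid tokens while every other token is masked.

```python
import math


def build_trie(docids):
    """Nested-dict prefix trie over docid token tuples."""
    root = {}
    for seq in docids:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie, prefix):
    # Tokens that extend `prefix` toward at least one indexed docid.
    node = trie
    for tok in prefix:
        node = node[tok]
    return set(node)


def constrained_scores(base_logits, correction, trie, prefix):
    """Add the memory correction to trie-valid tokens only; mask all others."""
    valid = allowed_next(trie, prefix)
    return {
        tok: (base_logits[tok] + correction.get(tok, 0.0)) if tok in valid else -math.inf
        for tok in base_logits
    }


def greedy_decode(trie, step_fn):
    """Greedy docid decoding; step_fn(prefix) -> (base_logits, correction)."""
    prefix = []
    while allowed_next(trie, prefix):  # stop once a complete docid (leaf) is reached
        base, corr = step_fn(prefix)
        scores = constrained_scores(base, corr, trie, prefix)
        prefix.append(max(scores, key=scores.get))
    return tuple(prefix)


# Hypothetical toy setup: three docids over a five-token vocabulary.
docids = [("a", "x"), ("a", "y"), ("b", "z")]
trie = build_trie(docids)
vocab = ["a", "b", "x", "y", "z"]


def toy_step(prefix, use_memory=True):
    base = {tok: 0.0 for tok in vocab}
    base["b"], base["y"] = 1.0, 0.5
    corr = {"a": 2.0, "x": 1.0} if use_memory else {}
    return base, corr


print(greedy_decode(trie, toy_step))  # ('a', 'x')
print(greedy_decode(trie, lambda p: toy_step(p, use_memory=False)))  # ('b', 'z')
```

The correction for "x" has no effect at the first step because "x" is not a valid continuation there; it only matters once the prefix ("a",) makes it reachable, which is the sense in which only trie-valid scores are recalibrated.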

Results

Across MS MARCO and Natural Questions (NQ) under the Expanded protocol, PAMT consistently improves retention on earlier slices while largely preserving new-slice performance: signed backward transfer (BWT±) improves from -55.72 to -22.64 for Full FT-SPQ on MS MARCO and from -25.93 to -9.16 for Full FT-TU on NQ. Control experiments show that search-space growth alone causes only minor retention drift (e.g., a Hit@10 drop of at most 2.95 pp on MS MARCO), indicating that update-induced interference is the dominant source of degradation. Compared with prior continual GenIR baselines such as MixLoRA-DSI, PAMT narrows the stability gap to index-based retrievers, although those retrievers remain more stable overall.
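Signed backward transfer can be computed from the matrix of per-slice scores recorded after each adaptation session. This sketch assumes the common definition BWT = (1/(T-1)) Σ_{i<T} (R[T][i] - R[i][i]), keeping the sign so that negative values indicate forgetting; the paper's exact BWT± formula and metric may differ, and the numbers below are illustrative, not results from the paper.

```python
def backward_transfer(R):
    """Signed backward transfer from a T x T score matrix (assumed definition).

    R[t][i] is the score (e.g. Hit@10 in %) on slice i after training on slice t.
    Entries with i > t are ignored. Negative values indicate forgetting.
    """
    T = len(R)
    if T < 2:
        raise ValueError("backward transfer needs at least two sessions")
    final = R[T - 1]
    return sum(final[i] - R[i][i] for i in range(T - 1)) / (T - 1)


# Illustrative scores for three sequential slices.
R = [
    [80.0, 0.0, 0.0],
    [50.0, 78.0, 0.0],
    [40.0, 60.0, 76.0],
]
print(backward_transfer(R))  # -29.0
```

Here slice 0 drops from 80.0 to 40.0 and slice 1 from 78.0 to 60.0, so the average signed change over the two earlier slices is -29.0.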

Key Points

  1. Sequential adaptation in GenIR improves retrieval on newly added documents but causes severe forgetting on previously indexed slices (e.g., BWT± of -55.72 for Full FT-SPQ on MS MARCO), a pattern consistent across full fine-tuning and LoRA settings as well as both semantic product-quantized and keyword-based identifier schemes.
  2. PAMT freezes the adapted backbone and routing components after Stage 1, then updates only a sparse budget of PMH value rows (e.g., 10,000 rows out of 160,000) using access-frequency × inverse-historical-frequency selection to recalibrate docid decoding without replaying legacy supervision.
  3. Keyword-based title+URL identifiers are more stable than semantic product-quantized identifiers in continual settings, and PAMT achieves the best stability-plasticity trade-off among continual GenIR methods tested (e.g., BWT± of -9.16 for PAMT-Full FT-TU on NQ), though index-based retrievers such as DPR-HN remain more stable overall.
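The row selection and value-only update described in point 2 can be sketched as a TF-IDF-style ranking followed by a masked gradient step. The scoring formula here (current-session access count times log((1 + S) / (1 + h)), with S past sessions and h the number of past sessions that accessed the row) is an assumption modeled on TF-IDF, not the paper's exact statistic, and `select_rows` and `sparse_value_step` are hypothetical names. In the paper's setting the budget would be on the order of 10,000 of 160,000 rows.

```python
import math
from collections import Counter


def select_rows(current_access, historical_access, num_past_sessions, budget):
    """Rank rows by access frequency x inverse historical frequency (assumed formula).

    current_access: Counter row -> accesses during the current session.
    historical_access: Counter row -> number of earlier sessions that accessed the row.
    Returns up to `budget` row ids, highest score first (ties broken by row id).
    """
    scores = {}
    for row, freq in current_access.items():
        ihf = math.log((1 + num_past_sessions) / (1 + historical_access.get(row, 0)))
        scores[row] = freq * ihf
    ranked = sorted(scores, key=lambda r: (-scores[r], r))
    return ranked[:budget]


def sparse_value_step(values, grads, selected, lr):
    """SGD step on the selected value rows only; every other row stays frozen."""
    selected = set(selected)
    for row, grad in grads.items():
        if row in selected:
            values[row] = [v - lr * g for v, g in zip(values[row], grad)]
    return values


# Hypothetical access statistics: row 1 was used in all three past sessions.
current = Counter({1: 10, 2: 8, 3: 8, 4: 1})
history = Counter({1: 3, 3: 1})
chosen = select_rows(current, history, num_past_sessions=3, budget=2)
print(chosen)  # [2, 3]

values = {row: [0.0, 0.0] for row in range(1, 5)}
grads = {1: [1.0, 1.0], 2: [1.0, -1.0], 3: [0.5, 0.5]}
sparse_value_step(values, grads, chosen, lr=0.1)
print(values[1], values[2])  # [0.0, 0.0] [-0.1, 0.1]
```

Row 1 is the most frequently accessed in the current session but is skipped because every earlier session also relied on it; under this assumed scoring, the budget goes to rows the new slice uses heavily and older slices rarely touched, and row 1 stays frozen even though it received a gradient.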

This page was created using generative AI such as GPT-5, Claude Opus 4, Gemini 3, Gemini 3.1 Flash Image, and their higher-end successor versions. No guarantee can be made regarding its contents.