Rethinking Memory Design in SAM-Based Visual Object Tracking
- URL: http://arxiv.org/abs/2512.22624v1
- Date: Sat, 27 Dec 2025 15:33:50 GMT
- Title: Rethinking Memory Design in SAM-Based Visual Object Tracking
- Authors: Mohamad Alansari, Muzammal Naseer, Hasan Al Marzouqi, Naoufel Werghi, Sajid Javed
- Abstract summary: We present a memory-centric study of SAM-based visual object tracking. We propose a unified hybrid memory framework that explicitly decomposes memory into short-term appearance memory and long-term distractor-resolving memory.
- Score: 41.85403035673912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memory has become the central mechanism enabling robust visual object tracking in modern segmentation-based frameworks. Recent methods built upon Segment Anything Model 2 (SAM2) have demonstrated strong performance by refining how past observations are stored and reused. However, existing approaches address memory limitations in a method-specific manner, leaving the broader design principles of memory in SAM-based tracking poorly understood. Moreover, it remains unclear how these memory mechanisms transfer to stronger, next-generation foundation models such as Segment Anything Model 3 (SAM3). In this work, we present a systematic memory-centric study of SAM-based visual object tracking. We first analyze representative SAM2-based trackers and show that most methods primarily differ in how short-term memory frames are selected, while sharing a common object-centric representation. Building on this insight, we faithfully reimplement these memory mechanisms within the SAM3 framework and conduct large-scale evaluations across ten diverse benchmarks, enabling a controlled analysis of memory design independent of backbone strength. Guided by our empirical findings, we propose a unified hybrid memory framework that explicitly decomposes memory into short-term appearance memory and long-term distractor-resolving memory. This decomposition enables the integration of existing memory policies in a modular and principled manner. Extensive experiments demonstrate that the proposed framework consistently improves robustness under long-term occlusion, complex motion, and distractor-heavy scenarios on both SAM2 and SAM3 backbones. Code is available at: https://github.com/HamadYA/SAM3_Tracking_Zoo. This is a preprint. Some results are being finalized and may be updated in a future revision.
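The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not use the SAM2 or SAM3 APIs: the class names, the `confidence` and `has_distractor` fields, the capacities, and the update policy are all hypothetical choices made only to show how a short-term appearance buffer and a long-term distractor-resolving buffer might be kept separate.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class MemoryEntry:
    frame_idx: int
    confidence: float      # hypothetical per-frame mask confidence in [0, 1]
    has_distractor: bool   # hypothetical flag: a look-alike object was seen


@dataclass
class HybridMemory:
    """Toy hybrid memory: the policy below is an assumption, not the paper's."""
    short_capacity: int = 3
    long_capacity: int = 3
    min_confidence: float = 0.5
    short_term: Deque[MemoryEntry] = field(default_factory=deque)
    long_term: List[MemoryEntry] = field(default_factory=list)

    def update(self, entry: MemoryEntry) -> None:
        # Skip unreliable frames so they do not pollute either buffer.
        if entry.confidence < self.min_confidence:
            return
        # Short-term appearance memory: FIFO over recent reliable frames.
        self.short_term.append(entry)
        if len(self.short_term) > self.short_capacity:
            self.short_term.popleft()
        # Long-term memory: keep reliable frames where a distractor was present.
        if entry.has_distractor:
            self.long_term.append(entry)
            if len(self.long_term) > self.long_capacity:
                self.long_term.pop(0)

    def frame_indices(self) -> List[int]:
        # Frames the tracker would attend to, de-duplicated and ordered in time.
        merged = {e.frame_idx for e in self.short_term}
        merged |= {e.frame_idx for e in self.long_term}
        return sorted(merged)


memory = HybridMemory(short_capacity=2, long_capacity=2)
observations = [
    MemoryEntry(0, 0.90, False),
    MemoryEntry(1, 0.30, False),  # low confidence, skipped
    MemoryEntry(2, 0.80, True),   # distractor frame, kept long-term
    MemoryEntry(3, 0.90, False),
    MemoryEntry(4, 0.95, False),
]
for obs in observations:
    memory.update(obs)
print(memory.frame_indices())  # [2, 3, 4]
```

In this sketch, frame 2 has already been evicted from the short-term buffer by the time frame 4 arrives, but it remains available through the long-term buffer. Swapping the body of `update` is where a different frame-selection policy would plug in.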
Related papers
- RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies [54.23445842621374]
Memory is critical for long-horizon and history-dependent robotic manipulation. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms. We introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models.
arXiv Detail & Related papers (2026-03-04T21:59:32Z) - Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management [63.48041801851891]
Fine-Mem is a unified framework designed for fine-grained feedback alignment. Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-13T11:06:17Z) - Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning [55.251697395358285]
Large language models (LLMs) are increasingly deployed as intelligent agents that reason, plan, and interact with their environments. To effectively scale to long-horizon scenarios, a key capability for such agents is a memory mechanism that can retain, organize, and retrieve past experiences. We propose CompassMem, an event-centric memory framework inspired by Event Theory.
arXiv Detail & Related papers (2026-01-08T08:44:07Z) - Memory in the Age of AI Agents [217.9368190980982]
This work aims to provide an up-to-date landscape of current agent memory research. We identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks.
arXiv Detail & Related papers (2025-12-15T17:22:34Z) - Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution [52.76038908826961]
We propose ReMe (Remember Me, Refine Me) to bridge the gap between static storage and dynamic reasoning. ReMe innovates across the memory lifecycle via three mechanisms: multi-faceted distillation, which extracts fine-grained experiences. Experiments on BFCL-V3 and AppWorld demonstrate that ReMe establishes a new state-of-the-art in agent memory system.
arXiv Detail & Related papers (2025-12-11T14:40:01Z) - Multiple Memory Systems for Enhancing the Long-term Memory of Agent [9.43633399280987]
Existing methods, such as MemoryBank and A-MEM, have poor quality of stored memory content. We have designed a multiple memory system inspired by cognitive psychology theory.
arXiv Detail & Related papers (2025-08-21T06:29:42Z) - MemoryKT: An Integrative Memory-and-Forgetting Method for Knowledge Tracing [7.096160553754792]
Simulating students' memory states is a promising approach to enhance both the performance and interpretability of knowledge tracing models. Memory consists of three fundamental processes: encoding, storage, and retrieval. This paper proposes memoryKT, a knowledge tracing model based on a novel temporal variational autoencoder.
arXiv Detail & Related papers (2025-08-11T15:59:59Z) - SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2 [2.659882635924329]
Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks. Recent methods augment SAM 2 with hand-crafted update rules to better handle distractors and object motion. We propose a fundamentally different approach using reinforcement learning for optimizing memory updates in SAM 2.
arXiv Detail & Related papers (2025-07-11T12:53:19Z) - MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection [21.22536962888316]
We present MoSAM, incorporating two key strategies to integrate object motion cues into the model and establish more reliable feature memory. MoSAM achieves state-of-the-art results compared to other competitors.
arXiv Detail & Related papers (2025-04-30T02:19:31Z) - A Distractor-Aware Memory for Visual Object Tracking with SAM2 [11.864619292028278]
Memory-based trackers are video object segmentation methods that form the target model by concatenating recently tracked frames into a memory buffer and localize the target by attending the current image to the buffered frames. SAM2.1++ outperforms SAM2.1 and related SAM memory extensions on seven benchmarks and sets a solid new state-of-the-art on six of them.
arXiv Detail & Related papers (2024-11-26T16:41:09Z) - B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - MeMOT: Multi-Object Tracking with Memory [97.48960039220823]
Our model, called MeMOT, consists of three main modules that are all Transformer-based.
MeMOT observes very competitive performance on widely adopted MOT datasets.
arXiv Detail & Related papers (2022-03-31T02:33:20Z) - Memory-Based Semantic Parsing [79.48882899104997]
We present a memory-based model for context-dependent semantic parsing.
We learn a context memory controller that manages the memory by maintaining the cumulative meaning of sequential user utterances.
arXiv Detail & Related papers (2021-09-07T16:15:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.