Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
- URL: http://arxiv.org/abs/2507.19754v1
- Date: Sat, 26 Jul 2025 02:52:41 GMT
- Title: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
- Authors: Seunghun Lee, Jiwan Seo, Minwoo Choi, Kiljoon Han, Jaehoon Jeong, Zane Durante, Ehsan Adeli, Sang Hyun Park, Sunghoon Im,
- Abstract summary: We present Latest Object Memory Management (LOMM) for temporally consistent video instance segmentation.<n>LOMM robustly tracks and updates the latest states of objects by explicitly modeling their presence in each frame.<n>We introduce Decoupled Object Association (DOA), a strategy that separately handles newly appearing and already existing objects.
- Score: 17.805350735895114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present Latest Object Memory Management (LOMM) for temporally consistent video instance segmentation that significantly improves long-term instance tracking. At the core of our method is Latest Object Memory (LOM), which robustly tracks and continuously updates the latest states of objects by explicitly modeling their presence in each frame. This enables consistent tracking and accurate identity management across frames, enhancing both performance and reliability through the VIS process. Moreover, we introduce Decoupled Object Association (DOA), a strategy that separately handles newly appearing and already existing objects. By leveraging our memory system, DOA accurately assigns object indices, improving matching accuracy and ensuring stable identity consistency, even in dynamic scenes where objects frequently appear and disappear. Extensive experiments and ablation studies demonstrate the superiority of our method over traditional approaches, setting a new benchmark in VIS. Notably, our LOMM achieves state-of-the-art AP score of 54.0 on YouTube-VIS 2022, a dataset known for its challenging long videos. Project page: https://seung-hun-lee.github.io/projects/LOMM/
Related papers
- HQ-SMem: Video Segmentation and Tracking Using Memory Efficient Object Embedding With Selective Update and Self-Supervised Distillation Feedback [0.0]
We introduce HQ-SMem, for High Quality video segmentation and tracking using Smart Memory.<n>Our approach incorporates three key innovations: (i) leveraging SAM with High-Quality masks (SAM-HQ) alongside appearance-based candidate-selection to refine coarse segmentation masks, resulting in improved object boundaries; (ii) implementing a dynamic smart memory mechanism that selectively stores relevant key frames while discarding redundant ones; and (iii) dynamically updating the appearance model to effectively handle complex topological object variations and reduce drift throughout the video.
arXiv Detail & Related papers (2025-07-25T03:28:05Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT)
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - Spatial-Temporal Multi-level Association for Video Object Segmentation [89.32226483171047]
This paper proposes spatial-temporal multi-level association, which jointly associates reference frame, test frame, and object features.
Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features.
arXiv Detail & Related papers (2024-04-09T12:44:34Z) - Video Object Segmentation with Dynamic Query Modulation [23.811776213359625]
We propose a query modulation method, termed QMVOS, for object and multi-object segmentation.
Our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks.
arXiv Detail & Related papers (2024-03-18T07:31:39Z) - Look Before You Match: Instance Understanding Matters in Video Object
Segmentation [114.57723592870097]
In this paper, we argue that instance matters in video object segmentation (VOS)
We present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
We employ well-learned object queries from IS branch to inject instance-specific information into the query key, with which the instance-auged matching is further performed.
arXiv Detail & Related papers (2022-12-13T18:59:59Z) - BURST: A Benchmark for Unifying Object Recognition, Segmentation and
Tracking in Video [58.71785546245467]
Multiple existing benchmarks involve tracking and segmenting objects in video.
There is little interaction between them due to the use of disparate benchmark datasets and metrics.
We propose BURST, a dataset which contains thousands of diverse videos with high-quality object masks.
All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison.
arXiv Detail & Related papers (2022-09-25T01:27:35Z) - MeMOT: Multi-Object Tracking with Memory [97.48960039220823]
Our model, called MeMOT, consists of three main modules that are all Transformer-based.
MeMOT observes very competitive performance on widely adopted MOT datasets.
arXiv Detail & Related papers (2022-03-31T02:33:20Z) - Multi-Object Tracking and Segmentation with a Space-Time Memory Network [12.043574473965318]
We propose a method for multi-object tracking and segmentation based on a novel memory-based mechanism to associate tracklets.
The proposed tracker, MeNToS, addresses particularly the long-term data association problem.
arXiv Detail & Related papers (2021-10-21T17:13:17Z) - Learning Position and Target Consistency for Memory-based Video Object
Segmentation [39.787966275016906]
Learn position and target consistency framework for memory-based video object segmentation.
It applies the memory mechanism to retrieve pixels globally, and meanwhile learns position consistency for more reliable segmentation.
Experiments show that our LCM achieves state-of-the-art performance on both DAVIS and Youtube-VOS benchmark.
arXiv Detail & Related papers (2021-04-09T12:22:37Z) - Fast Video Object Segmentation With Temporal Aggregation Network and
Dynamic Template Matching [67.02962970820505]
We introduce "tracking-by-detection" into Video Object (VOS)
We propose a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
We achieve new state-of-the-art performance on the DAVIS benchmark without complicated bells and whistles in both speed and accuracy, with a speed of 0.14 second per frame and J&F measure of 75.9% respectively.
arXiv Detail & Related papers (2020-07-11T05:44:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.