Tracking and Understanding Object Transformations
- URL: http://arxiv.org/abs/2511.04678v1
- Date: Thu, 06 Nov 2025 18:59:30 GMT
- Title: Tracking and Understanding Object Transformations
- Authors: Yihong Sun, Xinyu Yang, Jennifer J. Sun, Bharath Hariharan,
- Abstract summary: We introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes. We present TubeletGraph, a zero-shot system that recovers missing objects after transformation and maps out how object states evolve over time. TubeletGraph achieves a deeper understanding of object transformations and promising capabilities in temporal grounding and semantic reasoning for complex object transformations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world objects frequently undergo state transformations. From an apple being cut into pieces to a butterfly emerging from its cocoon, tracking through these changes is important for understanding real-world objects and dynamics. However, existing methods often lose track of the target object after transformation, due to significant changes in object appearance. To address this limitation, we introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes, accompanied by a new benchmark dataset, VOST-TAS. To tackle this problem, we present TubeletGraph, a zero-shot system that recovers missing objects after transformation and maps out how object states are evolving over time. TubeletGraph first identifies potentially overlooked tracks, and determines whether they should be integrated based on semantic and proximity priors. Then, it reasons about the added tracks and generates a state graph describing each observed transformation. TubeletGraph achieves state-of-the-art tracking performance under transformations, while demonstrating deeper understanding of object transformations and promising capabilities in temporal grounding and semantic reasoning for complex object transformations. Code, additional results, and the benchmark dataset are available at https://tubelet-graph.github.io.
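The abstract outlines a two-stage pipeline: first identify potentially overlooked tracks and integrate them using semantic and proximity priors, then reason over the added tracks to produce a state graph of observed transformations. A minimal Python sketch of that kind of track-integration logic follows; all names, thresholds, and data structures here are illustrative assumptions, not the authors' implementation, and the semantic check is a stand-in for what would be a learned model in a real system.

```python
from dataclasses import dataclass

@dataclass
class Tubelet:
    """A hypothetical per-object track: a label, the frame where it
    first appears, and its centroid at that frame."""
    label: str
    start_frame: int
    centroid: tuple

def proximity_prior(a: Tubelet, b: Tubelet, max_dist: float = 50.0) -> bool:
    # New tracks spawned by a transformation should appear near the target.
    dx = a.centroid[0] - b.centroid[0]
    dy = a.centroid[1] - b.centroid[1]
    return (dx * dx + dy * dy) ** 0.5 <= max_dist

def semantic_prior(a: Tubelet, b: Tubelet, related: set) -> bool:
    # Stand-in for a semantic plausibility check; a real system would
    # query a vision-language model here.
    return a.label == b.label or (a.label, b.label) in related

def integrate_tracks(target, candidates, related):
    """Keep overlooked tracks that satisfy both priors."""
    return [c for c in candidates
            if proximity_prior(target, c) and semantic_prior(target, c, related)]

def build_state_graph(target, added):
    """Edges from the original state to each added state, stamped with
    the frame where the new state is first observed."""
    return {target.label: [(c.label, c.start_frame) for c in added]}

# Toy example mirroring the apple-cutting scenario from the abstract.
apple = Tubelet("apple", 0, (100.0, 100.0))
candidates = [
    Tubelet("apple slice", 40, (110.0, 95.0)),
    Tubelet("knife", 35, (300.0, 300.0)),   # far away and not a new state
    Tubelet("apple slice", 42, (90.0, 108.0)),
]
related = {("apple", "apple slice")}

added = integrate_tracks(apple, candidates, related)
graph = build_state_graph(apple, added)
print(graph)  # {'apple': [('apple slice', 40), ('apple slice', 42)]}
```

The knife track is rejected by both priors, while the two slice tracks pass and become edges in the state graph, annotated with when each new state appears.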
Related papers
- Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress [53.723881111373736]
We present SPARTA, the first unified framework for the family of object state change manipulation tasks. SPARTA integrates spatially progressing object change segmentation maps, a visual skill to perceive actionable vs. transformed regions, and dense rewards that capture incremental progress over time. We validate SPARTA on a real robot for three challenging tasks across 10 diverse real-world objects.
arXiv Detail & Related papers (2025-09-28T23:56:07Z)
- SPOC: Spatially-Progressing Object State Change Segmentation in Video [52.65373395382122]
We introduce the spatially-progressing object state change segmentation task. The goal is to segment, at the pixel level, those regions of an object that are actionable and those that are transformed. We demonstrate useful implications for tracking activity progress to benefit robotic agents.
arXiv Detail & Related papers (2025-03-15T01:48:54Z)
- M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation [51.82272563578793]
We introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. We present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases.
arXiv Detail & Related papers (2024-12-18T12:50:11Z)
- A Dataset and Framework for Learning State-invariant Object Representations [0.6577148087211809]
We present a novel dataset, ObjectsWithStateChange, which captures state and pose variations in object images recorded from arbitrary viewpoints. Our ablation on the role played by curriculum learning indicates an improvement in object recognition accuracy of 7.9% and retrieval mAP of 9.2% over the state of the art on our new dataset.
arXiv Detail & Related papers (2024-04-09T17:17:48Z)
- Robust Change Detection Based on Neural Descriptor Fields [53.111397800478294]
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results.
By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises.
arXiv Detail & Related papers (2022-08-01T17:45:36Z)
- Learning to Track with Object Permanence [61.36492084090744]
We introduce an end-to-end trainable approach for joint object detection and tracking.
Our model, trained jointly on synthetic and real data, outperforms the state of the art on the KITTI and MOT17 datasets.
arXiv Detail & Related papers (2021-03-26T04:43:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.