RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
- URL: http://arxiv.org/abs/2304.03623v1
- Date: Fri, 7 Apr 2023 12:52:24 GMT
- Title: RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking
- Authors: Fangwei Zhong, Xiao Bi, Yudi Zhang, Wei Zhang, Yizhou Wang
- Abstract summary: We present RSPT, a framework that forms a structure-aware motion representation by Reconstructing the Surroundings and Predicting the target Trajectory.
We evaluate RSPT on various simulated scenarios and show that it outperforms existing methods in unseen environments.
- Score: 17.659697426459083
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Active Object Tracking (AOT) aims to maintain a specific relation between the
tracker and object(s) by autonomously controlling the motion system of a
tracker given observations. AOT has wide-ranging applications, such as in
mobile robots and autonomous driving. However, building a generalizable active
tracker that works robustly across different scenarios remains a challenge,
especially in unstructured environments with cluttered obstacles and diverse
layouts. We argue that constructing a state representation capable of modeling
the geometry structure of the surroundings and the dynamics of the target is
crucial for achieving this goal. To address this challenge, we present RSPT, a
framework that forms a structure-aware motion representation by Reconstructing
the Surroundings and Predicting the target Trajectory. Additionally, we enhance
the generalization of the policy network by training in an asymmetric dueling
mechanism. We evaluate RSPT on various simulated scenarios and show that it
outperforms existing methods in unseen environments, particularly those with
complex obstacles and layouts. We also demonstrate the successful transfer of
RSPT to real-world settings. Project Website:
https://sites.google.com/view/aot-rspt.
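The abstract combines two ideas: a reconstruction of the surroundings (e.g., a tracker-centric occupancy map) and a prediction of the target's future trajectory, fused into one structure-aware state for the tracking policy. The sketch below is a minimal, hypothetical illustration of how such a state could be assembled; the grid resolution, prediction horizon, and every function and parameter name are assumptions for illustration and are not taken from the RSPT paper.

```python
import numpy as np

# Hypothetical sketch of a structure-aware state for active object tracking,
# loosely following the abstract's two ingredients: reconstructed surroundings
# and a predicted target trajectory. All names and shapes are illustrative.

GRID_SIZE = 64        # cells per side of the local occupancy map (assumed)
CELL_METERS = 0.25    # metric size of one grid cell (assumed)
HORIZON = 8           # number of future target positions to predict (assumed)


def build_occupancy_grid(depth_points: np.ndarray) -> np.ndarray:
    """Project tracker-centric 3D points (N, 3) into a 2D occupancy grid."""
    grid = np.zeros((GRID_SIZE, GRID_SIZE), dtype=np.float32)
    half = GRID_SIZE // 2
    for x, y, _z in depth_points:
        i = int(x / CELL_METERS) + half
        j = int(y / CELL_METERS) + half
        if 0 <= i < GRID_SIZE and 0 <= j < GRID_SIZE:
            grid[i, j] = 1.0  # mark the cell as occupied
    return grid


def predict_trajectory(history: np.ndarray) -> np.ndarray:
    """Extrapolate the target's recent positions (T, 2) with a constant-velocity
    model; a learned predictor could replace this simple placeholder."""
    velocity = history[-1] - history[-2]
    steps = np.arange(1, HORIZON + 1)[:, None]
    return history[-1] + steps * velocity  # (HORIZON, 2) future positions


def structure_aware_state(depth_points: np.ndarray,
                          target_history: np.ndarray) -> np.ndarray:
    """Concatenate the flattened occupancy map and the predicted trajectory
    into a single vector that a tracking policy could consume."""
    occupancy = build_occupancy_grid(depth_points).flatten()
    trajectory = predict_trajectory(target_history).flatten()
    return np.concatenate([occupancy, trajectory])


if __name__ == "__main__":
    points = np.random.uniform(-8.0, 8.0, size=(500, 3))      # fake depth points
    history = np.cumsum(np.random.randn(5, 2) * 0.1, axis=0)  # fake target track
    print(structure_aware_state(points, history).shape)       # (4112,)
```

In the actual framework, both the reconstruction and the trajectory prediction are learned modules and the policy is trained with an asymmetric dueling mechanism; the snippet only shows how the two sources of structure can be fused into a single policy input.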
Related papers
- Tracking Transforming Objects: A Benchmark [2.53045657890708]
This study collects a novel dedicated dataset for Tracking Transforming Objects, called DTTO, which contains 100 sequences, amounting to approximately 9.3K frames.
We provide carefully hand-annotated bounding boxes for each frame within these sequences, making DTTO the pioneering benchmark dedicated to tracking transforming objects.
We thoroughly evaluate 20 state-of-the-art trackers on the benchmark, aiming to comprehend the performance of existing methods and provide a comparison for future research on DTTO.
arXiv Detail & Related papers (2024-04-28T11:24:32Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment.
We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance.
We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, towards class-agnostic tracking that also performs well for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z)
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame of a video and then cycle back to itself for self-supervision; a rough sketch of this cycle step appears after this list.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)
- Robot Navigation in Constrained Pedestrian Environments using Reinforcement Learning [32.454250811667904]
Navigating fluently around pedestrians is a necessary capability for mobile robots deployed in human environments.
We present an approach based on reinforcement learning to learn policies capable of dynamic adaptation to the presence of moving pedestrians.
We show transfer of the learned policy to unseen 3D reconstructions of two real environments.
arXiv Detail & Related papers (2020-10-16T19:40:08Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
arXiv Detail & Related papers (2020-03-19T18:43:26Z)
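As referenced in the CycConf entry above, the following is a rough, hypothetical sketch of an instance-level temporal cycle-confusion step: for each object embedding in frame t, pick the least similar proposal in frame t+1, then require it to cycle back to its source object. The loss form, shapes, and names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def cycle_confusion_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
    """feat_t: (N, D) object embeddings in frame t; feat_t1: (M, D) proposal
    embeddings in frame t+1. Both are assumed to come from a detector head."""
    feat_t = F.normalize(feat_t, dim=-1)
    feat_t1 = F.normalize(feat_t1, dim=-1)

    sim_forward = feat_t @ feat_t1.T        # (N, M) cosine similarities
    hardest = sim_forward.argmin(dim=1)     # most *different* proposal per object

    # Cycle back: the chosen t+1 proposal should still match its source object.
    sim_backward = feat_t1[hardest] @ feat_t.T          # (N, N)
    targets = torch.arange(feat_t.shape[0])
    return F.cross_entropy(sim_backward, targets)


if __name__ == "__main__":
    loss = cycle_confusion_loss(torch.randn(4, 128), torch.randn(6, 128))
    print(loss.item())
```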