Related papers: ENTL: Embodied Navigation Trajectory Learner

URL: http://arxiv.org/abs/2304.02639v3
Date: Fri, 29 Sep 2023 15:11:03 GMT
Title: ENTL: Embodied Navigation Trajectory Learner
Authors: Klemen Kotar, Aaron Walsman, Roozbeh Mottaghi
Abstract summary: We propose a method for extracting long sequence representations for embodied navigation. We train our model using vector-quantized predictions of future states conditioned on current states and actions. A key property of our approach is that the model is pre-trained without any explicit reward signal.
Score: 37.43079415330256
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose Embodied Navigation Trajectory Learner (ENTL), a method for extracting long sequence representations for embodied navigation. Our approach unifies world modeling, localization and imitation learning into a single sequence prediction task. We train our model using vector-quantized predictions of future states conditioned on current states and actions. ENTL's generic architecture enables sharing of the spatio-temporal sequence encoder for multiple challenging embodied tasks. We achieve competitive performance on navigation tasks using significantly less data than strong baselines while performing auxiliary tasks such as localization and future frame prediction (a proxy for world modeling). A key property of our approach is that the model is pre-trained without any explicit reward signal, which makes the resulting model generalizable to multiple tasks and environments.

Related papers

Streaming Real-Time Trajectory Prediction Using Endpoint-Aware Modeling [54.94692733670454]
Future trajectories of neighboring traffic agents have a significant influence on the path planning and decision-making of autonomous vehicles. We propose a lightweight yet highly accurate streaming-based trajectory forecasting approach. Our approach significantly reduces inference latency, making it well-suited for real-world deployment.
arXiv Detail & Related papers (2026-03-02T13:44:23Z)
Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
SITS-DECO: A Generative Decoder Is All You Need For Multitask Satellite Image Time Series Modelling [0.0]
We introduce SITS-DECO, a proof-of-concept generative model that applies unified-sequence framing to EO data. We show that the model can perform multiple supervised and self-supervised tasks within a single unified architecture. Despite its simplicity and lack of spatial context, SITS-DECO outperforms much larger EO foundation models on crop-type classification.
arXiv Detail & Related papers (2025-10-21T14:42:55Z)
ILNet: Trajectory Prediction with Inverse Learning Attention for Enhancing Intention Capture [4.190790144182306]
It is acknowledged that human drivers dynamically adjust initial driving decisions based on assumptions about the intentions of surrounding vehicles. Motivated by human driving behaviors, this paper proposes ILNet, a multi-agent trajectory prediction method with Inverse Learning (IL) attention and a Dynamic Anchor Selection (DAS) module. Experimental results show that ILNet achieves state-of-the-art performance on the INTERACTION and Argoverse motion forecasting datasets.
arXiv Detail & Related papers (2025-07-09T04:18:01Z)
Universal Retrieval for Multimodal Trajectory Modeling [12.160448446091607]
Trajectory data holds significant potential for enhancing AI agent capabilities. We introduce Multimodal Trajectory Retrieval, bridging the gap between universal retrieval and agent-centric trajectory modeling.
arXiv Detail & Related papers (2025-06-27T09:50:38Z)
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers [11.075247758198762]
This paper introduces FUTURIST, a method for multimodal future semantic prediction that uses a unified and efficient visual sequence transformer architecture. We propose a VAE-free hierarchical tokenization process, which reduces computational complexity, streamlines the training pipeline, and enables end-to-end training with high-resolution, multimodal inputs. We validate FUTURIST on the Cityscapes dataset, demonstrating state-of-the-art performance in future semantic segmentation for both short- and mid-term forecasting.
arXiv Detail & Related papers (2025-01-14T18:34:14Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
Towards Learning a Generalist Model for Embodied Navigation [24.816490551945435]
We propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. We conduct extensive experiments to evaluate the performance and generalizability of our model.
arXiv Detail & Related papers (2023-12-04T16:32:51Z)
Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. We have implemented this representation into a full-fledged navigation approach called SkillTron. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
BEVBert: Multimodal Map Pre-training for Language-guided Navigation [75.23388288113817]
We propose a new map-based pre-training paradigm that is spatial-aware for use in vision-and-language navigation (VLN). We build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. Based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning, thereby facilitating the language-guided navigation goal.
arXiv Detail & Related papers (2022-12-08T16:27:54Z)
Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction [31.02081143697431]
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and video-surveillance applications. We propose a lightweight attention-based recurrent backbone that acts solely on past observed positions. We employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations.
arXiv Detail & Related papers (2022-04-25T11:12:37Z)
Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings. Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z)
Waypoint Models for Instruction-guided Navigation in Continuous Environments [68.2912740006109]
We develop a class of language-conditioned waypoint prediction networks to examine this question. We measure task performance and estimated execution time on a profiled LoCoBot robot. Our models outperform prior work in VLN-CE and set a new state-of-the-art on the public leaderboard.
arXiv Detail & Related papers (2021-10-05T17:55:49Z)
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction [72.37440317774556]
We propose advances that address two key challenges in future trajectory prediction: multimodality in both training data and predictions, and constant-time inference regardless of the number of agents.
arXiv Detail & Related papers (2020-07-26T08:17:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.