ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization
- URL: http://arxiv.org/abs/2603.02939v1
- Date: Tue, 03 Mar 2026 12:48:40 GMT
- Title: ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization
- Authors: Yang Zhan, Yunhao Li, Zhang Chao, Yuxu Lu, Yan Li,
- Abstract summary: We propose a novel framework that reformulates ship trajectory prediction as a text-to-text generation problem.<n>The proposed ShipTraj-R1 achieves the least error compared with state-of-the-art deep learning and LLM-based baselines.
- Score: 13.420880035877252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in reinforcement fine-tuning have significantly improved the reasoning ability of large language models (LLMs). In particular, methods such as group relative policy optimization (GRPO) have demonstrated strong capabilities across various fields. However, applying LLMs to ship trajectory prediction remains largely unexplored. In this paper, we propose ShipTraj-R1, a novel LLM-based framework that reformulates ship trajectory prediction as a text-to-text generation problem. (1) We design a dynamic prompt containing trajectory information about conflicting ships to guide the model to achieve adaptive chain-of-thought (CoT) reasoning. (2) We introduce a comprehensive rule-based reward mechanism to incentivize the reasoning format and prediction accuracy of the model. (3) Our ShipTraj-R1 is reinforced through the GRPO mechanism guided by domain-specific prompts and rewards, and utilizes the Qwen3 as the model backbone. Extensive experimental results on two complex and real-world maritime datasets show that the proposed ShipTraj-R1 achieves the least error compared with state-of-the-art deep learning and LLM-based baselines.
Related papers
- TopoCurate:Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks.<n>We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology.<n>TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - How to Set the Learning Rate for Large-Scale Pre-training? [73.03133634525635]
We formalize this investigation into two distinct research paradigms: Fitting and Transfer.<n>Within the Fitting Paradigm, we introduce a Scaling Law for search factor, effectively reducing the search complexity from O(n3) to O(n*C_D*C_) via predictive modeling.<n>We extend the principles of $$Transfer to the Mixture of Experts (MoE) architecture, broadening its applicability to encompass model depth, weight decay, and token horizons.
arXiv Detail & Related papers (2026-01-08T15:55:13Z) - PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models [51.43746425777865]
Large Language Models (LLMs) often lack the capacity to formulate global strategies, leading to error propagation in long-horizon tasks.<n>We propose PILOT, a framework designed to internalize the strategic oversight of large models into intrinsic Latent Guidance.
arXiv Detail & Related papers (2026-01-07T12:38:56Z) - On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting [91.38734024438357]
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs)<n>Existing approaches that integrate SFT and RL often face the risk of disrupting established response patterns and inducing overfitting to expert data.<n>We propose CHORD, a framework for Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting.
arXiv Detail & Related papers (2025-08-15T11:20:03Z) - RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs)<n> RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z) - VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning [77.34267241692706]
Vision-Language Navigation (VLN) is a core challenge in embodied AI, requiring agents to navigate real-world environments using natural language instructions.<n>We propose VLN-R1, an end-to-end framework that leverages Large Vision-Language Models (LVLM) to directly translate egocentric video streams into continuous navigation actions.
arXiv Detail & Related papers (2025-06-20T17:59:59Z) - Model Steering: Learning with a Reference Model Improves Generalization Bounds and Scaling Laws [52.10468229008941]
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting.<n>We provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model.<n>Building on these insights, we introduce a novel method for Contrastive Language-Image Pretraining with a reference model, termed DRRho-CLIP.
arXiv Detail & Related papers (2025-05-10T16:55:03Z) - FLP-XR: Future Location Prediction on Extreme Scale Maritime Data in Real-time [0.8937169040399775]
This paper introduces FLP-XR, a model that leverages maritime mobility data to construct a robust framework that offers precise predictions.<n>We demonstrate the efficiency of our approach through an extensive experimental study using three real-world AIS datasets.
arXiv Detail & Related papers (2025-03-10T13:31:42Z) - Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications [6.990141986853289]
Existing navigation decision support systems often perform poorly when handling non-predefined scenarios.<n>This research proposes a dual-core framework for LLM applications to address this issue.
arXiv Detail & Related papers (2025-02-23T01:41:58Z) - AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO [0.0]
Large Language Models (LLMs) have demonstrated impressive capabilities in language processing, yet they often struggle with tasks requiring visual spatial reasoning.<n>We introduce a novel two-stage training framework designed to equip standard LLMs with visual reasoning abilities for maze navigation.
arXiv Detail & Related papers (2025-02-20T16:05:18Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning [97.88246428240872]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.<n>Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.<n>This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.