Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2511.11402v1
- Date: Fri, 14 Nov 2025 15:29:46 GMT
- Title: Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning
- Authors: Amit Jain, Victor Rodriguez-Fernandez, Richard Linares
- Abstract summary: This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes.
- Score: 2.034091340570242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multiphase waypoint navigation variants, and finally tackling a complex multiphase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.
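The double-integrator benchmark cited in the abstract is a standard single-phase optimal-control test problem with a known analytical (bang-bang) solution, which is what makes it useful for validating learned policies. A minimal sketch of such an environment and the classical switching-curve policy is below; the time step, control bound, and cost weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal double-integrator benchmark sketch (hypothetical parameters,
# not the paper's code). State is (position x, velocity v); the control
# u is a bounded acceleration.

class DoubleIntegrator:
    def __init__(self, dt=0.05, u_max=1.0):
        self.dt = dt          # integration step (assumed value)
        self.u_max = u_max    # symmetric control bound
        self.x, self.v = 1.0, 0.0

    def reset(self, x0=1.0, v0=0.0):
        self.x, self.v = x0, v0
        return (self.x, self.v)

    def step(self, u):
        u = max(-self.u_max, min(self.u_max, u))  # clip control
        # Semi-implicit Euler keeps the discrete dynamics stable.
        self.v += u * self.dt
        self.x += self.v * self.dt
        # Quadratic cost drives the state to the origin.
        reward = -(self.x ** 2 + self.v ** 2 + 0.1 * u ** 2)
        return (self.x, self.v), reward

# The time-optimal policy for this system is bang-bang: full thrust on
# either side of the switching curve x = -v|v|/2.
env = DoubleIntegrator()
state = env.reset()
for _ in range(100):
    u = -1.0 if state[0] + 0.5 * state[1] * abs(state[1]) > 0 else 1.0
    state, r = env.step(u)
```

Running the loop drives the state to a small chattering neighborhood of the origin, which is the near-optimal behavior an RL policy is expected to match on this benchmark.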
Related papers
- Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z) - DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations [9.210347753567092]
DCoPilot is a hybrid framework for generative control policies in dynamic data center operations. It operates through three coordinated phases: (i) simulation scale-up, which stress-tests reward candidates across diverse simulation-ready scenes; (ii) meta policy distillation, where a hypernetwork is trained to output policy weights conditioned on SLA and scene embeddings; and (iii) online adaptation, enabling zero-shot policy generation in response to updated specifications.
arXiv Detail & Related papers (2026-02-02T14:18:52Z) - Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces [3.719121868494767]
We show that a single encoder configuration can yield near-zero near mid-air collision rates and shorter loss-of-separation infringements than the deeper configurations. Our results suggest that the newly formulated state representation, novel design of neural network architecture, and proposed training strategy provide an adaptable and scalable decentralized solution for aircraft separation assurance.
arXiv Detail & Related papers (2026-01-07T21:18:28Z) - QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management [52.15690855486153]
A space-air-ground integrated network (SAGIN) has emerged as an essential architecture for enabling ubiquitous UAV connectivity. This paper formulates UAV mobility management in SAGIN as a constrained multiobjective joint optimization problem.
arXiv Detail & Related papers (2025-12-17T06:22:46Z) - Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning [56.47948583452555]
We introduce the Stepwise Flow Policy (SWFP) framework, founded on the key insight that discretizing the flow matching inference process via a fixed-step Euler scheme aligns it with the variational Jordan-Kinderlehrer-Otto principle from optimal transport. SWFP decomposes the global flow into a sequence of small, incremental transformations between proximate distributions. This decomposition yields an efficient algorithm that fine-tunes pre-trained flows via a cascade of small flow blocks, offering significant advantages.
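The fixed-step Euler discretization the summary refers to can be illustrated on a toy flow. The sketch below uses a straight-line (rectified) velocity field toward a fixed target point as a stand-in for a learned flow-matching field, which is an assumption for illustration only; each Euler step is one small transformation between proximate distributions.

```python
import numpy as np

# Toy illustration of fixed-step Euler inference for a flow: the global
# transport from x0 to the target is broken into n_steps incremental
# updates. The rectified field v = (target - x) / (1 - t) is a
# hypothetical stand-in, not the paper's learned velocity field.

def euler_flow(x0, target, n_steps=10):
    x = x0.copy()
    for k in range(n_steps):
        t = k / n_steps
        v = (target - x) / (1.0 - t)   # straight-line transport field
        x = x + v / n_steps            # one small Euler block
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=5)                      # source sample
target = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
xT = euler_flow(x0, target, n_steps=10)
```

For this particular field the Euler scheme lands exactly on the target at t = 1, since the exact trajectory is linear in t; a learned field would only approximate this.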
arXiv Detail & Related papers (2025-10-17T07:43:51Z) - Multi-Agent Path Finding via Offline RL and LLM Collaboration [0.0]
Multi-Agent Path Finding (MAPF) poses a significant challenge for applications in robotics and logistics. We propose an efficient decentralized planning framework based on the Decision Transformer (DT). Our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards.
arXiv Detail & Related papers (2025-09-26T09:53:40Z) - DyTTP: Trajectory Prediction with Normalization-Free Transformers [0.0]
Transformer-based architectures have demonstrated significant promise in capturing complex dependencies. We present a two-fold approach to address these challenges. First, we integrate DynamicTanh (DyT), a recent normalization-free technique for transformers, into the backbone, replacing traditional layer normalization. This is the first work to apply DyT to the trajectory prediction task.
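The DynamicTanh (DyT) layer mentioned in this summary is an elementwise map y = gamma * tanh(alpha * x) + beta that replaces layer normalization without computing per-token statistics. A minimal sketch follows; the shapes and initial parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch of a DynamicTanh (DyT) layer: an elementwise, learnable
# squashing function used in place of layer normalization. The init
# values (alpha=0.5, gamma=1, beta=0) are assumptions for illustration.

def dyt(x, alpha=0.5, gamma=None, beta=None):
    d = x.shape[-1]
    gamma = np.ones(d) if gamma is None else gamma   # learnable scale
    beta = np.zeros(d) if beta is None else beta     # learnable shift
    # tanh bounds extreme activations without mean/variance statistics.
    return gamma * np.tanh(alpha * x) + beta

x = np.array([[10.0, -10.0, 0.0, 1.0]])
y = dyt(x)
```

With unit scale and zero shift the outputs stay in (-1, 1) regardless of how large the input activations are, which is the stabilizing effect normalization layers usually provide.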
arXiv Detail & Related papers (2025-04-07T09:26:25Z) - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [73.75271615101754]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences. Dita employs in-context conditioning, enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations. Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z) - Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics. This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z) - A Coalition Game for On-demand Multi-modal 3D Automated Delivery System [4.378407481656902]
We introduce a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments. We investigate cooperation structures among the modes to capture how strategic collaboration can improve overall routing efficiency. Several numerical experiments on last-mile delivery applications have been conducted, including a case study in the city of Mississauga.
arXiv Detail & Related papers (2024-12-23T03:50:29Z) - Diffusion Transformer Policy [48.50988753948537]
We propose a large multi-modal diffusion transformer, dubbed as Diffusion Transformer Policy, to model continuous end-effector actions. By leveraging the scaling capability of transformers, the proposed approach can effectively model continuous end-effector actions across large diverse robot datasets.
arXiv Detail & Related papers (2024-10-21T12:43:54Z) - Generalizable Spacecraft Trajectory Generation via Multimodal Learning with Transformers [14.176630393074149]
We present a novel trajectory generation framework that generalizes across diverse problem configurations.
We leverage high-capacity transformer neural networks capable of learning from data sources.
The framework is validated through simulations and experiments on a free-flyer platform.
arXiv Detail & Related papers (2024-10-15T15:55:42Z) - Proximal Policy Optimization-based Transmit Beamforming and Phase-shift Design in an IRS-aided ISAC System for the THz Band [90.45915557253385]
An IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity.
Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints.
arXiv Detail & Related papers (2022-03-21T09:15:18Z) - Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options [54.40780660868349]
We introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP). LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning. We show how an LMDP with a goal kernel enables the efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding.
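The "linearly solvable" property this summary relies on is that in an LMDP the desirability z = exp(-v) satisfies a linear fixed-point equation z = exp(-q) * (P z), so the optimal cost-to-go comes from a power iteration rather than a nonlinear Bellman backup. A toy first-exit example on a small chain is sketched below; the chain, passive dynamics, and cost values are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Toy first-exit LMDP: states 0..3 are interior, state 4 is the goal.
# Desirability z = exp(-v) satisfies the LINEAR equation
# z = exp(-q) * (P @ z) with boundary z(goal) = 1, so a simple power
# iteration recovers the optimal cost-to-go v = -log z.
# All numbers here are illustrative, not from the paper.

n = 5
q = 0.1                       # per-step state cost in the interior
P = np.zeros((n, n))          # passive random-walk dynamics
for s in range(n - 1):
    P[s, max(s - 1, 0)] += 0.5
    P[s, s + 1] += 0.5
P[n - 1, n - 1] = 1.0         # goal is absorbing

z = np.ones(n)
for _ in range(500):          # power iteration on the linear operator
    z = np.exp(-q) * (P @ z)
    z[n - 1] = 1.0            # boundary condition at the goal

v = -np.log(z)                # optimal cost-to-go, zero at the goal
```

The iteration converges because the interior operator is a contraction (each application multiplies by exp(-q) < 1), and the resulting v decreases monotonically toward the goal, as expected for a cost-to-go.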
arXiv Detail & Related papers (2020-07-06T05:13:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.