Intent-Context Synergy Reinforcement Learning for Autonomous UAV Decision-Making in Air Combat
- URL: http://arxiv.org/abs/2603.00974v1
- Date: Sun, 01 Mar 2026 08:05:32 GMT
- Title: Intent-Context Synergy Reinforcement Learning for Autonomous UAV Decision-Making in Air Combat
- Authors: Jiahao Fu, Feng Yang
- Abstract summary: This paper proposes an Intent-Context Synergy Reinforcement Learning (ICS-RL) framework for autonomous UAV infiltration in contested environments. An LSTM-based Intent Prediction Module forecasts the future trajectories of hostile units, transforming the decision paradigm from reactive avoidance to proactive planning. A Context-Analysis Synergy Mechanism decomposes the mission into hierarchical sub-tasks (safe cruise, stealth planning, and hostile breakthrough). A dynamic switching controller based on Max-Advantage values seamlessly integrates an ensemble of specialized agents, allowing the UAV to adaptively select the optimal policy without hard-coded rules.
- Score: 2.9612776591672443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous UAV infiltration in dynamic contested environments remains a significant challenge due to the partially observable nature of threats and the conflicting objectives of mission efficiency versus survivability. Traditional Reinforcement Learning (RL) approaches often suffer from myopic decision-making and struggle to balance these trade-offs in real-time. To address these limitations, this paper proposes an Intent-Context Synergy Reinforcement Learning (ICS-RL) framework. The framework introduces two core innovations: (1) An LSTM-based Intent Prediction Module that forecasts the future trajectories of hostile units, transforming the decision paradigm from reactive avoidance to proactive planning via state augmentation; (2) A Context-Analysis Synergy Mechanism that decomposes the mission into hierarchical sub-tasks (safe cruise, stealth planning, and hostile breakthrough). We design a heterogeneous ensemble of Dueling DQN agents, each specialized in a specific tactical context. A dynamic switching controller based on Max-Advantage values seamlessly integrates these agents, allowing the UAV to adaptively select the optimal policy without hard-coded rules. Extensive simulations demonstrate that ICS-RL significantly outperforms baselines (Standard DDQN) and traditional methods (PSO, Game Theory). The proposed method achieves a mission success rate of 88% and reduces the average exposure frequency to 0.24 per episode, validating its superiority in ensuring robust and stealthy penetration in highly dynamic scenarios.
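The two mechanisms in the abstract can be sketched in miniature: state augmentation concatenates the intent module's predicted hostile trajectory onto the raw observation, and the switching controller hands control to whichever specialized agent reports the largest advantage for its greedy action. This is an illustrative sketch only; the function names, the mean-Q baseline for the state value, and the per-agent Q-vector interface are assumptions, not the paper's implementation.

```python
import numpy as np

def augment_state(obs, predicted_traj):
    # State augmentation: append the intent module's predicted hostile
    # trajectory (flattened) to the raw observation, so the policy can
    # plan proactively instead of reacting to current threat positions.
    return np.concatenate([np.asarray(obs, dtype=float),
                           np.asarray(predicted_traj, dtype=float).ravel()])

def max_advantage_switch(q_values_per_agent):
    # Max-Advantage switching: for each specialized agent, recover a
    # dueling-style advantage A(s, a) = Q(s, a) - V(s), approximating
    # V(s) by the mean Q-value, then select the agent whose greedy
    # action carries the largest advantage, together with that action.
    best_agent, best_action, best_adv = -1, -1, -np.inf
    for i, q in enumerate(q_values_per_agent):
        adv = np.asarray(q, dtype=float)
        adv = adv - adv.mean()          # advantage relative to state value
        a = int(np.argmax(adv))         # greedy action under this agent
        if adv[a] > best_adv:
            best_agent, best_action, best_adv = i, a, adv[a]
    return best_agent, best_action
```

For example, with three agents reporting Q-vectors `[1, 2, 3]`, `[10, 10, 10]`, and `[0, 5, 0]`, the third agent wins: its advantage for action 1 (10/3) exceeds the best advantages of the others (1 and 0), even though the second agent's absolute Q-values are highest.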
Related papers
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically relies on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions [1.7188280334580195]
Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. This paper presents an adaptive Markov Decision Processes framework tailored to ISR missions with CCAs.
arXiv Detail & Related papers (2025-12-01T13:31:40Z) - Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories). We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z) - DOPA: Stealthy and Generalizable Backdoor Attacks from a Single Client under Challenging Federated Constraints [2.139012072214621]
Federated Learning (FL) is increasingly adopted for privacy-preserving collaborative training, but its decentralized nature makes it susceptible to backdoor attacks. Existing attack methods, however, often rely on idealized assumptions and fail to remain effective under real-world constraints. We propose DOPA, a novel framework that simulates heterogeneous local training dynamics and seeks consensus across divergent optimization trajectories to craft universally effective and stealthy backdoor triggers.
arXiv Detail & Related papers (2025-08-20T08:39:12Z) - Reinforcement Learning for Decision-Level Interception Prioritization in Drone Swarm Defense [51.736723807086385]
We present a case study demonstrating the practical advantages of reinforcement learning in addressing this challenge. We introduce a high-fidelity simulation environment that captures realistic operational constraints. The agent learns to coordinate multiple effectors for optimal interception prioritization. We evaluate the learned policy against a handcrafted rule-based baseline across hundreds of simulated attack scenarios.
arXiv Detail & Related papers (2025-08-01T13:55:39Z) - CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization [73.13843039509386]
This paper presents CyGATE, a game-theoretic framework modeling attacker-defender interactions. CyGATE frames cyber conflicts as a partially observable stochastic game (POSG) across Cyber Kill Chain stages. The framework's flexible architecture enables extension to multi-agent scenarios.
arXiv Detail & Related papers (2025-08-01T09:53:06Z) - Policy Disruption in Reinforcement Learning: Adversarial Attack with Large Language Models and Critical State Identification [8.292056374554162]
Reinforcement learning (RL) has achieved remarkable success in fields like robotics and autonomous driving. Existing approaches often rely on modifying the environment or policy, limiting their practicality. This paper proposes an adversarial attack method in which existing agents in the environment guide the target policy to output suboptimal actions without altering the environment.
arXiv Detail & Related papers (2025-07-24T05:52:06Z) - Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments [6.956559003734227]
Unmanned aerial vehicles (UAVs) are exposed to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL). This paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts. It achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories.
arXiv Detail & Related papers (2025-06-26T10:06:29Z) - Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks [13.726534285661717]
This paper introduces the Embodied Laser Attack (ELA), a novel framework that dynamically tailors non-contact laser attacks.
For the perception module, ELA has innovatively developed a local perspective transformation network, based on the intrinsic prior knowledge of traffic scenes.
For the decision and control module, ELA trains an attack agent with data-driven reinforcement learning instead of adopting time-consuming algorithms.
arXiv Detail & Related papers (2023-12-15T06:16:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.