Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander
- URL: http://arxiv.org/abs/2507.11079v1
- Date: Tue, 15 Jul 2025 08:22:37 GMT
- Title: Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander
- Authors: Li Wang, Qizhen Wu, Lei Chen
- Abstract summary: We propose a vision-language model-based commander to address the issue of intelligent perception-to-decision reasoning. Our method integrates a vision-language model for scene understanding and a lightweight large language model for strategic reasoning. Unlike rule-based search and reinforcement learning methods, the combination of the two modules establishes a full-chain process.
- Score: 7.652649478304803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multiple unmanned ground vehicle confrontations, autonomously evolving multi-agent tactical decisions from situational awareness remains a significant challenge. Traditional handcrafted rule-based methods become vulnerable in complicated and transient battlefield environments, and current reinforcement learning methods mainly focus on action manipulation rather than strategic decisions due to a lack of interpretability. Here, we propose a vision-language model-based commander to address the issue of intelligent perception-to-decision reasoning in autonomous confrontations. Our method integrates a vision-language model for scene understanding and a lightweight large language model for strategic reasoning, achieving unified perception and decision-making within a shared semantic space, with strong adaptability and interpretability. Unlike rule-based search and reinforcement learning methods, the combination of the two modules establishes a full-chain process that reflects the cognitive process of human commanders. Simulation and ablation experiments validate that the proposed approach achieves a win rate of over 80% against baseline models.
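No implementation accompanies this listing, so the following is a minimal sketch of how such a two-module perception-to-decision chain could be wired together, assuming text-in/text-out interfaces to both models. The helper names, prompt wording, and tactic vocabulary are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the perception-to-decision chain: a VLM grounds the
# battlefield image in text, and a lightweight LLM issues a tactical order.
# All names, prompts, and tactics here are assumptions for illustration.

def query_vlm(image: bytes) -> str:
    """Placeholder for any captioning-capable vision-language model."""
    raise NotImplementedError

def query_llm(prompt: str) -> str:
    """Placeholder for any instruction-tuned lightweight LLM."""
    raise NotImplementedError

def commander_step(frame: bytes, unit_states: dict) -> str:
    # Perception: map raw pixels into the shared semantic (text) space.
    scene = query_vlm(frame)
    # Reasoning: turn the textual situation picture into a strategic decision.
    prompt = (
        "You command a team of unmanned ground vehicles.\n"
        f"Scene description: {scene}\n"
        f"Friendly unit states: {unit_states}\n"
        "Choose one tactic (e.g., flank, hold, concentrate fire) and justify it."
    )
    return query_llm(prompt)
```

Because the two stages exchange plain text, every intermediate output can be inspected directly, which is the interpretability argument the abstract makes.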
Related papers
- ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving [27.75047397292818]
End-to-end autonomous driving has emerged as a promising approach to unify perception, prediction, and planning within a single framework. We propose ReAL-AD, a Reasoning-Augmented Learning framework that structures decision-making in autonomous driving based on the three-tier human cognitive model. We show that integrating our framework improves planning accuracy and safety by over 30%, making end-to-end autonomous driving more interpretable and aligned with human-like hierarchical reasoning.
arXiv Detail & Related papers (2025-07-16T02:23:24Z)
- Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why [50.191655141020505]
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced.
arXiv Detail & Related papers (2025-07-08T11:45:51Z)
- ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models [26.955482205849282]
Large Vision-Language Models (LVLMs) have recently advanced robotic manipulation by leveraging vision for scene perception and language for instruction following. We propose ManipLVM-R1, a novel reinforcement learning framework that replaces traditional supervision with Reinforcement Learning using Verifiable Rewards (RLVR).
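For orientation, a verifiable reward replaces a learned reward model with a programmatic check of the policy's output. The grasp-box encoding and IoU threshold below are illustrative assumptions about what such a check might look like, not ManipLVM-R1's actual reward.

```python
# Hypothetical verifiable reward for a manipulation task: score a predicted
# affordance box against a checkable ground-truth region. The task encoding
# and threshold are assumptions, not the paper's reward definition.

def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def verifiable_reward(pred_box: tuple, gold_box: tuple, thr: float = 0.5) -> float:
    # Binary, automatically checkable signal for the RL update.
    return 1.0 if iou(pred_box, gold_box) >= thr else 0.0
```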
arXiv Detail & Related papers (2025-05-22T10:57:07Z)
- Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning [75.04643265875072]
Large reasoning models (LRMs) have demonstrated strong performance on complex reasoning tasks, but often suffer from overthinking. Inspired by dual process theory in cognitive science, we propose Adaptive Cognition Policy Optimization (ACPO). ACPO enables LRMs to achieve efficient reasoning through adaptive cognitive allocation and dynamic system switching.
arXiv Detail & Related papers (2025-05-22T07:15:08Z)
- Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities [5.0778942095543576]
This paper introduces an adversarial evaluation framework designed to systematically stress-test the decision-making processes of Large Language Models. We apply this framework to several state-of-the-art LLMs, including GPT-3.5, GPT-4, Gemini-1.5, and DeepSeek-V3. Our findings highlight distinct behavioral patterns across models and emphasize the importance of adaptability and fairness recognition for trustworthy AI deployment.
arXiv Detail & Related papers (2025-05-19T14:50:44Z)
- Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics [40.06500618820166]
Multi-Agent Reinforcement Learning (MARL) enables coordination among autonomous agents in complex scenarios. MARL's practical deployment in sensitive military contexts is constrained by the lack of explainability. This work reviews and assesses current advances in explainability methods for MARL, with a focus on simulated air combat scenarios.
arXiv Detail & Related papers (2025-05-16T14:36:30Z)
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Planning Like Human: A Dual-process Framework for Dialogue Planning [31.995557540062553]
We propose the Dual-Process Dialogue Planning (DPDP) framework to enhance dialogue planning in Large Language Models (LLMs).
Inspired by the dual-process theory in psychology, the framework embodies two modes of thinking: intuitive (fast) and analytical (slow).
Our empirical evaluations affirm DPDP's superiority in achieving both high-quality dialogues and operational efficiency, outpacing existing methods.
arXiv Detail & Related papers (2024-06-08T06:52:47Z)
- From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of pre-trained language model (PLM) reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z)
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
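As a concrete illustration, the re-reading idea amounts to stating the question twice before eliciting an answer. The template below is a plausible reconstruction from the abstract (assuming a standard chain-of-thought trigger), not necessarily the paper's verbatim prompt.

```python
# Hypothetical Re2-style prompt: repeat the question so the model "re-reads"
# the input before any reasoning is elicited. The exact wording is an assumption.

def build_re2_prompt(question: str) -> str:
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A: Let's think step by step."
    )

print(build_re2_prompt("A farmer has 17 sheep and buys 5 more. How many are there now?"))
```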
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
- Context-Aware Language Modeling for Goal-Oriented Dialogue Systems [84.65707332816353]
We formulate goal-oriented dialogue as a partially observed Markov decision process.
We derive a simple and effective method to finetune language models in a goal-aware way.
We evaluate our method on a practical flight-booking task using AirDialogue.
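For readers unfamiliar with the formulation, a partially observed Markov decision process is the standard tuple below; mapping dialogue onto it (user goals as latent state, system utterances as actions, user utterances as observations) is the usual convention and an assumption about this paper's exact setup.

```latex
% Standard POMDP tuple; the dialogue-specific reading is an assumption.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, \mathcal{O}, T, \Omega, R, \gamma)
\]
% S: latent dialogue states (e.g., the user's true goal)
% A: system utterances;  O: observed user utterances
% T(s' \mid s, a): state transitions;  \Omega(o \mid s', a): observation model
% R(s, a): goal-completion reward;  \gamma: discount factor
```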
arXiv Detail & Related papers (2022-04-18T17:23:11Z)
- Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The challenges of VLN arise from two aspects: first, the agent needs to attend to the meaningful parts of the language instruction that correspond to the dynamically varying visual environment.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z)