LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System
- URL: http://arxiv.org/abs/2601.13096v1
- Date: Mon, 19 Jan 2026 14:36:50 GMT
- Title: LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System
- Authors: Muhayy Ud Din, Waseem Akram, Ahsan B. Bakht, Irfan Hussain
- Abstract summary: This study introduces a novel integrated engineering framework to enable autonomous maritime port inspection. The proposed framework replaces traditional state-machine mission planners with LLM-driven symbolic planning. The VLM module performs real-time semantic inspection and compliance assessment, generating structured reports with contextual reasoning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing inspection methods often rely on manual operations and conventional computer vision techniques that lack scalability and contextual understanding. This study introduces a novel integrated engineering framework that utilizes the synergy between Large Language Models (LLMs) and Vision Language Models (VLMs) to enable autonomous maritime port inspection using cooperative aerial and surface robotic platforms. The proposed framework replaces traditional state-machine mission planners with LLM-driven symbolic planning and improved perception pipelines through VLM-based semantic inspection, enabling context-aware and adaptive monitoring. The LLM module translates natural language mission instructions into executable symbolic plans with dependency graphs that encode operational constraints and ensure safe UAV-USV coordination. Meanwhile, the VLM module performs real-time semantic inspection and compliance assessment, generating structured reports with contextual reasoning. The framework was validated using the extended MBZIRC Maritime Simulator with realistic port infrastructure and further assessed through real-world robotic inspection trials. The lightweight on-board design ensures suitability for resource-constrained maritime platforms, advancing the development of intelligent, autonomous inspection systems. Project resources (code and videos) can be found here: https://github.com/Muhayyuddin/llm-vlm-fusion-port-inspection
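The abstract describes translating natural-language mission instructions into symbolic plans whose dependency graphs encode operational constraints between the UAV and USV. A minimal sketch of how such a dependency graph could be ordered for safe execution is shown below; the task names, platforms, and dependencies are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    platform: str                      # "UAV" or "USV"
    depends_on: list = field(default_factory=list)

def topological_order(tasks):
    """Order tasks so every dependency executes before its dependents."""
    done, order = set(), []
    def visit(t):
        if t.name in done:
            return
        for dep in t.depends_on:       # recurse into prerequisites first
            visit(dep)
        done.add(t.name)
        order.append(t)
    for t in tasks:
        visit(t)
    return order

# Hypothetical constraint: the USV must reach the berth before the UAV inspects it.
navigate = Task("navigate_to_berth", "USV")
takeoff  = Task("uav_takeoff", "UAV", depends_on=[navigate])
inspect  = Task("inspect_crane", "UAV", depends_on=[takeoff])

plan = topological_order([inspect, takeoff, navigate])
print([t.name for t in plan])
# prints ['navigate_to_berth', 'uav_takeoff', 'inspect_crane']
```

In the paper's framework the LLM would emit a plan of this shape from a natural-language instruction; here the graph is hand-written to show only the dependency-safe ordering step.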
Related papers
- A Unified Experimental Architecture for Informative Path Planning: from Simulation to Deployment with GuadalPlanner [69.43049144653882]
This paper introduces a unified architecture that decouples high-level decision-making from vehicle-specific control. The proposed architecture is realized through GuadalPlanner, which defines standardized interfaces between planning, sensing, and vehicle execution.
arXiv Detail & Related papers (2026-02-11T10:02:31Z) - VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator [1.4878644292213625]
VLN-Pilot is a framework in which a large Vision-and-Language Model assumes the role of a human pilot for indoor drone navigation. The framework integrates language-driven semantic understanding with visual perception, enabling context-aware, high-level flight behaviors.
arXiv Detail & Related papers (2026-02-05T11:23:11Z) - VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents [12.383467162169703]
We introduce a unified evaluation framework to probe MLLMs as zero-shot agents, simplifying evaluation with a highly modular and accessible design. We observe that enhancing the baseline agent with Chain-of-Thought (CoT) reasoning and self-language leads to an unexpected performance decrease.
arXiv Detail & Related papers (2025-12-31T13:21:21Z) - Run, Ruminate, and Regulate: A Dual-process Thinking System for Vision-and-Language Navigation [52.11339614452127]
Vision-and-Language Navigation (VLN) requires an agent to dynamically explore complex 3D environments following human instructions. Recent research underscores the potential of harnessing large language models (LLMs) for VLN, given their commonsense knowledge and general reasoning capabilities. We propose a novel dual-process thinking framework dubbed R3, integrating LLMs' generalization capabilities with VLN-specific expertise in a zero-shot manner.
arXiv Detail & Related papers (2025-11-18T04:32:00Z) - EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering [55.56674028743782]
Large language model (LLM) steering has emerged as a promising paradigm for controlling model behavior at inference time. We present EasySteer, a unified framework for high-performance, extensible LLM steering built on vLLM.
arXiv Detail & Related papers (2025-09-29T17:59:07Z) - Semantic-Aware Ship Detection with Vision-Language Integration [9.49989812166076]
Ship detection in remote sensing imagery is a critical task with wide-ranging applications, such as maritime activity monitoring, shipping logistics, and environmental studies. We propose a novel detection framework that combines Vision-Language Models (VLMs) with a multi-scale adaptive sliding window strategy. We evaluate our framework through three well-defined tasks, providing a comprehensive analysis of its performance and demonstrating its effectiveness in advancing SASD from multiple perspectives.
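The multi-scale sliding-window strategy mentioned above can be sketched as a generator of crop coordinates over a large remote-sensing image; the window sizes and overlap fraction below are illustrative assumptions, not values from the paper.

```python
def sliding_windows(width, height, scales=(256, 512), overlap=0.25):
    """Yield (x, y, size) crop origins at each scale with fractional overlap.

    Each crop would be handed to a detector (a VLM in the paper's setting);
    smaller scales catch small ships, larger scales give more context.
    """
    for size in scales:
        stride = max(1, int(size * (1 - overlap)))   # step between windows
        for y in range(0, max(1, height - size + 1), stride):
            for x in range(0, max(1, width - size + 1), stride):
                yield (x, y, size)

# A 1024x1024 image tiled with 512-pixel windows at 50% overlap
# gives a 3x3 grid of crops.
wins = list(sliding_windows(1024, 1024, scales=(512,), overlap=0.5))
print(len(wins))  # prints 9
```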
arXiv Detail & Related papers (2025-08-21T19:24:52Z) - LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving [58.535516533697425]
Large vision-language models (VLMs) have shown promising capabilities in scene understanding. We propose a novel vision-language framework tailored for autonomous driving, called LMAD. The framework emulates modern end-to-end driving paradigms by incorporating comprehensive scene understanding and a task-specialized structure with VLMs.
arXiv Detail & Related papers (2025-08-17T15:42:54Z) - LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks [57.27815890269697]
This work focuses on maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under energy constraints. We introduce a Large Language Model (LLM)-guided multi-agent learning approach. Results show that our method outperforms existing baselines in secrecy and energy efficiency.
arXiv Detail & Related papers (2025-07-23T04:22:57Z) - Maritime Mission Planning for Unmanned Surface Vessel using Large Language Model [0.932065750652415]
This paper introduces a novel mission planning framework that uses Large Language Models (LLMs). LLMs are proficient at understanding natural language commands, executing symbolic reasoning, and flexibly adjusting to changing situations. Our approach integrates LLMs into maritime mission planning to bridge the gap between high-level human instructions and executable plans.
arXiv Detail & Related papers (2025-03-15T09:41:55Z) - SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
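Translating traffic rules into first-order logic, as SafeAuto's summary describes, amounts to evaluating rule predicates over the current state-action pair. The rule and state fields below are a minimal illustrative sketch, not the paper's actual rule set or representation.

```python
# Hypothetical rule: if the traffic light is red, the planned action must be "stop".
# In first-order-logic terms: forall s, a: red(s) and a != stop  =>  violation.
def violates_red_light(state, action):
    return state.get("light") == "red" and action != "stop"

RULES = [violates_red_light]          # more predicates would be appended here

def is_safe(state, action):
    """An action is safe when no rule predicate flags a violation."""
    return not any(rule(state, action) for rule in RULES)

print(is_safe({"light": "red"}, "go"))    # prints False
print(is_safe({"light": "red"}, "stop"))  # prints True
```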
arXiv Detail & Related papers (2025-02-28T21:53:47Z) - Integrating Large Language Models for UAV Control in Simulated Environments: A Modular Interaction Approach [0.3495246564946556]
This study explores the application of Large Language Models in UAV control.
By enabling UAVs to interpret and respond to natural language commands, LLMs simplify the UAV control and usage.
The paper discusses several key areas where LLMs can impact UAV technology, including autonomous decision-making, dynamic mission planning, enhanced situational awareness, and improved safety protocols.
arXiv Detail & Related papers (2024-10-23T06:56:53Z) - Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
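The LLM-enabled interactive behavior planning scheme above is built around a state machine. A minimal transition-table sketch is shown below; the states, events, and transitions are illustrative assumptions, not taken from the paper, where an LLM would drive or verify such transitions.

```python
# Behavior-planning state machine: (current_state, event) -> next_state.
TRANSITIONS = {
    ("cruise", "obstacle_ahead"): "slow_down",
    ("slow_down", "path_clear"): "cruise",
    ("slow_down", "obstacle_close"): "stop",
    ("stop", "path_clear"): "cruise",
}

def step(state, event):
    # Unknown (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "cruise"
for event in ["obstacle_ahead", "obstacle_close", "path_clear"]:
    state = step(state, event)
print(state)  # prints cruise
```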
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.