Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
- URL: http://arxiv.org/abs/2509.20745v2
- Date: Fri, 26 Sep 2025 03:42:36 GMT
- Title: Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
- Authors: Yu Guo, Shengfeng He, Yuxu Lu, Haonan An, Yihang Tao, Huilin Zhu, Jingxian Liu, Yuguang Fang
- Abstract summary: Neptune-X is a data-centric generative-selection framework for maritime object detection. X-to-Maritime is a multi-modality-conditioned generative model that synthesizes diverse and realistic maritime scenes. Our approach sets a new benchmark in maritime scene synthesis, significantly improving detection accuracy.
- Score: 54.1960918379255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Maritime object detection is essential for navigation safety, surveillance, and autonomous operations, yet constrained by two key challenges: the scarcity of annotated maritime data and poor generalization across various maritime attributes (e.g., object category, viewpoint, location, and imaging environment). To address these challenges, we propose Neptune-X, a data-centric generative-selection framework that enhances training effectiveness by leveraging synthetic data generation with task-aware sample selection. From the generation perspective, we develop X-to-Maritime, a multi-modality-conditioned generative model that synthesizes diverse and realistic maritime scenes. A key component is the Bidirectional Object-Water Attention module, which captures boundary interactions between objects and their aquatic surroundings to improve visual fidelity. To further improve downstream tasking performance, we propose Attribute-correlated Active Sampling, which dynamically selects synthetic samples based on their task relevance. To support robust benchmarking, we construct the Maritime Generation Dataset, the first dataset tailored for generative maritime learning, encompassing a wide range of semantic conditions. Extensive experiments demonstrate that our approach sets a new benchmark in maritime scene synthesis, significantly improving detection accuracy, particularly in challenging and previously underrepresented settings. The code is available at https://github.com/gy65896/Neptune-X.
Related papers
- Dynamic Topology Awareness: Breaking the Granularity Rigidity in Vision-Language Navigation [22.876516699004814]
Vision-Language Navigation in Continuous Environments (VLN-CE) presents a core challenge: grounding high-level linguistic instructions into precise, safe, and long-horizon spatial actions. Explicit topological maps have proven to be a vital solution for providing robust spatial memory in such tasks. Existing topological planning methods suffer from a "Granularity Rigidity" problem. We propose DGNav, a framework for Dynamic Topological Navigation, introducing a context-aware mechanism to modulate map density and connectivity on-the-fly.
arXiv Detail & Related papers (2026-01-29T14:06:23Z)
- History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation [64.51891404034164]
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments. Existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. This work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline.
arXiv Detail & Related papers (2025-12-16T09:16:07Z)
- Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation [67.68165784193556]
Nav-$R^2$ is a framework that explicitly models two types of relationships: target-environment modeling and environment-action planning. Our SA-Mem preserves the most target-relevant and current observation-relevant features from both temporal and semantic perspectives. Nav-$R^2$ achieves state-of-the-art performance in localizing unseen objects through a streamlined and efficient pipeline.
arXiv Detail & Related papers (2025-12-02T04:21:02Z)
- Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
- Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset [76.92197418745822]
Camouflaged instance segmentation (CIS) faces greater challenges in accurately segmenting objects that blend closely with their surroundings. Traditional camouflaged instance segmentation methods, trained on terrestrial-dominated datasets with limited underwater samples, may exhibit inadequate performance in underwater scenes. We introduce the first underwater camouflaged instance segmentation dataset, UCIS4K, which comprises 3,953 images of camouflaged marine organisms with instance-level annotations.
arXiv Detail & Related papers (2025-10-20T14:34:51Z)
- MVTD: A Benchmark Dataset for Maritime Visual Object Tracking [4.956066467858057]
The Maritime Visual Tracking dataset (MVTD) comprises 182 high-resolution video sequences, totaling approximately 150,000 frames. MVTD captures a diverse range of operational conditions and maritime scenarios, reflecting the real-world complexities of maritime environments. We evaluated 14 recent SOTA tracking algorithms on the MVTD benchmark and observed substantial performance degradation compared to their performance on general-purpose datasets.
arXiv Detail & Related papers (2025-06-03T13:30:11Z)
- HMPNet: A Feature Aggregation Architecture for Maritime Object Detection from a Shipborne Perspective [16.421691711725916]
A novel dataset annotated for 12 object categories under diverse maritime environments and weather conditions is presented. We propose HMPNet, a lightweight architecture tailored for shipborne object detection. Empirical evaluations indicate that HMPNet surpasses current state-of-the-art methods in terms of both accuracy and computational efficiency.
arXiv Detail & Related papers (2025-05-13T05:17:53Z)
- VRS-UIE: Value-Driven Reordering Scanning for Underwater Image Enhancement [104.78586859995333]
State Space Models (SSMs) have emerged as a promising backbone for vision tasks due to their linear complexity and global receptive field. The predominance of large-portion, homogeneous but useless oceanic backgrounds can dilute the feature representation responses of sparse yet valuable targets. We propose a novel Value-Driven Reordering Scanning framework for Underwater Image Enhancement (UIE). Our framework sets a new state-of-the-art, delivering superior enhancement performance (surpassing WMamba by 0.89 dB on average) by effectively suppressing water bias and preserving structural and color fidelity.
arXiv Detail & Related papers (2025-05-02T12:21:44Z)
- Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO). In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
- World-Consistent Data Generation for Vision-and-Language Navigation [33.13590164890286]
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions. One main obstacle existing in VLN is data scarcity, leading to poor generalization performance over unseen environments. We propose the world-consistent data generation (WCGEN), an efficacious data-augmentation framework satisfying both diversity and world-consistency.
arXiv Detail & Related papers (2024-12-09T11:40:54Z)
- MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios [10.748210940033484]
The Maritime Ship Navigation Behavior dataset (MID) is designed to address challenges in ship detection within complex maritime environments. MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning. MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions.
arXiv Detail & Related papers (2024-12-08T09:34:23Z)
- Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset [3.468621550644668]
The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI).
Object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions.
Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems.
arXiv Detail & Related papers (2024-07-12T05:48:53Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We use 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.