OceanGym: A Benchmark Environment for Underwater Embodied Agents
- URL: http://arxiv.org/abs/2509.26536v1
- Date: Tue, 30 Sep 2025 17:09:32 GMT
- Title: OceanGym: A Benchmark Environment for Underwater Embodied Agents
- Authors: Yida Xue, Mingjun Mao, Xiangyuan Ru, Yuqi Zhu, Baochang Ren, Shuofei Qiao, Mengru Wang, Shumin Deng, Xinyu An, Ningyu Zhang, Ying Chen, Huajun Chen
- Abstract summary: OceanGym is the first comprehensive benchmark for ocean underwater embodied agents. It is designed to advance AI in one of the most demanding real-world environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI.
- Score: 69.56465775825275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, making effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
Related papers
- IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation [56.43007596544299]
IndustryNav is the first dynamic industrial navigation benchmark for active spatial reasoning. A study of nine state-of-the-art Visual Large Language Models reveals that closed-source models maintain a consistent advantage.
arXiv Detail & Related papers (2025-11-21T16:48:49Z)
- Sensorium Arc: AI Agent System for Oceanic Data Exploration and Interactive Eco-Art [3.0447481187978886]
Sensorium Arc (AI reflects on climate) is a real-time multimodal interactive AI agent system that personifies the ocean as a poetic speaker. The project demonstrates the potential of conversational AI agents to mediate affective, intuitive access to high-dimensional environmental data.
arXiv Detail & Related papers (2025-11-20T02:48:40Z)
- UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding [54.16709436340606]
Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding. Underwater imagery presents unique challenges including severe light attenuation, color distortion, and suspended particle scattering. We introduce UWBench, a benchmark specifically designed for underwater vision-language understanding.
arXiv Detail & Related papers (2025-10-21T03:32:15Z)
- UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents [17.86691411018085]
UAV-ON is a benchmark for large-scale Object Goal Navigation (ObjectNav) by aerial agents in open-world environments. It comprises 14 high-fidelity Unreal Engine environments with diverse semantic regions and complex spatial layouts. It defines 1270 annotated target objects, each characterized by an instance-level instruction that encodes category, physical footprint, and visual descriptors.
arXiv Detail & Related papers (2025-08-01T03:23:06Z)
- Towards an Autonomous Surface Vehicle Prototype for Artificial Intelligence Applications of Water Quality Monitoring [68.41400824104953]
This paper presents a vehicle prototype that addresses the use of Artificial Intelligence algorithms and enhanced sensing techniques for water quality monitoring.
The vehicle is fully equipped with high-quality sensors to measure water quality parameters and water depth.
By means of a stereo camera, it can also detect and locate macro-plastics in real environments.
arXiv Detail & Related papers (2024-10-08T10:35:32Z)
- KUNPENG: An Embodied Large Model for Intelligent Maritime [16.21066869005095]
KUNPENG is the first embodied large model for intelligent maritime applications in smart ocean construction.
In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.
arXiv Detail & Related papers (2024-07-12T07:16:22Z)
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- AI-GOMS: Large AI-Driven Global Ocean Modeling System [3.635120568177384]
Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean.
Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction.
arXiv Detail & Related papers (2023-08-06T15:59:30Z)
- Aeolus Ocean -- A simulation environment for the autonomous COLREG-compliant navigation of Unmanned Surface Vehicles using Deep Reinforcement Learning and Maritime Object Detection [0.0]
Navigational autonomy in unmanned surface vehicles (USVs) in the maritime sector can lead to safer waters as well as reduced operating costs.
We describe the novel development of a COLREG-compliant DRL-based collision avoidant navigational system with CV-based awareness in a realistic ocean simulation environment.
arXiv Detail & Related papers (2023-07-13T11:20:18Z)
- Guaranteed Discovery of Controllable Latent States with Multi-Step Inverse Models [51.754160866582005]
The Agent-Controllable State Discovery algorithm (AC-State) consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck.
We demonstrate the discovery of controllable latent state in three domains: localizing a robot arm with distractions, exploring in a maze alongside other agents, and navigating in the Matterport house simulator.
arXiv Detail & Related papers (2022-07-17T17:06:52Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)