Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement
- URL: http://arxiv.org/abs/2602.13291v1
- Date: Mon, 09 Feb 2026 00:29:06 GMT
- Title: Agent Mars: Multi-Agent Simulation for Multi-Planetary Life Exploration and Settlement
- Authors: Ziyang Wang,
- Abstract summary: Space exploration and settlement offer vast environments and resources, but impose constraints unmatched on Earth.<n>Key challenge is auditable coordination among specialised humans, robots, and digital services in a safety-critical system-of-systems.<n>We introduce Agent Mars, an open, end-to-end multi-agent simulation framework for Mars base operations.
- Score: 17.969021804498844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) has transformed robotics, healthcare, industry, and scientific discovery, yet a major frontier may lie beyond Earth. Space exploration and settlement offer vast environments and resources, but impose constraints unmatched on Earth: delayed/intermittent communications, extreme resource scarcity, heterogeneous expertise, and strict safety, accountability, and command authority. The key challenge is auditable coordination among specialised humans, robots, and digital services in a safety-critical system-of-systems. We introduce Agent Mars, an open, end-to-end multi-agent simulation framework for Mars base operations. Agent Mars formalises a realistic organisation with a 93-agent roster across seven layers of command and execution (human roles and physical assets), enabling base-scale studies beyond toy settings. It implements hierarchical and cross-layer coordination that preserves chain-of-command while allowing vetted cross-layer exchanges with audit trails; supports dynamic role handover with automatic failover under outages; and enables phase-dependent leadership for routine operations, emergencies, and science campaigns. Agent Mars further models mission-critical mechanisms-scenario-aware short/long-horizon memory, configurable propose-vote consensus, and translator-mediated heterogeneous protocols-to capture how teams align under stress. To quantify behaviour, we propose the Agent Mars Performance Index (AMPI), an interpretable composite score with diagnostic sub-metrics. Across 13 reproducible Mars-relevant operational scripts, Agent Mars reveals coordination trade-offs and identifies regimes where curated cross-layer collaboration and functional leadership reduce overhead without sacrificing reliability. Agent Mars provides a benchmarkable, auditable foundation for Space AI.
Related papers
- AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems [71.89040853616602]
We introduce AstroReason-Bench, a benchmark for evaluating agentic planning in Space Planning Problems (SPP)<n>AstroReason-Bench integrates multiple scheduling regimes, including ground station communication and agile Earth observation, and provides a unified agent-oriented interaction protocol.<n>We find that current agents substantially underperform specialized solvers, highlighting key limitations of generalist planning under realistic constraints.
arXiv Detail & Related papers (2026-01-16T15:02:41Z) - What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm for Studying Environment Understanding [50.35012849818872]
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks.<n>We propose Task-to-Quiz (T2Q), a deterministic and automated evaluation paradigm designed to decouple task execution from world-state understanding.<n>Our experiments reveal that task success is often a poor proxy for environment understanding, and that current memory machanism can not effectively help agents acquire a grounded model of the environment.
arXiv Detail & Related papers (2026-01-14T14:09:11Z) - Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence [54.91177026001217]
Large language model (LLM)-enabled teaming methods typically assume well-structured and known environments.<n>We present SPINE-HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team.<n>Our framework achieves nearly twice the success rate compared to prior LLM-enabled heterogeneous teaming approaches.
arXiv Detail & Related papers (2025-10-30T18:24:38Z) - Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm [60.36837655498119]
We propose a Trajectory-based validated-by-Reproducing Agent-benchmark Complexity Evolution framework.<n>This framework takes an original task from an existing benchmark and encourages agents to evolve it into a new task with higher difficulty.<n>Experiments on the GAIA benchmark demonstrate that the TRACE framework consistently enhances task complexity while improving the reliability of correctness.
arXiv Detail & Related papers (2025-10-01T01:52:52Z) - UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce textbfUltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges.<n>Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules.<n>Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z) - From MAS to MARS: Coordination Failures and Reasoning Trade-offs in Hierarchical Multi-Agent Robotic Systems within a Healthcare Scenario [3.5262044630932254]
Multi-agent robotic systems (MARS) build upon multi-agent systems by integrating physical and task-related constraints.<n>Despite the availability of advanced multi-agent frameworks, their real-world deployment on robots remains limited.
arXiv Detail & Related papers (2025-08-06T17:54:10Z) - Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study [10.393102715510937]
The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions.<n>Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions.<n>We investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations.
arXiv Detail & Related papers (2025-06-18T07:42:11Z) - From Virtual Agents to Robot Teams: A Multi-Robot Framework Evaluation in High-Stakes Healthcare Context [2.016235597066821]
Current frameworks treat agents as conceptual task executors rather than physically embodied entities.<n>We propose three design guidelines emphasizing process transparency, proactive failure recovery, and contextual grounding.<n>Our work informs the development of more resilient and robust multi-agent robotic systems.
arXiv Detail & Related papers (2025-06-04T04:05:38Z) - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z) - We Choose to Go to Space: Agent-driven Human and Multi-Robot
Collaboration in Microgravity [28.64243893838686]
Future space exploration requires humans and robots to work together.
We present SpaceAgents-1, a system for learning human and multi-robot collaboration strategies under microgravity conditions.
arXiv Detail & Related papers (2024-02-22T05:32:27Z) - Enabling Astronaut Self-Scheduling using a Robust Advanced Modelling and
Scheduling system: an assessment during a Mars analogue mission [44.621922701019336]
We study the usage of a computer decision-support tool by a crew of analog astronauts.
The proposed tool, called Romie, belongs to the new category of Robust Advanced Modelling and Scheduling (RAMS) systems.
arXiv Detail & Related papers (2023-01-14T21:10:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.