Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents

URL: http://arxiv.org/abs/2301.11891v1
Date: Fri, 27 Jan 2023 18:08:04 GMT
Title: Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents
Authors: Stephen A. Goss, Robert J. Steininger, Dhruv Narayanan, Daniel V. Olivença, Yutong Sun, Peng Qiu, Jim Amato, Eberhard O. Voit, Walter E. Voit, Eric J. Kildebeck
Abstract summary: We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. PAL enables the creation of tasks in a flexible manner and can manipulate any aspect of a task during an evaluation. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As artificial intelligence research advances, the platforms used to evaluate AI agents need to adapt and grow to continue to challenge them. We present the Polycraft World AI Lab (PAL), a task simulator with an API based on the Minecraft mod Polycraft World. Our platform is built to allow AI agents with different architectures to easily interact with the Minecraft world, train, and be evaluated on multiple tasks. PAL enables the creation of tasks in a flexible manner and can manipulate any aspect of a task during an evaluation. All actions taken by AI agents and external actors (non-player characters, NPCs) in the open-world environment are logged to streamline evaluation. Here we present two custom tasks on the PAL platform, one focused on multi-step planning and one focused on navigation, along with evaluations of agents solving them. In summary, we report a versatile and extensible AI evaluation platform with a low barrier to entry for AI researchers.

Related papers

Actionable AI: Enabling Non Experts to Understand and Configure AI Systems
Actionable AI allows non-experts to configure black-box agents. In uncertain conditions, non-experts achieve good levels of performance. We propose Actionable AI as a way to open access to AI-based agents.
Date: 2025-03-09T23:09:04Z
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
We build a self-contained environment with data that mimics a small software company environment. We find that with the most competitive agent, 24% of the tasks can be completed autonomously. This paints a nuanced picture of task automation with LM agents.
Date: 2024-12-18T18:55:40Z
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Embodied artificial intelligence emphasizes the role of an agent's body in generating human-like behaviors. In this paper, we construct a benchmark platform for embodied intelligence evaluation in real-world city environments.
Date: 2024-10-12T17:49:26Z
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
Date: 2024-07-23T17:50:43Z
Scaling Instructable Agents Across Many Simulated Worlds
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real time using a generic, human-like interface.
Date: 2024-03-13T17:50:32Z
Toward Human-AI Alignment in Large-Scale Multi-Player Games
We analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games). We find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend toward uniformity. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications.
Date: 2024-02-05T22:55:33Z
Agent AI: Surveying the Horizons of Multimodal Interaction
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally grounded data. We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
Date: 2024-01-07T19:11:18Z
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
A key method for creating artificial intelligence (AI) agents is reinforcement learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
Date: 2023-12-22T17:57:57Z
The Rise and Potential of Large Language Model Based Agents: A Survey
Large language models (LLMs) are regarded as potential sparks for artificial general intelligence (AGI). We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. We explore the extensive applications of LLM-based agents in three areas: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
Date: 2023-09-14T17:12:03Z
The MineRL BASALT Competition on Learning from Human Feedback
The MineRL BASALT competition aims to spur research on learning from human feedback, an important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
Date: 2021-07-05T12:18:17Z
Explainability via Responsibility
We present an approach to explainable artificial intelligence in which certain training instances are offered to human users. We evaluate this approach by approximating its ability to provide human users with explanations of an AI agent's actions.
Date: 2020-10-04T20:41:03Z

This list is automatically generated from the titles and abstracts of the papers on this site.