Creative Agents: Empowering Agents with Imagination for Creative Tasks
- URL: http://arxiv.org/abs/2312.02519v1
- Date: Tue, 5 Dec 2023 06:00:52 GMT
- Title: Creative Agents: Empowering Agents with Imagination for Creative Tasks
- Authors: Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu
- Abstract summary: We propose a class of solutions for creative agents, where the controller is enhanced with an imaginator that generates detailed imaginations of task outcomes conditioned on language instructions.
We benchmark creative tasks with the challenging open-world game Minecraft, where the agents are asked to create diverse buildings given free-form language instructions.
We perform a detailed experimental analysis of creative agents, showing that creative agents are the first AI agents to accomplish diverse building creation in the survival mode of Minecraft.
- Score: 31.920963353890393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study building embodied agents for open-ended creative tasks. While
existing methods build instruction-following agents that can perform diverse
open-ended tasks, none of them demonstrates creativity -- the ability to give
novel and diverse task solutions implicit in the language instructions. This
limitation comes from their inability to convert abstract language instructions
into concrete task goals in the environment and perform long-horizon planning
for such complicated goals. Given the observation that humans perform creative
tasks with the help of imagination, we propose a class of solutions for
creative agents, where the controller is enhanced with an imaginator that
generates detailed imaginations of task outcomes conditioned on language
instructions. We introduce several approaches to implementing the components of
creative agents. We implement the imaginator with either a large language model
for textual imagination or a diffusion model for visual imagination. The
controller can be either a behavior-cloning policy learned from data or a
pre-trained foundation model generating executable code in the environment. We
benchmark creative tasks with the challenging open-world game Minecraft, where
the agents are asked to create diverse buildings given free-form language
instructions. In addition, we propose novel evaluation metrics for open-ended
creative tasks utilizing GPT-4V, which holds many advantages over existing
metrics. We perform a detailed experimental analysis of creative agents,
showing that creative agents are the first AI agents to accomplish diverse
building creation in the survival mode of Minecraft. Our benchmark and models
are open-source for future research on creative agents
(https://github.com/PKU-RL/Creative-Agents).
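The two-component design described in the abstract (an imaginator that turns a language instruction into a concrete imagined outcome, and a controller that acts on it) can be illustrated with a minimal sketch. This is not the authors' implementation: every class, method, and action name below is hypothetical, and the template imaginator and scripted controller are simple stand-ins for the LLM or diffusion model and for the behavior-cloning policy or code-generating model that the abstract mentions.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Imagination:
    """Concrete description of the imagined task outcome (text here; could be an image)."""
    instruction: str
    detail: str


class Imaginator(Protocol):
    """Hypothetical interface: maps a language instruction to an imagination."""
    def imagine(self, instruction: str) -> Imagination: ...


class Controller(Protocol):
    """Hypothetical interface: maps an imagination to a list of actions."""
    def act(self, imagination: Imagination) -> List[str]: ...


class TemplateImaginator:
    """Stand-in for an LLM or diffusion model (assumption: fixed template, no model call)."""
    def imagine(self, instruction: str) -> Imagination:
        detail = f"{instruction}; 5x5 footprint, 4 blocks tall, oak planks"
        return Imagination(instruction=instruction, detail=detail)


class ScriptedController:
    """Stand-in for a learned policy or code-generating model (assumption: fixed script)."""
    def act(self, imagination: Imagination) -> List[str]:
        return [f"plan: {imagination.detail}", "place_blocks", "done"]


class CreativeAgent:
    """Chains the imaginator and the controller: instruction -> imagination -> actions."""
    def __init__(self, imaginator: Imaginator, controller: Controller) -> None:
        self.imaginator = imaginator
        self.controller = controller

    def run(self, instruction: str) -> List[str]:
        imagination = self.imaginator.imagine(instruction)
        return self.controller.act(imagination)


if __name__ == "__main__":
    agent = CreativeAgent(TemplateImaginator(), ScriptedController())
    for action in agent.run("build a wooden house"):
        print(action)
```

Keeping the two components behind separate interfaces mirrors the abstract's point that each can be implemented in more than one way; swapping either stand-in for a real model would not change how `CreativeAgent.run` composes them.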
Related papers
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agents.
We first train the base model with imitation learning to gain the basic abilities.
We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z)
- A Framework for Collaborating a Large Language Model Tool in Brainstorming for Triggering Creative Thoughts [2.709166684084394]
This study proposes a framework called GPS, which employs goals, prompts, and strategies to guide designers to systematically work with an LLM tool for improving the creativity of ideas generated during brainstorming.
Our framework, tested through a design example and a case study, demonstrates its effectiveness in stimulating creativity and its seamless LLM tool integration into design practices.
arXiv Detail & Related papers (2024-10-10T13:39:27Z)
- Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification [34.97881486372797]
Building open agents has always been the ultimate goal in AI research, and creative agents are even more enticing.
We introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks.
Specifically, we propose the Luban agent, which targets creative building tasks in Minecraft and is equipped with two-level autonomous embodied verification.
arXiv Detail & Related papers (2024-05-24T10:25:59Z)
- Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment.
Our approach focuses on language-driven generality while imposing minimal assumptions.
Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
- Can AI Be as Creative as Humans? [84.43873277557852]
We prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators.
The debate on AI's creativity is reduced to the question of its ability to fit a sufficient amount of data.
arXiv Detail & Related papers (2024-01-03T08:49:12Z)
- Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation [19.62178304006683]
We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas.
We propose a framework that facilitates the structured generation of design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses.
arXiv Detail & Related papers (2023-10-19T17:53:14Z)
- Creative Wand: A System to Study Effects of Communications in Co-Creative Settings [9.356870107137093]
Co-creative, mixed-initiative systems require user-centric means of influencing the algorithm.
Key questions in co-creative AI include: How can users express their creative intentions?
We introduce CREATIVE-WAND, a customizable framework for investigating co-creative mixed-initiative generation.
arXiv Detail & Related papers (2022-08-04T20:56:40Z)
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [70.47759528596711]
We introduce MineDojo, a new framework built on the popular Minecraft game.
We propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function.
Our agent is able to solve a variety of open-ended tasks specified in free-form language without any manually designed dense shaping reward.
arXiv Detail & Related papers (2022-06-17T15:53:05Z)
- Exploring Latent Dimensions of Crowd-sourced Creativity [0.02294014185517203]
We build our work on the largest AI-based creativity platform, Artbreeder.
We explore the latent dimensions of images generated on this platform and present a novel framework for manipulating images to make them more creative.
arXiv Detail & Related papers (2021-12-13T19:24:52Z)
- Telling Creative Stories Using Generative Visual Aids [52.623545341588304]
We asked writers to write creative stories from a starting prompt, and provided them with visuals created by generative AI models from the same prompt.
Compared to a control group, writers who used the visuals as a story-writing aid wrote significantly more creative, original, complete, and visualizable stories.
Findings indicate that cross-modality inputs from AI can benefit divergent aspects of creativity in human-AI co-creation but hinder convergent thinking.
arXiv Detail & Related papers (2021-10-27T23:13:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.