A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models
- URL: http://arxiv.org/abs/2410.16024v1
- Date: Mon, 21 Oct 2024 13:58:38 GMT
- Title: A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models
- Authors: Yue Deng, Weiyu Ma, Yuxin Fan, Yin Zhang, Haifeng Zhang, Jian Zhao,
- Abstract summary: StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL)
Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model.
In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC.
- Score: 8.457552813123597
- License:
- Abstract: StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL), where the specific task is to control a set number of allied units to defeat enemy forces. Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model, and the resulting policies are typically non-interpretable with weak transferability. In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC. In our framework, agents leverage large language models (LLMs) to generate decision tree code by providing task descriptions. The model is further self-reflection using feedback from the rewards provided by the environment. We conduct experiments in the SMAC and demonstrate that our method can produce high-quality, interpretable decision trees with minimal environmental exploration. Moreover, these models exhibit strong transferability, successfully applying to similar SMAC environments without modification. We believe this approach offers a new direction for solving decision-making tasks in the future.
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - SMAC-Hard: Enabling Mixed Opponent Strategy Script and Self-play on SMAC [19.897956357070697]
We present SMAC-HARD, a novel benchmark to enhance training robustness and evaluation comprehensiveness.
SMAC-HARD supports customizable opponent strategies, randomization of adversarial policies, and interfaces for MARL self-play.
We conduct extensive evaluations of widely used and state-of-the-art algorithms on SMAC-HARD, revealing the substantial challenges posed by edited and mixed strategy opponents.
arXiv Detail & Related papers (2024-12-23T16:36:21Z) - EnvBridge: Bridging Diverse Environments with Cross-Environment Knowledge Transfer for Embodied AI [7.040779338576156]
Large Language Models (LLMs) can generate text planning or control code for robots.
These methods still face challenges in terms of flexibility and applicability across different environments.
We propose EnvBridge to enhance the adaptability and robustness of robotic manipulation agents.
arXiv Detail & Related papers (2024-10-22T11:52:22Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - Breaking Down the Task: A Unit-Grained Hybrid Training Framework for
Vision and Language Decision Making [19.87916700767421]
Vision language decision making (VLDM) is a challenging multimodal task.
From an environment perspective, we find that task episodes can be divided into fine-grained textitunits
We propose a novel hybrid-training framework that enables active exploration in the environment and reduces the exposure bias.
arXiv Detail & Related papers (2023-07-16T11:54:16Z) - Efficient Distributed Framework for Collaborative Multi-Agent
Reinforcement Learning [17.57163419315147]
Multi-agent reinforcement learning for incomplete information environments has attracted extensive attention from researchers.
There are still some problems in multi-agent reinforcement learning, such as unstable model iteration and low training efficiency.
In this paper, we design an distributed MARL framework based on the actor-work-learner architecture.
arXiv Detail & Related papers (2022-05-11T03:12:49Z) - Multitask Adaptation by Retrospective Exploration with Learned World
Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.