LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
- URL: http://arxiv.org/abs/2405.17424v1
- Date: Mon, 27 May 2024 17:59:32 GMT
- Title: LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
- Authors: Zhuoling Li, Xiaogang Xu, Zhenhua Xu, SerNam Lim, Hengshuang Zhao,
- Abstract summary: We introduce the large auto-regressive model (LARM) for embodied agents.
LARM uses both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner.
Adopting a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft.
- Score: 68.27280750612204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the need to interact with the real world, embodied agents are required to possess comprehensive prior knowledge, long-horizon planning capability, and a swift response speed. Despite recent large language model (LLM) based agents achieving promising performance, they still exhibit several limitations. For instance, the output of LLMs is a descriptive sentence, which is ambiguous when determining specific actions. To address these limitations, we introduce the large auto-regressive model (LARM). LARM leverages both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named auto-regressive node transmission structure and assemble a corresponding dataset. Adopting a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft, which demands significantly more complex decision-making chains than the highest achievements of prior best methods. Besides, the speed of LARM is 6.8x faster.
Related papers
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
arXiv Detail & Related papers (2024-02-24T08:10:39Z) - Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding [11.832919020149891]
This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters.
We propose textbfSmart textbfParallel textbfAuto-textbfCorrect dtextbfEcoding (SPACE)
arXiv Detail & Related papers (2024-02-19T03:39:10Z) - InferAligner: Inference-Time Alignment for Harmlessness through
Cross-Model Guidance [56.184255657175335]
We develop textbfInferAligner, a novel inference-time alignment method that utilizes cross-model guidance for harmlessness alignment.
Experimental results show that our method can be very effectively applied to domain-specific models in finance, medicine, and mathematics.
It significantly diminishes the Attack Success Rate (ASR) of both harmful instructions and jailbreak attacks, while maintaining almost unchanged performance in downstream tasks.
arXiv Detail & Related papers (2024-01-20T10:41:03Z) - LLMCad: Fast and Scalable On-device Large Language Model Inference [11.103824752113148]
Generative tasks, such as text generation and question answering, hold a crucial position in the realm of mobile applications.
Currently, the execution of these generative tasks heavily depends on Large Language Models (LLMs)
We introduce LLMCad, an on-device inference engine specifically designed for efficient generative Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2023-09-08T10:44:19Z) - LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures.
This work identifies three major factors contributing to this length generalization failure.
We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
arXiv Detail & Related papers (2023-08-30T16:47:51Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x speedup with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z) - Capture Salient Historical Information: A Fast and Accurate
Non-Autoregressive Model for Multi-turn Spoken Language Understanding [18.988599232838766]
Existing work increases inference speed by designing non-autoregressive models for single-turn Spoken Language Understanding tasks.
We propose a novel model for multi-turn SLU named Salient History Attention with Layer-Refined Transformer (SHA-LRT)
SHA captures historical information for the current dialogue from both historical utterances and results via a well-designed history-attention mechanism.
arXiv Detail & Related papers (2022-06-24T10:45:32Z) - An Effective Non-Autoregressive Model for Spoken Language Understanding [15.99246711701726]
We propose a novel non-autoregressive Spoken Language Understanding model named Layered-Refine Transformer.
With SLG, the non-autoregressive model can efficiently obtain dependency information during training and spend no extra time in inference.
Experiments on two public datasets indicate that our model significantly improves SLU performance (1.5% on Overall accuracy) while substantially speed up (more than 10 times) the inference process.
arXiv Detail & Related papers (2021-08-16T10:26:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.