TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with
Millions of APIs
- URL: http://arxiv.org/abs/2303.16434v1
- Date: Wed, 29 Mar 2023 03:30:38 GMT
- Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang
Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan
Duan
- Abstract summary: We introduce TaskMatrix.AI as a new AI ecosystem that connects foundation models with millions of APIs for task completion.
We will present our vision of how to build such an ecosystem, explain each key component, and use case studies to illustrate both the feasibility of this vision and the main challenges we need to address next.
- Score: 71.7495056818522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the
one hand, advanced foundation models like ChatGPT can offer powerful
conversation, in-context learning and code generation abilities on a broad
range of open-domain tasks. They can also generate high-level solution outlines
for domain-specific tasks based on the common sense knowledge they have
acquired. However, they still face difficulties with some specialized tasks,
either because they lack sufficient domain-specific data during pre-training or
because their neural computations are error-prone on tasks that require
accurate execution. On the other hand, many existing models and systems
(symbolic or neural) already handle certain domain-specific tasks very well;
however, because of differences in implementation and working mechanisms, they
are not easily accessible to or compatible with foundation models. Therefore,
there is a clear and pressing need for a mechanism that can leverage foundation
models to propose task solution outlines and then automatically match some of
the sub-tasks in the outlines to the off-the-shelf models and systems with
special functionalities to complete them. Inspired by this, we introduce
TaskMatrix.AI as a new AI ecosystem that connects foundation models with
millions of APIs for task completion. Unlike most previous work that aimed to
improve a single AI model, TaskMatrix.AI focuses more on using existing
foundation models (as a brain-like central system) and APIs of other AI models
and systems (as sub-task solvers) to accomplish diverse tasks in both digital
and physical domains. As a position paper, we will present our vision of how to
build such an ecosystem, explain each key component, and use case studies to
illustrate both the feasibility of this vision and the main challenges we need
to address next.
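To make the mechanism described in the abstract concrete, below is a minimal, hypothetical Python sketch of a foundation model drafting a solution outline and handing each sub-task to a registered API with special functionality. Nothing here comes from TaskMatrix.AI's actual implementation: the registry, the `APISpec` record, and the `plan_with_llm` stub (a stand-in for prompting a real foundation model) are all illustrative assumptions.

```python
# Hypothetical sketch: a foundation model proposes a solution outline,
# and each sub-task is matched to a registered API for execution.
# All names below (registry, APIs, planner stub) are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class APISpec:
    """Documentation the central model uses to pick an API for a sub-task."""
    name: str
    description: str
    call: Callable[[str], str]


# A tiny stand-in for the API platform: maps capability names to callables.
REGISTRY: Dict[str, APISpec] = {
    "image_caption": APISpec(
        "image_caption",
        "Describe the content of an image given its file path.",
        lambda arg: f"[caption of {arg}]",
    ),
    "text_to_speech": APISpec(
        "text_to_speech",
        "Synthesize speech audio from input text.",
        lambda arg: f"[audio for '{arg}']",
    ),
}


def plan_with_llm(user_request: str) -> List[Dict[str, str]]:
    """Stand-in for the foundation model's solution outline.

    A real system would prompt an LLM with the user request plus the API
    descriptions and parse its proposed outline; here we return a fixed plan.
    """
    return [
        {"api": "image_caption", "arg": "photo.jpg"},
        {"api": "text_to_speech", "arg": "caption goes here"},
    ]


def execute(user_request: str) -> List[str]:
    """Run each planned sub-task through its matched API."""
    results = []
    for step in plan_with_llm(user_request):
        spec = REGISTRY.get(step["api"])
        if spec is None:
            results.append(f"no API found for sub-task '{step['api']}'")
            continue
        results.append(spec.call(step["arg"]))
    return results


if __name__ == "__main__":
    print(execute("Describe photo.jpg and read the description aloud"))
```

In the paper's vision, an API platform would expose millions of such documented APIs, and the central foundation model would select among them and compose the matched calls into executable action code, rather than relying on a fixed registry and a hard-coded plan as in this sketch.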
Related papers
- A Survey on Vision-Language-Action Models for Embodied AI [71.16123093739932]
Vision-language-action models (VLAs) have become a foundational element in robot learning.
Various methods have been proposed to enhance traits such as versatility, dexterity, and generalizability.
VLAs serve as high-level task planners capable of decomposing long-horizon tasks into executable subtasks.
arXiv Detail & Related papers (2024-05-23T01:43:54Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- SAI: Solving AI Tasks with Systematic Artificial Intelligence in Communication Network [4.302209772725456]
Systematic Artificial Intelligence (SAI) is a framework designed to solve AI tasks by leveraging Large Language Models (LLMs) and intent-format-based input.
SAI can complete numerous complex AI tasks in the communication network, achieving impressive results in network optimization, resource allocation, and other challenging tasks.
arXiv Detail & Related papers (2023-10-13T12:14:58Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [85.25054021362232]
Large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning.
LLMs could act as a controller to manage existing AI models to solve complicated AI tasks.
We present HuggingGPT, an LLM-powered agent that connects various AI models in machine learning communities.
arXiv Detail & Related papers (2023-03-30T17:48:28Z)
- Foundation models in brief: A historical, socio-technical focus [2.5991265608180396]
Foundation models can be disruptive for future AI development by scaling up deep learning.
Models achieve state-of-the-art performance on a variety of tasks in domains such as natural language processing and computer vision.
arXiv Detail & Related papers (2022-12-17T22:11:33Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- NeurAll: Towards a Unified Visual Perception Model for Automated Driving [8.49826472556323]
We propose a joint multi-task network design for learning several tasks simultaneously.
The main bottleneck in automated driving systems is the limited processing power available on deployment hardware.
arXiv Detail & Related papers (2019-02-10T12:45:49Z)