Collaborative Evolving Strategy for Automatic Data-Centric Development
- URL: http://arxiv.org/abs/2407.18690v1
- Date: Fri, 26 Jul 2024 12:16:47 GMT
- Title: Collaborative Evolving Strategy for Automatic Data-Centric Development
- Authors: Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian,
- Abstract summary: This paper introduces the automatic data-centric development (AD2) task.
It outlines its core challenges, which require domain-experts-like task scheduling and implementation capability.
We propose an autonomous agent equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval.
- Score: 17.962373755266068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) significantly influences many fields, largely thanks to the vast amounts of high-quality data for machine learning models. The emphasis is now on a data-centric AI strategy, prioritizing data development over model design progress. Automating this process is crucial. In this paper, we serve as the first work to introduce the automatic data-centric development (AD^2) task and outline its core challenges, which require domain-experts-like task scheduling and implementation capability, largely unexplored by previous work. By leveraging the strong complex problem-solving capabilities of large language models (LLMs), we propose an LLM-based autonomous agent, equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval (Co-STEER), to simultaneously address all the challenges. Specifically, our proposed Co-STEER agent enriches its domain knowledge through our proposed evolving strategy and develops both its scheduling and implementation skills by accumulating and retrieving domain-specific practical experience. With an improved schedule, the capability for implementation accelerates. Simultaneously, as implementation feedback becomes more thorough, the scheduling accuracy increases. These two capabilities evolve together through practical feedback, enabling a collaborative evolution process. Extensive experimental results demonstrate that our Co-STEER agent breaks new ground in AD^2 research, possesses strong evolvable schedule and implementation ability, and demonstrates the significant effectiveness of its components. Our Co-STEER paves the way for AD^2 advancements.
Related papers
- Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies [3.3374611485861116]
Large language model (LLM) based artificial intelligence technologies have been a game-changer, particularly in sentiment analysis.
However, integrating diverse AI models for processing complex multimodal data and the associated high costs of feature extraction presents significant challenges.
This study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems.
arXiv Detail & Related papers (2024-10-17T06:14:34Z) - Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting [36.31269406067809]
We argue that data-centric AI is essential for training AI models, particularly for transformer-based TSF models efficiently.
We review the previous research works from a data-centric AI perspective and we intend to lay the foundation work for the future development of transformer-based architecture and data-centric AI.
arXiv Detail & Related papers (2024-07-29T08:27:21Z) - Towards a RAG-based Summarization Agent for the Electron-Ion Collider [0.5504260452953508]
A Retrieval Augmented Generation (RAG)--based Summarization AI for EIC (RAGS4EIC) is under development.
This AI-Agent not only condenses information but also effectively references relevant responses, offering substantial advantages for collaborators.
Our project involves a two-step approach: first, querying a comprehensive vector database containing all pertinent experiment information; second, utilizing a Large Language Model (LLM) to generate concise summaries enriched with citations based on user queries and retrieved data.
arXiv Detail & Related papers (2024-03-23T05:32:46Z) - Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z) - Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent
Self-Evolution [92.84441068115517]
Investigate-Consolidate-Exploit (ICE) is a novel strategy for enhancing the adaptability and flexibility of AI agents.
ICE promotes the transfer of knowledge between tasks for genuine self-evolution.
Our experiments on the XAgent framework demonstrate ICE's effectiveness, reducing API calls by as much as 80%.
arXiv Detail & Related papers (2024-01-25T07:47:49Z) - Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z) - Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z) - On Realization of Intelligent Decision-Making in the Real World: A
Foundation Decision Model Perspective [54.38373782121503]
A Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks.
We present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks.
arXiv Detail & Related papers (2022-12-24T06:16:45Z) - Lean Evolutionary Reinforcement Learning by Multitasking with Importance
Sampling [20.9680985132322]
We introduce a novel neuroevolutionary multitasking (NuEMT) algorithm to transfer information from a set of auxiliary tasks to the target (full length) RL task.
We demonstrate that the NuEMT algorithm data-lean evolutionary RL, reducing expensive agent-environment interaction data requirements.
arXiv Detail & Related papers (2022-03-21T10:06:16Z) - Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming [77.38174112525168]
We present Nemo, an end-to-end interactive Supervision system that improves overall productivity of WS learning pipeline by an average 20% (and up to 47% in one task) compared to the prevailing WS supervision approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.