Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
- URL: http://arxiv.org/abs/2411.01114v1
- Date: Sat, 02 Nov 2024 02:48:37 GMT
- Title: Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage
- Authors: Bin Lei, Yuchen Li, Yiming Zeng, Tao Ren, Yi Luo, Tianyu Shi, Zitian Gao, Zeyu Hu, Weitai Kang, Qiuwu Chen
- Abstract summary: The \textsc{Infant Agent} integrates task-aware functions, operators, a hierarchical management system, and a memory retrieval mechanism.
Using the \textsc{Infant Agent}, GPT-4o's accuracy on the SWE-bench-lite dataset rises from $\mathbf{0.33\%}$ to $\mathbf{30\%}$, and in the AIME-2024 mathematics competition, it increases GPT-4o's accuracy from $\mathbf{13.3\%}$ to $\mathbf{37\%}$.
- Score: 19.54437582630868
- License:
- Abstract: Despite the impressive capabilities of large language models (LLMs), they currently exhibit two primary limitations. \textbf{I}: They struggle to \textbf{autonomously solve real-world engineering problems}. \textbf{II}: They remain \textbf{challenged in reasoning through complex logic problems}. To address these challenges, we developed the \textsc{Infant Agent}, integrating task-aware functions, operators, a hierarchical management system, and a memory retrieval mechanism. Together, these components enable large language models to sustain extended reasoning processes and handle complex, multi-step tasks efficiently, all while significantly reducing API costs. Using the \textsc{Infant Agent}, GPT-4o's accuracy on the SWE-bench-lite dataset rises from $\mathbf{0.33\%}$ to $\mathbf{30\%}$, and in the AIME-2024 mathematics competition, it increases GPT-4o's accuracy from $\mathbf{13.3\%}$ to $\mathbf{37\%}$.
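As a rough illustration of how these pieces could fit together, here is a minimal, hypothetical Python sketch of a hierarchical agent loop with a naive memory-retrieval step; all names (`call_llm`, `plan_subtasks`, `Memory`) are illustrative assumptions, not the Infant Agent's actual implementation.

```python
# Minimal, hypothetical sketch of a hierarchical agent loop with memory
# retrieval, in the spirit of the Infant Agent description above.
# call_llm, plan_subtasks, and Memory are illustrative stand-ins only.

from dataclasses import dataclass, field


def call_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real API client."""
    return f"[model response to: {prompt[:40]}...]"


@dataclass
class Memory:
    entries: list = field(default_factory=list)

    def store(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: str, k: int = 3) -> list:
        # Naive keyword-overlap retrieval; a real system would use embeddings.
        scored = sorted(
            self.entries,
            key=lambda e: len(set(e.lower().split()) & set(query.lower().split())),
            reverse=True,
        )
        return scored[:k]


def plan_subtasks(task: str) -> list:
    """Ask a manager model to break the task into subtasks."""
    plan = call_llm(f"Decompose into numbered subtasks: {task}")
    return [line for line in plan.splitlines() if line.strip()] or [task]


def run_agent(task: str) -> str:
    memory = Memory()
    for subtask in plan_subtasks(task):
        context = "\n".join(memory.retrieve(subtask))
        result = call_llm(f"Context:\n{context}\n\nSolve: {subtask}")
        memory.store(f"{subtask} -> {result}")
    return call_llm("Summarize the solution from notes:\n" + "; ".join(memory.entries))


if __name__ == "__main__":
    print(run_agent("Fix the failing unit test in repo X"))
```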
Related papers
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use large language models to plan a solution and soft-formalise the query into facts and predicates using logic-programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
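To make the idea of soft-formalisation more concrete, the following hypothetical sketch treats LLM-extracted facts and rules as plain Python tuples and answers a multi-hop query by naive forward chaining; the predicates and the tiny inference loop are illustrative assumptions, not FLARE's generated logic-programming code.

```python
# Hypothetical sketch: represent LLM-extracted facts and rules as plain data
# and check a multi-hop query by naive forward chaining. This illustrates the
# idea of soft-formalising a question into facts/predicates; it is not FLARE.

# Facts are (predicate, arg1, arg2) triples, e.g. extracted by an LLM.
facts = {
    ("born_in", "alan_turing", "london"),
    ("located_in", "london", "england"),
}

# Rules: (head, [body atoms]); variables are strings starting with "?".
rules = [
    (("born_in_country", "?p", "?c"),
     [("born_in", "?p", "?city"), ("located_in", "?city", "?c")]),
]


def substitute(atom, binding):
    return tuple(binding.get(t, t) for t in atom)


def match(atom, fact, binding):
    """Try to unify a (possibly variable-containing) atom with a ground fact."""
    binding = dict(binding)
    for a, f in zip(atom, fact):
        if a.startswith("?"):
            if binding.get(a, f) != f:
                return None
            binding[a] = f
        elif a != f:
            return None
    return binding


def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for f in derived
                            if (b2 := match(atom, f, b)) is not None]
            for b in bindings:
                new_fact = substitute(head, b)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


print(("born_in_country", "alan_turing", "england") in forward_chain(facts, rules))
# -> True: the multi-hop query is derivable from the extracted facts and rules.
```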
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
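As a toy illustration of the two-timescale idea (a larger ascent step for the max-player than the descent step for the min-player), here is a small Python sketch on a hand-picked quadratic saddle problem; the objective and step sizes are arbitrary assumptions, not the paper's setting.

```python
# Toy two-timescale gradient descent ascent (TTGDA) on
# f(x, y) = 0.5*x**2 + 2*x*y - y**2, whose saddle point is at (0, 0).
# The max-player y takes a larger step (eta_y) than the min-player x (eta_x).
# Objective and step sizes are illustrative assumptions only.

def grad_x(x, y):
    return x + 2 * y        # df/dx


def grad_y(x, y):
    return 2 * x - 2 * y    # df/dy


def ttgda(x=1.0, y=1.0, eta_x=0.01, eta_y=0.1, steps=5000):
    for _ in range(steps):
        x = x - eta_x * grad_x(x, y)   # slow descent on x
        y = y + eta_y * grad_y(x, y)   # fast ascent on y, tracking argmax_y f(x, .)
    return x, y


print(ttgda())  # both coordinates approach the saddle point (0, 0)
```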
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Vision Transformer with Sparse Scan Prior [57.37893387775829]
Inspired by the human eye's sparse scanning mechanism, we propose a \textbf{Sparse} \textbf{Scan} \textbf{Self}-\textbf{Attention} (S$^3$A) mechanism.
This mechanism predefines a series of Anchors of Interest for each token and employs local attention to efficiently model the spatial information around these anchors.
Building on S$^3$A, we introduce the \textbf{Sparse} \textbf{Scan} \textbf{Vision} Transformer.
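A rough, hypothetical sketch of anchor-centred local attention follows: each query token attends only to keys in a window around its nearest predefined anchor. The anchor spacing, window size, and NumPy loop are illustrative assumptions, not the S$^3$A implementation.

```python
# Hypothetical sketch of anchor-centred local attention: each query token
# attends only to keys within a window around its nearest anchor position.
# Anchor stride and window size are illustrative, not the S^3A design.

import numpy as np


def anchor_local_attention(q, k, v, anchor_stride=4, window=3):
    """q, k, v: arrays of shape (seq_len, dim). Returns (seq_len, dim)."""
    seq_len, dim = q.shape
    anchors = np.arange(0, seq_len, anchor_stride)       # predefined anchors
    out = np.zeros_like(v)
    for i in range(seq_len):
        a = anchors[np.argmin(np.abs(anchors - i))]       # nearest anchor
        lo, hi = max(0, a - window), min(seq_len, a + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(dim)         # local scores only
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                          # softmax over window
        out[i] = weights @ v[lo:hi]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
print(anchor_local_attention(x, x, x).shape)  # (16, 8)
```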
arXiv Detail & Related papers (2024-05-22T04:34:36Z) - MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems [10.517708404982624]
This paper introduces the \textit{Multi-Agent System for Condition Mining} (\textbf{MACM}) prompting method.
It resolves intricate mathematical problems and demonstrates strong generalization capabilities across various mathematical contexts.
With the assistance of MACM, the accuracy of GPT-4 Turbo on the most challenging level-five mathematical problems in the MATH dataset increases from $\mathbf{54.68\%}$ to $\mathbf{76.73\%}$.
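One way to picture a condition-mining loop with multiple LLM roles is sketched below: one agent proposes a new condition entailed by the problem, another checks it and decides when enough has been mined, and a final agent produces the answer. The role names, prompts, and stopping rule are loose assumptions for illustration, not MACM's actual prompting scheme.

```python
# Hedged sketch of multi-agent condition mining: a "thinker" proposes new
# conditions, a "judge" verifies them and decides when enough has been mined,
# and an "executor" produces the final answer. Roles and prompts are
# illustrative assumptions, not MACM's exact scheme.

def call_llm(role: str, prompt: str) -> str:
    """Stub for a per-role LLM call; replace with a real API client."""
    return f"[{role}] response to: {prompt[:40]}..."


def mine_conditions(problem: str, max_rounds: int = 5) -> list:
    conditions = [problem]
    for _ in range(max_rounds):
        known = "\n".join(conditions)
        proposal = call_llm("thinker", f"Known:\n{known}\nState one new condition that follows.")
        verdict = call_llm("judge", f"Is this condition correct and new? {proposal} Answer yes or no.")
        if "no" in verdict.lower().split():
            continue  # judge rejected the proposed condition
        conditions.append(proposal)
        done = call_llm("judge", f"Can the objective be reached from:\n{known}\n{proposal}? Answer yes or no.")
        if "yes" in done.lower():
            break  # enough conditions have been mined
    return conditions


def solve(problem: str) -> str:
    conditions = mine_conditions(problem)
    return call_llm("executor", "Answer using these mined conditions:\n" + "\n".join(conditions))


print(solve("Find the remainder when 2**100 is divided by 7."))
```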
arXiv Detail & Related papers (2024-04-06T21:39:01Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that set it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Modeling Complex Mathematical Reasoning via Large Language Model based
MathAgent [15.81048994298046]
Large language models (LLMs) face challenges in solving complex mathematical problems.
We propose a formal description of the mathematical solving process and extend LLMs with an agent-based zero-shot framework.
Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and the proposed MathAgents.
arXiv Detail & Related papers (2023-12-14T13:33:50Z) - Blocked Collaborative Bandits: Online Collaborative Filtering with
Per-Item Budget Constraints [46.65419724935037]
We consider the problem of \emph{blocked collaborative bandits}, where there are multiple users, each with an associated multi-armed bandit problem.
Our goal is to design algorithms that maximize the cumulative reward accrued by all the users over time.
\texttt{B-LATTICE} achieves a per-user regret of $\widetilde{O}(\sqrt{\mathsf{T}(\sqrt{\mathsf{N}\mathsf{M}^{-1}})})$ under a budget constraint.
arXiv Detail & Related papers (2023-10-31T11:04:21Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a \textit{task decomposer} that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a \textit{Thoughts-on-Graph} (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an \textit{execution engine} with a rich toolbox that interprets the solution path and runs the tools.
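As a hypothetical sketch of what searching a pre-built tool graph could look like, the snippet below models each tool as mapping required resource types to a produced type and runs a plain BFS to find a tool chain yielding the requested output; the tool names and the BFS are illustrative stand-ins, not ControlLLM's Thoughts-on-Graph search.

```python
# Hypothetical sketch of choosing a tool chain on a small pre-built tool graph:
# each tool consumes certain resource types and produces a new one; a BFS over
# sets of available types finds a chain producing the goal type. Illustrative
# stand-in only, not ControlLLM's Thoughts-on-Graph.

from collections import deque

# tool name -> (required input types, produced output type); illustrative only
TOOLS = {
    "image_captioner": ({"image"}, "text"),
    "translator": ({"text"}, "translated_text"),
    "tts": ({"translated_text"}, "audio"),
}


def find_tool_path(available, goal):
    """BFS over sets of available resource types until `goal` can be produced."""
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        resources, path = queue.popleft()
        if goal in resources:
            return path
        for name, (needs, produces) in TOOLS.items():
            if needs <= resources and produces not in resources:
                nxt = frozenset(resources | {produces})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None


# e.g. "caption this image, translate the caption, and read it aloud"
print(find_tool_path({"image"}, "audio"))
# -> ['image_captioner', 'translator', 'tts']
```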
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - Boosting Logical Reasoning in Large Language Models through a New
Framework: The Graph of Thought [7.356034193515096]
Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts} (GoT).
Our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each respective task.
When juxtaposed with the state-of-the-art prompting method, \textit{Tree of Thought} (ToT), our approach registered an average accuracy boost of $23\%$, $24\%$, and $15\%$.
arXiv Detail & Related papers (2023-08-16T18:13:27Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, and study how to solve them efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.