Memory-Augmented Agent Training for Business Document Understanding
- URL: http://arxiv.org/abs/2412.15274v1
- Date: Tue, 17 Dec 2024 18:35:04 GMT
- Title: Memory-Augmented Agent Training for Business Document Understanding
- Authors: Jiale Liu, Yifan Zeng, Malte Højmark-Bertelsen, Marie Normann Gadeberg, Huazheng Wang, Qingyun Wu
- Abstract summary: We introduce Matrix (Memory-Augmented agent Training through Reasoning and Iterative eXploration), a novel paradigm that enables LLM agents to progressively build domain expertise.
We collaborate with one of the world's largest logistics companies to create a dataset of Universal Business Language format invoice documents.
Experiments demonstrate that Matrix outperforms prompting a single LLM by 30.3% and a vanilla LLM agent by 35.2%.
- Score: 16.143076522786803
- License:
- Abstract: Traditional enterprises face significant challenges in processing business documents, where tasks like extracting transport references from invoices remain largely manual despite their crucial role in logistics operations. While Large Language Models offer potential automation, their direct application to specialized business domains often yields unsatisfactory results. We introduce Matrix (Memory-Augmented agent Training through Reasoning and Iterative eXploration), a novel paradigm that enables LLM agents to progressively build domain expertise through experience-driven memory refinement and iterative learning. To validate this approach, we collaborate with one of the world's largest logistics companies to create a dataset of Universal Business Language format invoice documents, focusing on the task of transport reference extraction. Experiments demonstrate that Matrix outperforms prompting a single LLM by 30.3% and a vanilla LLM agent by 35.2%. We further analyze the metrics of the optimized systems and observe that the agent system requires fewer API calls, incurs lower costs, and can analyze longer documents on average. Our methods establish a new approach to transform general-purpose LLMs into specialized business tools through systematic memory enhancement in document processing tasks.
Related papers
- Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests.
We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments.
We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications [1.1682259692399921]
Long-context Large Language Models (LLMs) can grasp extensive connections, provide cohesive summaries, and adapt to various industry domains.
Case studies show notable enhancements in both efficiency and accuracy.
arXiv Detail & Related papers (2024-09-27T05:29:31Z)
- Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5 [0.0]
We present a novel approach wherein we distill document understanding knowledge from the proprietary LLM ChatGPT into FLAN-T5.
Our findings underscore the potential of distillation techniques in facilitating the deployment of sophisticated language models in real-world scenarios.
arXiv Detail & Related papers (2024-09-17T15:37:56Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- Using Large Language Models to Understand Telecom Standards [35.343893798039765]
Large Language Models (LLMs) may provide faster access to relevant information.
We evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants.
Results show that LLMs can be used as a credible reference tool on telecom technical documents.
arXiv Detail & Related papers (2024-04-02T09:54:51Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design [11.913409501633616]
FinMem is a novel LLM-based agent framework devised for financial decision-making.
FinMem's memory module aligns closely with the cognitive structure of human traders, offering robust interpretability.
This framework enables the agent to self-evolve its professional knowledge, react agilely to new investment cues, and continuously refine trading decisions.
arXiv Detail & Related papers (2023-11-23T00:24:40Z)
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
- Multimodal Document Analytics for Banking Process Automation [4.541582055558865]
The paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business.
It offers practical guidance on how to unlock this potential in day-to-day operations.
arXiv Detail & Related papers (2023-07-21T18:29:04Z)
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present LAMM, a Language-Assisted Multi-Modal instruction-tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.