Related papers: Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews

Related papers

Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents [19.015202590038996]
Multimodal agents show promise in real-world tasks like web navigation and embodied intelligence. Because they lack external feedback, these agents struggle with self-correction and generalization. It remains unclear how to select reward models for agents.
arXiv Detail & Related papers (2025-06-26T13:36:12Z)
Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z)
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs. Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks. We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges [72.19809898215857]
We introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains. These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports. We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured, creative solutions, and generates well-grounded, creative solutions.
arXiv Detail & Related papers (2025-05-21T03:33:23Z)
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review [1.4929298667651645]
We present a comparison of benchmarks developed between 2019 and 2025 that evaluate large language models and autonomous AI agents. We propose a taxonomy of approximately 60 benchmarks that cover knowledge reasoning, mathematical problem-solving, code generation and software engineering, factual grounding and retrieval, domain-specific evaluations, multimodal and embodied tasks, task orchestration, and interactive assessments. We present real-world applications of autonomous AI agents in materials science, biomedical research, academic ideation, software engineering, synthetic data generation, mathematical problem-solving, geographic information systems, multimedia, healthcare, and finance.
arXiv Detail & Related papers (2025-04-28T11:08:22Z)
A Multi-Agent Perspective on Modern Information Retrieval [12.228832858396368]
The rise of large language models (LLMs) has introduced a new era in information retrieval (IR). This shift challenges some long-standing IR paradigms and calls for a reassessment of both theoretical frameworks and practical methodologies. We advocate for a multi-agent perspective to better capture the complex interactions between query agents, document agents, and ranker agents.
arXiv Detail & Related papers (2025-02-20T18:17:26Z)
GUI Agents with Foundation Models: A Comprehensive Survey [52.991688542729385]
This survey consolidates recent research on (M)LLM-based GUI agents. We highlight key innovations in data, frameworks, and applications. We hope this paper will inspire further developments in the field of (M)LLM-based GUI agents.
arXiv Detail & Related papers (2024-11-07T17:28:10Z)
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments [90.29937153770835]
We introduce CRMArena, a benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. We show that state-of-the-art LLM agents succeed in less than 40% of the tasks with ReAct prompting, and less than 55% even with function-calling abilities. Our findings highlight the need for enhanced agent capabilities in function-calling and rule-following to be deployed in real-world work environments.
arXiv Detail & Related papers (2024-11-04T17:30:51Z)
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. First, we collect real-world human activities to generate proactive task predictions. These predictions are labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z)
Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach [1.8874331450711404]
We propose a conceptual framework that combines modeling event logs, intelligent modeling assistants, and the generation of modeling operations. In particular, the architecture comprises modeling components that help the designer specify the system, record its operation within a graphical modeling environment, and automatically recommend relevant operations.
arXiv Detail & Related papers (2024-08-26T13:26:44Z)
Private Agent-Based Modeling [13.072333113108531]
The utility of agent-based models in decision-making relies on their capacity to accurately replicate populations. Yet, the incorporation of such data poses significant challenges due to privacy concerns. We introduce a paradigm for private agent-based modeling wherein the simulation, calibration, and analysis of agent-based models can be achieved without centralizing the agents' attributes or interactions.
arXiv Detail & Related papers (2024-04-19T16:30:40Z)
An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System [14.019244136838017]
TrainerAgent is a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimize them from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online services. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development.
arXiv Detail & Related papers (2023-11-11T17:39:24Z)
FREIDA: A Framework for developing quantitative agent based models based on qualitative expert knowledge [0.0]
Agent Based Models (ABMs) often deal with systems where there is a lack of quantitative data or where quantitative data alone may be insufficient to fully capture the complexities of real-world systems. Expert knowledge and qualitative insights are critical in constructing realistic behavioral rules, interactions, and decision-making processes within these models. We propose FREIDA, a systematic mixed-methods framework to develop, train, and validate ABMs.
arXiv Detail & Related papers (2023-07-21T11:26:54Z)
Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents [0.0]
We present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems. Our framework introduces a collaborative environment where multiple intelligent agent components, each with distinctive attributes and roles, work together to handle complex tasks more efficiently and effectively.
arXiv Detail & Related papers (2023-06-05T23:55:37Z)
Phantom -- A RL-driven multi-agent framework to model complex systems [1.0499611180329804]
Phantom is an RL-driven framework for agent-based modelling of complex multi-agent systems. It aims to provide the tools to simplify the ABM specification in a MARL-compatible way. We present these features and their design rationale, along with two new environments leveraging the framework.
arXiv Detail & Related papers (2022-10-12T08:37:38Z)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models. Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions. In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems. Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
Quantitatively Assessing the Benefits of Model-driven Development in Agent-based Modeling and Simulation [80.49040344355431]
This paper compares the use of MDD and ABMS platforms in terms of effort and developer mistakes. The obtained results show that MDD4ABMS requires less effort to develop simulations with similar (sometimes better) design quality than NetLogo.
arXiv Detail & Related papers (2020-06-15T23:29:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.