MaRGen: Multi-Agent LLM Approach for Self-Directed Market Research and Analysis
- URL: http://arxiv.org/abs/2508.01370v1
- Date: Sat, 02 Aug 2025 13:49:15 GMT
- Title: MaRGen: Multi-Agent LLM Approach for Self-Directed Market Research and Analysis
- Authors: Roman Koshkin, Pengyu Dai, Nozomi Fujikawa, Masahito Togami, Marco Visentini-Scarzanella
- Abstract summary: We present an autonomous framework that automates end-to-end business analysis and market report generation. At its core, the system employs specialized agents that collaborate to analyze data and produce comprehensive reports. The framework executes a multi-step process: querying databases, analyzing data, generating insights, creating visualizations, and composing market reports.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an autonomous framework that leverages Large Language Models (LLMs) to automate end-to-end business analysis and market report generation. At its core, the system employs specialized agents - Researcher, Reviewer, Writer, and Retriever - that collaborate to analyze data and produce comprehensive reports. These agents learn from real professional consultants' presentation materials at Amazon through in-context learning to replicate professional analytical methodologies. The framework executes a multi-step process: querying databases, analyzing data, generating insights, creating visualizations, and composing market reports. We also introduce a novel LLM-based evaluation system for assessing report quality, which shows alignment with expert human evaluations. Building on these evaluations, we implement an iterative improvement mechanism that optimizes report quality through automated review cycles. Experimental results show that report quality can be improved by both automated review cycles and consultants' unstructured knowledge. In experimental validation, our framework generates detailed 6-page reports in 7 minutes at a cost of approximately $1. Our work could be an important step toward automatically creating affordable market insights.
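The write-review-revise loop described in the abstract can be sketched as a minimal control flow. This is an illustrative assumption, not the paper's actual implementation: the real agents are LLM-backed, and all function names, the scoring rule, and the threshold below are hypothetical stand-ins.

```python
# Minimal sketch of the Researcher -> Writer -> Reviewer cycle with
# iterative improvement. All names and logic are illustrative; the
# paper's agents call an LLM rather than these rule-based stand-ins.

def researcher(query: str) -> dict:
    """Stand-in for the data-querying and analysis agent."""
    return {"query": query, "insight": f"finding about {query}"}

def writer(analysis: dict, feedback: str = "") -> str:
    """Stand-in for the report-composing agent."""
    draft = f"Report: {analysis['insight']}"
    if feedback:
        draft += f" (revised per: {feedback})"
    return draft

def reviewer(draft: str) -> tuple[int, str]:
    """Stand-in for the LLM-based evaluator: returns (score, feedback)."""
    score = 10 if "revised" in draft else 6
    return score, "add supporting figures"

def generate_report(query: str, threshold: int = 8, max_cycles: int = 3) -> str:
    """Iterative improvement: draft, review, and revise until good enough."""
    analysis = researcher(query)
    draft = writer(analysis)
    for _ in range(max_cycles):
        score, feedback = reviewer(draft)
        if score >= threshold:
            break
        draft = writer(analysis, feedback)
    return draft

print(generate_report("EU cloud market"))
```

The key design point the abstract implies is that the reviewer's feedback is fed back into the writer, so quality improves over bounded review cycles rather than in a single pass.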
Related papers
- AI Analyst: Framework and Comprehensive Evaluation of Large Language Models for Financial Time Series Report Generation
We introduce an automated highlighting system to categorize information within the generated reports. Our experiments, utilizing both data from the real stock market indices and synthetic time series, demonstrate the capability of LLMs to produce coherent and informative financial reports.
arXiv Detail & Related papers (2025-07-01T12:57:18Z)
- Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
We present an agentic system that automates the data-to-dashboard pipeline through modular LLM agents. Unlike existing chart systems, our framework simulates the analytical reasoning process of business analysts. Our approach shows improved insightfulness, domain relevance, and analytical depth, as measured by tailored evaluation metrics.
arXiv Detail & Related papers (2025-05-29T17:32:15Z)
- IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on only 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
- Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Large Language Models (LLMs) are increasingly used for automated evaluation in various scenarios. Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models. We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z)
- AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the Reproducibility of LLM-generated data analysis.
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
- FinSphere, a Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools
Current financial large language models (FinLLMs) struggle with two critical limitations: the absence of objective evaluation metrics to assess the quality of stock analysis reports, and a lack of depth in stock analysis, both of which impede their ability to generate professional-grade insights. This paper introduces FinSphere, a stock analysis agent, along with three major contributions.
arXiv Detail & Related papers (2025-01-08T07:50:50Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
We introduce InsightBench, a benchmark dataset with three key features. It consists of 100 datasets representing diverse business use cases such as finance and incident management. Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- Automatic benchmarking of large multimodal models via iterative experiment programming
We present APEx, the first framework for automatic benchmarking of LMMs.
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand.
The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
We present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluation.
The tool supports interactive workflows for users to understand when and why a model performs better or worse than a baseline model.
arXiv Detail & Related papers (2024-02-16T09:14:49Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)