Agents of Discovery
- URL: http://arxiv.org/abs/2509.08535v1
- Date: Wed, 10 Sep 2025 12:25:13 GMT
- Title: Agents of Discovery
- Authors: Sascha Diefenbacher, Anna Hallin, Gregor Kasieczka, Michael Krämer, Anne Lauscher, Tim Lukas,
- Abstract summary: Large language models (LLMs) can jointly solve data analysis-based research problems. We consider the task of anomaly detection via the publicly available and highly-studied LHC Olympics dataset. The best agent-created solutions mirror the performance of human state-of-the-art results.
- Score: 17.016402322416994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The substantial data volumes encountered in modern particle physics and other domains of fundamental physics research allow (and require) the use of increasingly complex data analysis tools and workflows. While the use of machine learning (ML) tools for data analysis has recently proliferated, these tools are typically special-purpose algorithms that rely, for example, on encoded physics knowledge to reach optimal performance. In this work, we investigate a new and orthogonal direction: Using recent progress in large language models (LLMs) to create a team of agents -- instances of LLMs with specific subtasks -- that jointly solve data analysis-based research problems in a way similar to how a human researcher might: by creating code to operate standard tools and libraries (including ML systems) and by building on results of previous iterations. If successful, such agent-based systems could be deployed to automate routine analysis components to counteract the increasing complexity of modern tool chains. To investigate the capabilities of current-generation commercial LLMs, we consider the task of anomaly detection via the publicly available and highly-studied LHC Olympics dataset. Several current models by OpenAI (GPT-4o, o4-mini, GPT-4.1, and GPT-5) are investigated and their stability tested. Overall, we observe the capacity of the agent-based system to solve this data analysis problem. The best agent-created solutions mirror the performance of human state-of-the-art results.
Related papers
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
- Toward Automated and Trustworthy Scientific Analysis and Visualization with LLM-Generated Code [6.068120728706316]
Large language models (LLMs) offer a promising solution by generating code from natural language descriptions. We construct a benchmark suite of domain-inspired prompts that reflect real-world research tasks. Our findings show that, without human intervention, the reliability of LLM-generated code is limited.
arXiv Detail & Related papers (2025-11-26T21:27:03Z)
- LLM/Agent-as-Data-Analyst: A Survey [54.08761322298559]
Large language models (LLMs) and agent techniques have brought a fundamental shift in the functionality and development paradigm of data analysis tasks. LLMs enable complex data understanding, natural language, semantic analysis functions, and autonomous pipeline orchestration.
arXiv Detail & Related papers (2025-09-28T17:31:38Z)
- Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance [1.2749527861829046]
Our framework integrates Knowledge Graphs (KGs) and Large Language Model (LLM)-based agents. It transforms raw DES data into a semantically rich KG, capturing relationships between simulation events and entities. An LLM-based agent uses iterative reasoning, generating interdependent sub-questions. For each sub-question, it creates Cypher queries for KG interaction, extracts information, and self-reflects to correct errors.
arXiv Detail & Related papers (2025-07-23T07:18:55Z)
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents [51.9387884953294]
We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing large language models on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. We evaluate a number of frontier large language models (LLMs) on our benchmarks such as Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro.
arXiv Detail & Related papers (2025-02-20T12:28:23Z)
- Are Large Language Models Useful for Time Series Data Analysis? [3.44393516559102]
Time series data plays a critical role across diverse domains such as healthcare, energy, and finance. This study investigates whether large language models (LLMs) are effective for time series data analysis.
arXiv Detail & Related papers (2024-12-16T02:47:44Z)
- Evaluating Language Models as Synthetic Data Generators [99.16334775127875]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
- LAMBDA: A Large Model Based Data Agent [7.240586338370509]
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system. LAMBDA is designed to address data analysis challenges in data-driven applications. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence.
arXiv Detail & Related papers (2024-07-24T06:26:36Z)
- Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models [1.204452887718077]
We show how data management tools can significantly improve the quality of data that is used for machine learning (ML) applications.
We propose an architecture and implementation of such tools and demonstrate through two use cases how they can be used to improve ML-based eScience investigations.
arXiv Detail & Related papers (2024-06-27T04:42:29Z)
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z)
- Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis [6.592797748561459]
Large Language Models (LLMs) have enabled collaborative human-bot interactions in Software Engineering (SE).
We introduce a new dimension of scalability and accuracy in qualitative research, potentially transforming data interpretation methodologies in SE.
arXiv Detail & Related papers (2024-02-02T13:10:46Z)
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [96.71370747681078]
We introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM.
For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs.
We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral and find that a Claude v3 Opus agent is the best in terms of success rate.
arXiv Detail & Related papers (2023-10-05T04:06:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.