MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models
- URL: http://arxiv.org/abs/2601.03436v1
- Date: Tue, 06 Jan 2026 21:47:22 GMT
- Title: MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models
- Authors: Nikhil Mukund, Yifang Luo, Fan Zhang, Lisa Barsotti, Erik Katsavounidis,
- Abstract summary: We present MARVEL, a framework for domain-aware question answering and assisted scientific research. MARVEL combines a fast path for straightforward queries with a more deliberate DeepSearch mode that integrates retrieval-augmented generation and Monte Carlo Tree Search. We applied this framework in the context of gravitational-wave research related to the Laser Interferometer Gravitational-wave Observatory.
- Score: 2.0725712989738994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MARVEL (https://ligogpt.mit.edu/marvel), a locally deployable, open-source framework for domain-aware question answering and assisted scientific research. It is designed to address the increasing demands of a digital assistant for scientific groups that can read highly technical data, cite precisely, and operate within authenticated networks. MARVEL combines a fast path for straightforward queries with a more deliberate DeepSearch mode that integrates retrieval-augmented generation and Monte Carlo Tree Search. It explores complementary subqueries, allocates more compute to promising branches, and maintains a global evidence ledger that preserves sources during drafting. We applied this framework in the context of gravitational-wave research related to the Laser Interferometer Gravitational-wave Observatory. Answers are grounded in a curated semantic index of research literature, doctoral theses, LIGO documents, and long-running detector electronic logbooks, with targeted web searches when appropriate. Because direct benchmarking against commercial LLMs cannot be performed on private data, we evaluated MARVEL on two publicly available surrogate datasets that capture comparable semantic and technical characteristics. On these benchmarks, MARVEL matches a GPT-4o mini baseline on literature-centric queries and substantially outperforms it on detector-operations content, where domain retrieval and guided reasoning are decisive. By making the complete framework and evaluation datasets openly available, we aim to provide a reproducible foundation for developing domain-specific scientific assistants.
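The abstract describes DeepSearch as exploring complementary subqueries, giving more compute to promising branches, and keeping a global evidence ledger of sources. The sketch below illustrates that idea only; it is not MARVEL's implementation. It assumes a flat, one-level simplification that uses the UCB1 rule (the selection step commonly used in Monte Carlo Tree Search) rather than a full tree search. The names `Branch`, `EvidenceLedger`, `deep_search`, `retrieve`, and `score`, and the exploration constant 1.4, are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One candidate subquery (hypothetical stand-in for a search-tree node)."""
    subquery: str
    visits: int = 0
    total_reward: float = 0.0

    def ucb1(self, parent_visits: int, c: float = 1.4) -> float:
        # Unvisited branches get infinite priority so each is tried once.
        if self.visits == 0:
            return math.inf
        mean = self.total_reward / self.visits
        # Mean reward plus an exploration bonus that shrinks with visits.
        return mean + c * math.sqrt(math.log(parent_visits) / self.visits)


@dataclass
class EvidenceLedger:
    """Global map from source id to the distinct snippets drawn from it."""
    entries: dict = field(default_factory=dict)

    def record(self, source_id: str, snippet: str) -> None:
        self.entries.setdefault(source_id, set()).add(snippet)


def deep_search(subqueries, retrieve, score, budget):
    """Spend `budget` retrieval calls across subqueries using UCB1.

    `retrieve(subquery)` must return a list of (source_id, snippet) pairs;
    `score(hits)` must return a float reward. Both are caller-supplied
    assumptions, not part of any real library.
    """
    branches = [Branch(q) for q in subqueries]
    ledger = EvidenceLedger()
    for step in range(1, budget + 1):
        # Pick the branch with the highest upper confidence bound.
        branch = max(branches, key=lambda b: b.ucb1(step))
        hits = retrieve(branch.subquery)
        for source_id, snippet in hits:
            ledger.record(source_id, snippet)  # sources survive drafting
        branch.visits += 1
        branch.total_reward += score(hits)
    return branches, ledger


# Toy corpus: one subquery finds evidence, the other finds nothing.
TOY_CORPUS = {
    "squeezing": [("doc-A", "squeezer alignment note")],
    "seismic": [],
}


def toy_retrieve(subquery):
    return TOY_CORPUS[subquery]


def toy_score(hits):
    return 1.0 if hits else 0.0
```

Calling `deep_search(["squeezing", "seismic"], toy_retrieve, toy_score, budget=10)` spends most of the ten calls on the subquery that keeps returning evidence, while the ledger retains the source id for every snippet collected. A real system would replace the toy retriever and scorer with an index lookup and a relevance model.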
Related papers
- HeurekaBench: A Benchmarking Framework for AI Co-scientist [2.206319727896241]
HeurekaBench is a framework to create benchmarks with exploratory, open-ended research questions for experimental datasets. We instantiate the framework in single-cell biology to obtain sc-HeurekaBench benchmark and use it to compare state-of-the-art single-cell agents. We find that the addition of a critic module can improve ill-formed responses for open-source LLM-based agents by up to 22% and close the gap with their closed-source counterparts.
arXiv Detail & Related papers (2026-01-04T22:16:42Z) - Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval [0.0]
RA-FSM is a GPT-based research assistant that wraps generation in a finite-state control loop: Relevance -> Confidence -> Knowledge. The controller filters out-of-scope queries, scores answerability, decomposes questions, and triggers retrieval only when needed. We implement the system for photonics and evaluate it on six task categories: analytical reasoning, numerical analysis, methodological critique, comparative synthesis, factual extraction, and application design.
arXiv Detail & Related papers (2025-09-25T21:35:46Z) - From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents [30.768405850755602]
DeepResearch Bench is a benchmark consisting of 100 PhD-level research tasks. Evaluating Deep Research Agents is inherently complex and labor-intensive. We propose two novel methodologies that achieve strong alignment with human judgment.
arXiv Detail & Related papers (2025-06-13T13:17:32Z) - Harnessing Large Language Models for Scientific Novelty Detection [49.10608128661251]
We propose to harness large language models (LLMs) for scientific novelty detection (ND). To capture idea conception, we propose to train a lightweight retriever by distilling the idea-level knowledge from LLMs. Experiments show our method consistently outperforms others on the proposed benchmark datasets for idea retrieval and ND tasks.
arXiv Detail & Related papers (2025-05-30T14:08:13Z) - WebThinker: Empowering Large Reasoning Models with Deep Research Capability [109.8504165631888]
WebThinker is a deep research agent that empowers LRMs to autonomously search the web, navigate among web pages, and draft reports during the reasoning process. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [30.603079363363634]
This study introduces ResearchArena, a benchmark designed to evaluate large language models' capabilities in conducting academic surveys. ResearchArena models the process in three stages: (1) information discovery, identifying relevant literature; (2) information selection, evaluating papers' relevance and impact; and (3) information organization. To support these evaluations, we construct an offline environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z) - SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z) - A Reliable Knowledge Processing Framework for Combustion Science using Foundation Models [0.0]
The study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature.
The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy.
The framework consistently delivers accurate domain-specific responses with minimal human oversight.
arXiv Detail & Related papers (2023-12-31T17:15:25Z) - GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z) - Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.