ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents
- URL: http://arxiv.org/abs/2406.10291v1
- Date: Thu, 13 Jun 2024 03:26:30 GMT
- Title: ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents
- Authors: Hao Kang, Chenyan Xiong
- Abstract summary: Large language models (LLMs) have exhibited remarkable performance across various tasks in natural language processing.
We develop ResearchArena, a benchmark that measures LLM agents' ability to conduct academic surveys.
- Score: 21.17856299966841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have exhibited remarkable performance across various tasks in natural language processing. Nevertheless, challenges still arise when these tasks demand domain-specific expertise and advanced analytical skills, such as conducting research surveys on a designated topic. In this research, we develop ResearchArena, a benchmark that measures LLM agents' ability to conduct academic surveys, an initial step of the academic research process. Specifically, we deconstruct the surveying process into three stages: 1) information discovery: locating relevant papers; 2) information selection: assessing papers' importance to the topic; and 3) information organization: organizing papers into meaningful structures. In particular, we establish an offline environment comprising 12.0M full-text academic papers and 7.9K survey papers, which evaluates agents' ability to locate supporting materials for composing a survey on a topic, rank the located papers by their impact, and organize them into a hierarchical knowledge mind-map. With this benchmark, we conduct preliminary evaluations of existing techniques and find that all LLM-based methods under-perform basic keyword-based retrieval techniques, highlighting substantial opportunities for future research.
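As a concrete illustration, the following is a minimal Python sketch of the three-stage pipeline the abstract describes. The corpus format, the keyword matcher, the citation-count impact signal, and the mind-map keying are all illustrative assumptions; ResearchArena's actual data schema and metrics may differ.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    paper_id: str
    title: str
    abstract: str
    citation_count: int = 0

def discover(corpus: list[Paper], topic: str) -> list[Paper]:
    """Stage 1 (information discovery): naive keyword match on the topic."""
    terms = topic.lower().split()
    return [p for p in corpus
            if any(t in (p.title + " " + p.abstract).lower() for t in terms)]

def select(papers: list[Paper]) -> list[Paper]:
    """Stage 2 (information selection): rank candidates by impact; citation
    count is an illustrative stand-in for the benchmark's impact signal."""
    return sorted(papers, key=lambda p: p.citation_count, reverse=True)

def organize(papers: list[Paper]) -> dict[str, list[str]]:
    """Stage 3 (information organization): build a shallow mind-map, keyed
    here by the first title word as a stand-in for real topic labels."""
    mind_map: dict[str, list[str]] = {}
    for p in papers:
        mind_map.setdefault(p.title.split()[0].lower(), []).append(p.title)
    return mind_map

corpus = [Paper("1", "Efficient LLMs: A Survey", "Methods for efficient inference.", 120),
          Paper("2", "Graph Neural Networks", "An overview of GNNs.", 300)]
print(organize(select(discover(corpus, "efficient LLMs"))))
```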
Related papers
- AAAR-1.0: Assessing AI's Potential to Assist Research [34.88341605349765]
We introduce AAAR-1.0, a benchmark dataset designed to evaluate the performance of large language models (LLMs) on three fundamental, expertise-intensive research tasks.
AAAR-1.0 differs from prior benchmarks in two key ways: first, it is explicitly research-oriented, with tasks requiring deep domain expertise; second, it is researcher-oriented, mirroring the primary activities that researchers engage in on a daily basis.
arXiv Detail & Related papers (2024-10-29T17:58:29Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
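To make the two routes concrete, here is a minimal sketch of generation-based versus encoding-based classification for EIC. The `generate` and `encode` callables, the classifier head, and the label set are hypothetical stand-ins; the paper's actual framework is not reproduced here.

```python
EIC_LABELS = ["grammar", "clarity", "fact/evidence", "claim", "other"]  # assumed label set

def classify_by_generation(generate, old: str, new: str) -> str:
    """Generation-based route: prompt the fine-tuned LLM to emit the label as text."""
    prompt = (f"Old sentence: {old}\nNew sentence: {new}\n"
              f"Edit intent, one of {EIC_LABELS}:")
    return generate(prompt).strip()

def classify_by_encoding(encode, classifier_head, old: str, new: str) -> str:
    """Encoding-based route: take the LLM's pooled representation of the
    edit pair, then apply a small trained classification head."""
    features = encode(f"{old} [SEP] {new}")       # hypothetical encoder call
    scores = classifier_head(features)            # one score per label
    return EIC_LABELS[max(range(len(scores)), key=scores.__getitem__)]
```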
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- What is the Role of Large Language Models in the Evolution of Astronomy Research? [0.0]
ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields.
These models, commonly trained on vast datasets, exhibit human-like text generation capabilities.
arXiv Detail & Related papers (2024-09-30T12:42:25Z)
- Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature [48.572336666741194]
We present Knowledge Navigator, a system designed to enhance exploratory search abilities.
It organizes retrieved documents into a navigable, two-level hierarchy of named and descriptive scientific topics and subtopics.
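A minimal sketch of such a two-level topic hierarchy as a plain data structure, with hypothetical topic names and document IDs; Knowledge Navigator's actual representation is not specified here.

```python
from dataclasses import dataclass

@dataclass
class Subtopic:
    name: str
    description: str
    doc_ids: list[str]          # retrieved documents grouped under this subtopic

@dataclass
class Topic:
    name: str
    description: str
    subtopics: list[Subtopic]

# Hypothetical contents; real topic names are generated by the system.
hierarchy = [
    Topic("Retrieval-Augmented Generation",
          "Grounding LLM output in retrieved documents.",
          [Subtopic("Dense retrieval", "Embedding-based candidate search.", ["d1", "d7"]),
           Subtopic("Reranking", "Ordering retrieved candidates.", ["d3"])]),
]

for topic in hierarchy:          # two-level navigation, as in exploratory browsing
    print(topic.name)
    for sub in topic.subtopics:
        print(f"  {sub.name}: {len(sub.doc_ids)} documents")
```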
arXiv Detail & Related papers (2024-08-28T14:48:37Z)
- Reconciling Methodological Paradigms: Employing Large Language Models as Novice Qualitative Research Assistants in Talent Management Research [1.0949553365997655]
This study proposes a novel approach by leveraging Retrieval Augmented Generation (RAG) based Large Language Models (LLMs) for analyzing interview transcripts.
The novelty of this work lies in framing the research inquiry as one augmented by an LLM that serves as a novice research assistant.
Our findings demonstrate that the LLM-augmented RAG approach can successfully extract topics of interest, with significant coverage compared to manually generated topics.
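The following is a minimal sketch of the retrieval-augmented topic-extraction loop summarized above. The bag-of-words retriever and the `llm` callable are illustrative stand-ins for the study's actual RAG stack, which is not detailed here.

```python
from collections import Counter

def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Score transcript chunks by word overlap with the query (toy retriever)."""
    q = Counter(query.lower().split())
    overlap = lambda c: sum((Counter(c.lower().split()) & q).values())
    return sorted(chunks, key=overlap, reverse=True)[:k]

def extract_topics(llm, transcript_chunks: list[str], query: str) -> str:
    """Augment the prompt with retrieved excerpts, then ask the LLM for themes."""
    context = "\n---\n".join(retrieve(transcript_chunks, query))
    prompt = (f"Interview excerpts:\n{context}\n\n"
              f"List the main themes relevant to: {query}")
    return llm(prompt)  # `llm` is any text-in/text-out callable
```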
arXiv Detail & Related papers (2024-08-20T17:49:51Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- SurveyAgent: A Conversational System for Personalized and Efficient Research Survey [50.04283471107001]
This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers.
SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level.
Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to improve how researchers interact with scientific literature.
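A minimal sketch of the three-module layout described above; the class names follow the abstract, but the interfaces are assumptions rather than SurveyAgent's published API.

```python
class KnowledgeManagement:
    """Organizes papers into a personal library (assumed interface)."""
    def __init__(self) -> None:
        self.library: dict[str, dict] = {}
    def add(self, paper_id: str, metadata: dict) -> None:
        self.library[paper_id] = metadata

class Recommendation:
    """Discovers relevant literature; here, a trivial title match."""
    def suggest(self, library: dict, topic: str) -> list[str]:
        return [pid for pid, m in library.items()
                if topic.lower() in m.get("title", "").lower()]

class QueryAnswering:
    """Engages with a paper's content; a real system grounds this in full text."""
    def answer(self, paper: dict, question: str) -> str:
        return f"[answer to {question!r} based on {paper.get('title')}]"

class SurveyAgent:
    """Conversational front end wiring the three modules together."""
    def __init__(self) -> None:
        self.knowledge = KnowledgeManagement()
        self.recommend = Recommendation()
        self.qa = QueryAnswering()
```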
arXiv Detail & Related papers (2024-04-09T15:01:51Z)
- Efficient Large Language Models: A Survey [45.39970635367852]
This survey provides a systematic and comprehensive review of efficient Large Language Models research.
We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected topics in efficient LLMs.
We have also created a GitHub repository where we organize the papers featured in this survey.
arXiv Detail & Related papers (2023-12-06T19:18:42Z)
- If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History [1.3325600043256554]
We show that by augmenting Large Language Models with vector embeddings from highly specialized academic sources, a conversational methodology can be made accessible to historians and other researchers in the Humanities.
Compared to established search interfaces for digital catalogues, such as metadata and full-text search, we evaluate the richer conversational style of LLMs on two main types of tasks.
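For illustration, here is a minimal sketch of embedding-based retrieval over a specialized archive, the mechanism this line of work layers under the conversational interface. The toy hashed "embedding" is purely a stand-in for a learned encoder.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashing 'embedding' (stable within a process); real systems use learned encoders."""
    v = [0.0] * dim
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def search(archive: list[str], query: str, k: int = 3) -> list[str]:
    """Rank archive documents by cosine similarity to the query."""
    qv = embed(query)
    cosine = lambda doc: sum(a * b for a, b in zip(embed(doc), qv))
    return sorted(archive, key=cosine, reverse=True)[:k]

archive = ["Letter from the 1848 expedition ledger ...",
           "Parish record of grain prices, 1692 ..."]
print(search(archive, "grain prices in the seventeenth century"))
```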
arXiv Detail & Related papers (2023-10-16T20:12:06Z)
- Wizard of Search Engine: Access to Information Through Conversations with Search Engines [58.53420685514819]
We make efforts to facilitate research on conversational information seeking (CIS) from three aspects.
We formulate a pipeline for CIS with six sub-tasks: intent detection (ID), keyphrase extraction (KE), action prediction (AP), query selection (QS), passage selection (PS), and response generation (RG); a minimal sketch of this pipeline follows below.
We release a benchmark dataset, called wizard of search engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS.
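Below is a minimal sketch of how the six sub-tasks might compose into a single turn of conversational search. Each stage is a hypothetical stand-in prompt; WISE defines the tasks, not these implementations.

```python
def conversational_search(llm, retriever, history: list[str], utterance: str) -> str:
    """One turn of CIS, chaining the six sub-tasks (illustrative prompts only)."""
    intent = llm(f"History: {history}\nClassify the intent of: {utterance}")   # ID
    keyphrases = llm(f"Extract keyphrases from: {utterance}")                  # KE
    action = llm(f"Given intent '{intent}', answer 'search' or 'reply'")       # AP
    if action == "search":
        query = llm(f"Select the best search query from: {keyphrases}")       # QS
        passages = retriever(query)                                            # PS
        return llm(f"Answer '{utterance}' using passages: {passages}")        # RG
    return llm(f"Reply conversationally to: {utterance}")                      # RG
```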
arXiv Detail & Related papers (2021-05-18T06:35:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.