ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents
- URL: http://arxiv.org/abs/2406.10291v1
- Date: Thu, 13 Jun 2024 03:26:30 GMT
- Title: ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents
- Authors: Hao Kang, Chenyan Xiong
- Abstract summary: Large language models (LLMs) have exhibited remarkable performance across various tasks in natural language processing.
We develop ResearchArena, a benchmark that measures LLM agents' ability to conduct academic surveys.
- Score: 21.17856299966841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have exhibited remarkable performance across various tasks in natural language processing. Nevertheless, challenges still arise when these tasks demand domain-specific expertise and advanced analytical skills, such as conducting research surveys on a designated topic. In this research, we develop ResearchArena, a benchmark that measures LLM agents' ability to conduct academic surveys, an initial step of the academic research process. Specifically, we deconstruct the surveying process into three stages: 1) information discovery: locating relevant papers, 2) information selection: assessing papers' importance to the topic, and 3) information organization: organizing papers into meaningful structures. In particular, we establish an offline environment comprising 12.0M full-text academic papers and 7.9K survey papers, which evaluates agents' ability to locate supporting materials for composing the survey on a topic, rank the located papers based on their impact, and organize these into a hierarchical knowledge mind-map. With this benchmark, we conduct preliminary evaluations of existing techniques and find that all LLM-based methods underperform basic keyword-based retrieval techniques, highlighting substantial opportunities for future research.
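The three stages named in the abstract (discovery, selection, organization) can be illustrated with a toy keyword-based pipeline. This is only a minimal sketch under stated assumptions, not the benchmark's implementation: the corpus `PAPERS`, the function names, the use of citation counts as a stand-in for "impact", and the two-level grouping used as a "mind-map" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: (paper_id, title, citation_count).
# These entries are invented for illustration and are not benchmark data.
PAPERS = [
    ("p1", "dense retrieval with language models", 120),
    ("p2", "keyword retrieval baselines for search", 300),
    ("p3", "protein folding with deep learning", 900),
    ("p4", "language models as research agents", 45),
]


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


def discover(topic, papers):
    """Stage 1, information discovery: keep papers whose title shares
    at least one token with the topic."""
    query = set(tokenize(topic))
    return [p for p in papers if query & set(tokenize(p[1]))]


def select(papers):
    """Stage 2, information selection: rank by citation count, used here
    as an assumed proxy for a paper's impact."""
    return sorted(papers, key=lambda p: p[2], reverse=True)


def organize(topic, papers):
    """Stage 3, information organization: build a two-level hierarchy
    topic -> matched keyword -> paper ids. Each paper is filed under the
    first topic token that appears in its title."""
    groups = defaultdict(list)
    for paper_id, title, _ in papers:
        title_tokens = set(tokenize(title))
        for token in tokenize(topic):
            if token in title_tokens:
                groups[token].append(paper_id)
                break
    return {topic: dict(groups)}


def survey(topic, papers=PAPERS):
    """Run the three stages in order and return (ranked_ids, mind_map)."""
    found = discover(topic, papers)
    ranked = select(found)
    return [p[0] for p in ranked], organize(topic, ranked)
```

Calling `survey("language models retrieval")` on this toy corpus drops the unrelated protein-folding paper, orders the rest by citation count, and groups them under the topic keywords they match. A real agent would replace each stage with a stronger retriever, ranker, and clustering step.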
Related papers
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.
We employ automatic and manual curation to create the dataset.
SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- SurveyAgent: A Conversational System for Personalized and Efficient Research Survey [50.04283471107001]
This paper introduces SurveyAgent, a novel conversational system designed to provide personalized and efficient research survey assistance to researchers.
SurveyAgent integrates three key modules: Knowledge Management for organizing papers, Recommendation for discovering relevant literature, and Query Answering for engaging with content on a deeper level.
Our evaluation demonstrates SurveyAgent's effectiveness in streamlining research activities, showcasing its capability to facilitate how researchers interact with scientific literature.
arXiv Detail & Related papers (2024-04-09T15:01:51Z)
- Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead [12.324949480085424]
There is currently no existing survey that focuses on the utilization of Large Language Models for vulnerability detection and repair.
This review encompasses research work from leading SE, AI, and Security conferences and journals, covering 36 papers published at 21 distinct venues.
arXiv Detail & Related papers (2024-04-03T07:27:33Z)
- Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Information extraction aims to extract structural knowledge from plain natural language texts.
Generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
LLMs offer viable solutions for IE tasks based on a generative paradigm.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
- Efficient Large Language Models: A Survey [45.39970635367852]
This survey provides a systematic and comprehensive review of efficient Large Language Models research.
We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics.
We have also created a GitHub repository where we organize the papers featured in this survey.
arXiv Detail & Related papers (2023-12-06T19:18:42Z)
- If the Sources Could Talk: Evaluating Large Language Models for Research Assistance in History [1.3325600043256554]
We show that by augmenting Large-Language Models with vector embeddings from highly specialized academic sources, a conversational methodology can be made accessible to historians and other researchers in the Humanities.
Compared to established search interfaces for digital catalogues, such as metadata and full-text search, we evaluate the richer conversational style of LLMs on the performance of two main types of tasks.
arXiv Detail & Related papers (2023-10-16T20:12:06Z)
- Towards an Understanding of Large Language Models in Software Engineering Tasks [32.09925582943177]
Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in tasks such as text generation and reasoning.
This paper is the first to comprehensively investigate and collate the research and products combining LLMs with software engineering.
We have collected related literature as extensively as possible from seven mainstream databases, and selected 123 papers for analysis.
arXiv Detail & Related papers (2023-08-22T12:37:29Z)
- Wizard of Search Engine: Access to Information Through Conversations with Search Engines [58.53420685514819]
We make efforts to facilitate research on conversational information seeking (CIS) from three aspects.
We formulate a pipeline for CIS with six sub-tasks: intent detection (ID), keyphrase extraction (KE), action prediction (AP), query selection (QS), passage selection (PS), and response generation (RG).
We release a benchmark dataset, called wizard of search engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS.
arXiv Detail & Related papers (2021-05-18T06:35:36Z)
- Conversations with Documents. An Exploration of Document-Centered Assistance [55.60379539074692]
Document-centered assistance, for example, to help an individual quickly review a document, has seen less significant progress.
We present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario.
We present a set of initial machine learned models that show that (a) we can accurately detect document-centered questions, and (b) we can build reasonably accurate models for answering such questions.
arXiv Detail & Related papers (2020-01-27T17:10:11Z)