LM Agents for Coordinating Multi-User Information Gathering
- URL: http://arxiv.org/abs/2502.12328v1
- Date: Mon, 17 Feb 2025 21:19:45 GMT
- Title: LM Agents for Coordinating Multi-User Information Gathering
- Authors: Harsh Jhamtani, Jacob Andreas, Benjamin Van Durme,
- Abstract summary: PeopleJoin is a benchmark for evaluating LM-mediated collaborative problem solving.
PeopleJoin comprises two evaluation domains: PeopleJoin-QA and PeopleJoin-DocCreation.
- Score: 82.3543678605684
- License:
- Abstract: This paper introduces PeopleJoin, a benchmark for evaluating LM-mediated collaborative problem solving. Given a user request, PeopleJoin agents must identify teammates who might be able to assist, converse with these teammates to gather information, and finally compile a useful answer or summary for the original user. PeopleJoin comprises two evaluation domains: PeopleJoin-QA, focused on questions about tabular data, and PeopleJoin-DocCreation, focused on document creation tasks. The two domains are adapted from existing NLP benchmarks for database question answering and multi-document summarization; here, however, the information needed to complete these tasks is distributed across synthetic ``organizations'' of 2--20 users, simulating natural multi-user collaboration scenarios. We implemented several popular LM agent architectures, evaluating their accuracy and efficiency at completing tasks, and highlight new research questions that can be studied using PeopleJoin.
Related papers
- SubData: A Python Library to Collect and Combine Datasets for Evaluating LLM Alignment on Downstream Tasks [4.04666623219944]
SubData is a Python library that offers researchers working on topics related to subjectivity in annotation tasks a convenient way of collecting, combining and using a range of suitable datasets.
arXiv Detail & Related papers (2024-12-21T21:40:31Z) - Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$)
GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z) - Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations [8.848859080368799]
Collaborative STORM lets users observe and steer the discourse among several LM agents.
The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously.
For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals.
arXiv Detail & Related papers (2024-08-27T17:50:03Z) - HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent [6.764665650605542]
We introduce HR-Multiwoz, a fully-labeled dataset of 550 conversations spanning 10 HR domains.
It is the first labeled open-sourced conversation dataset in the HR domain for NLP research.
It provides a detailed recipe for the data generation procedure along with data analysis and human evaluations.
arXiv Detail & Related papers (2024-02-01T21:10:44Z) - An Interactive Query Generation Assistant using LLM-based Prompt
Modification and User Feedback [9.461978375200102]
The proposed interface is a novel search interface which supports automatic and interactive query generation over a mono-linguial or multi-lingual document collection.
The interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries.
arXiv Detail & Related papers (2023-11-19T04:42:24Z) - On Evaluating the Integration of Reasoning and Action in LLM Agents with
Database Question Answering [25.57202500348071]
This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models interact with a database.
The task requires LLMs to strategically generate multiplesql queries to retrieve sufficient data from a database, to reason with the acquired context, and to synthesize them into a comprehensive analytical narrative.
We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction.
arXiv Detail & Related papers (2023-11-16T09:55:07Z) - The Shifted and The Overlooked: A Task-oriented Investigation of
User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT.
We find that tasks such as design'' and planning'' are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z) - A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains.
Specifically, we build a framework named Dynamic LLM-Powered Agent Network ($textDyLAN$) for LLM-powered agent collaboration.
We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z) - INSCIT: Information-Seeking Conversations with Mixed-Initiative
Interactions [47.90088587508672]
InSCIt is a dataset for Information-Seeking Conversations with mixed-initiative Interactions.
It contains 4.7K user-agent turns from 805 human-human conversations.
We report results of two systems based on state-of-the-art models of conversational knowledge identification and open-domain question answering.
arXiv Detail & Related papers (2022-07-02T06:18:12Z) - Conversations with Search Engines: SERP-based Conversational Response
Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE) using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and aprior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.