Related papers: LM Agents for Coordinating Multi-User Information Gathering

URL: http://arxiv.org/abs/2502.12328v1
Date: Mon, 17 Feb 2025 21:19:45 GMT
Title: LM Agents for Coordinating Multi-User Information Gathering
Authors: Harsh Jhamtani, Jacob Andreas, Benjamin Van Durme
Abstract summary: PeopleJoin is a benchmark for evaluating LM-mediated collaborative problem solving. PeopleJoin comprises two evaluation domains: PeopleJoin-QA and PeopleJoin-DocCreation.
Score: 82.3543678605684
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces PeopleJoin, a benchmark for evaluating LM-mediated collaborative problem solving. Given a user request, PeopleJoin agents must identify teammates who might be able to assist, converse with these teammates to gather information, and finally compile a useful answer or summary for the original user. PeopleJoin comprises two evaluation domains: PeopleJoin-QA, focused on questions about tabular data, and PeopleJoin-DocCreation, focused on document creation tasks. The two domains are adapted from existing NLP benchmarks for database question answering and multi-document summarization; here, however, the information needed to complete these tasks is distributed across synthetic "organizations" of 2-20 users, simulating natural multi-user collaboration scenarios. We implemented several popular LM agent architectures, evaluating their accuracy and efficiency at completing tasks, and highlight new research questions that can be studied using PeopleJoin.

Related papers

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories. We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z)
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
Synthetic Clarification and Correction Dialogues about Data-Centric Tasks -- A Teacher-Student Approach [0.052617184697694476]
We develop a novel framework for synthetically generating controlled, multi-turn conversations between a user and AI assistant. Each conversation aims to solve a table-based reasoning question through collaborative effort. We employ a strong teacher LLM to verify the correctness of our synthetic conversations.
arXiv Detail & Related papers (2025-03-18T11:37:25Z)
Exploring Rewriting Approaches for Different Conversational Tasks [63.56404271441824]
The exact rewriting approach may often depend on the use case and application-specific tasks supported by the conversational assistant. We systematically investigate two different approaches, denoted as rewriting and fusion, on two fundamentally different generation tasks. Our results indicate that the specific rewriting or fusion approach highly depends on the underlying use case and generative task.
arXiv Detail & Related papers (2025-02-26T06:05:29Z)
SubData: A Python Library to Collect and Combine Datasets for Evaluating LLM Alignment on Downstream Tasks [4.04666623219944]
SubData is a Python library that offers researchers working on topics related to subjectivity in annotation tasks a convenient way of collecting, combining and using a range of suitable datasets.
arXiv Detail & Related papers (2024-12-21T21:40:31Z)
Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$). GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations [8.848859080368799]
Collaborative STORM lets users observe and steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals.
arXiv Detail & Related papers (2024-08-27T17:50:03Z)
HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent [6.764665650605542]
We introduce HR-MultiWOZ, a fully-labeled dataset of 550 conversations spanning 10 HR domains. It is the first labeled open-sourced conversation dataset in the HR domain for NLP research. It provides a detailed recipe for the data generation procedure along with data analysis and human evaluations.
arXiv Detail & Related papers (2024-02-01T21:10:44Z)
An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback [9.461978375200102]
The proposed interface is a novel search interface that supports automatic and interactive query generation over a monolingual or multilingual document collection. The interface enables users to refine the queries generated by different LLMs and to provide feedback on the retrieved documents or passages, and it can incorporate the users' feedback as prompts to generate more effective queries.
arXiv Detail & Related papers (2023-11-19T04:42:24Z)
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT. We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z)
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains. Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration. We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z)
INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions [47.90088587508672]
InSCIt is a dataset for Information-Seeking Conversations with mixed-initiative Interactions. It contains 4.7K user-agent turns from 805 human-human conversations. We report results of two systems based on state-of-the-art models of conversational knowledge identification and open-domain question answering.
arXiv Detail & Related papers (2022-07-02T06:18:12Z)
Conversations with Search Engines: SERP-based Conversational Response Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines. We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE), using this dataset. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.