Supporting Sensemaking of Large Language Model Outputs at Scale
- URL: http://arxiv.org/abs/2401.13726v1
- Date: Wed, 24 Jan 2024 18:45:34 GMT
- Title: Supporting Sensemaking of Large Language Model Outputs at Scale
- Authors: Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld,
Elena L. Glassman
- Abstract summary: Large language models (LLMs) are capable of generating multiple responses to a single prompt.
We design five features, which include both pre-existing and novel methods for computing similarities and differences across textual documents.
We find that the features support a wide variety of sensemaking tasks and even make tasks previously considered to be too difficult by our participants now tractable.
- Score: 21.763460834412776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are capable of generating multiple responses to
a single prompt, yet little effort has been expended to help end-users or
system designers make use of this capability. In this paper, we explore how to
present many LLM responses at once. We design five features, which include both
pre-existing and novel methods for computing similarities and differences
across textual documents, as well as how to render their outputs. We report on
a controlled user study (n=24) and eight case studies evaluating these features
and how they support users in different tasks. We find that the features
support a wide variety of sensemaking tasks and even make tasks previously
considered to be too difficult by our participants now tractable. Finally, we
present design guidelines to inform future explorations of new LLM interfaces.
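To make the comparison idea concrete, the sketch below shows one simple way to compute pairwise similarities across many responses to a single prompt. TF-IDF cosine similarity is an illustrative stand-in chosen for this example; the abstract does not detail the paper's five features, so none of this should be read as the authors' actual method.

```python
# A minimal sketch of one way to compare many LLM responses at once.
# TF-IDF cosine similarity is an illustrative stand-in, not the
# paper's actual feature set.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

responses = [
    "The capital of Australia is Canberra.",
    "Canberra is Australia's capital city.",
    "Australia's capital is Sydney.",  # an outlier worth surfacing
]

# Embed all responses in a shared TF-IDF space.
matrix = TfidfVectorizer().fit_transform(responses)
sims = cosine_similarity(matrix)

# Rank pairs from most to least similar to spot clusters and outliers.
pairs = sorted(
    combinations(range(len(responses)), 2),
    key=lambda p: sims[p[0], p[1]],
    reverse=True,
)
for i, j in pairs:
    print(f"responses {i} and {j}: similarity {sims[i, j]:.2f}")
```

Ranking pairs this way surfaces near-duplicate responses and outliers, which is one kind of sensemaking over many outputs that the abstract describes.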
Related papers
- Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models [22.676688441884465]
Fine-tuning pre-trained large language models (LLMs) on a diverse array of tasks has become a common approach for building models.
This study investigates the task-specific information encoded in pre-trained LLMs and the effects of instruction tuning on their representations.
arXiv Detail & Related papers (2024-10-25T23:38:28Z)
- A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks [0.0]
Large language models (LLMs) have shown remarkable performance on many different Natural Language Processing (NLP) tasks.
Prompt engineering plays a key role in adding more to the already existing abilities of LLMs to achieve significant performance gains.
This paper summarizes different prompting techniques and groups them by the NLP tasks for which they have been used.
arXiv Detail & Related papers (2024-07-17T20:23:19Z)
- Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems.
Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z)
- Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z)
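As a concrete illustration of the metric reporting mentioned above, here is a hedged sketch of scoring a candidate overlap summary against two reference narratives with ROUGE, using Google's rouge-score package; the texts are invented placeholders, not the paper's datasets.

```python
# A hedged sketch of ROUGE scoring for an overlap summary against two
# reference narratives. The texts below are invented placeholders.
from rouge_score import rouge_scorer

references = [
    "Both witnesses agree the car ran a red light before the crash.",
    "The car ignored a red signal and collided with the cyclist.",
]
candidate = "The car ran a red light and hit the cyclist."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for ref in references:
    scores = scorer.score(ref, candidate)
    # Report F-measure per ROUGE variant for each reference narrative.
    print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```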
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z)
- Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and identify emerging trends in applying LLMs to IE tasks.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
- Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark [11.572835837392867]
This study introduces MultiAPI, a pioneering comprehensive large-scale API benchmark dataset.
It consists of 235 diverse API calls and 2,038 contextual prompts, offering a unique platform for evaluating how tool-augmented LLMs handle multimodal tasks.
Our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation.
arXiv Detail & Related papers (2023-11-21T23:26:05Z)
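A toy harness for the three failure axes named in the MultiAPI findings (domain identification, function selection, argument generation) might look like the following; the gold and predicted records are invented placeholders, not MultiAPI data.

```python
# A toy sketch of scoring a tool-augmented LLM output along the three
# axes the findings above mention. Records are invented placeholders.
gold = {"domain": "weather", "function": "get_forecast",
        "args": {"city": "Paris", "days": 3}}
pred = {"domain": "weather", "function": "get_forecast",
        "args": {"city": "Paris", "days": 5}}

scores = {
    "domain": gold["domain"] == pred["domain"],      # domain identification
    "function": gold["function"] == pred["function"],  # function selection
    "args": gold["args"] == pred["args"],            # argument generation
}
print(scores)  # {'domain': True, 'function': True, 'args': False}
```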
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- Frugal Prompting for Dialog Models [17.048111072193933]
This study examines different approaches for building dialog systems using large language models (LLMs).
As part of prompt tuning, we experiment with various ways of providing instructions, exemplars, current query and additional context.
The research also analyzes the representations of dialog history that have the optimal usable-information density.
arXiv Detail & Related papers (2023-05-24T09:06:49Z)
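The entry above concerns fitting instructions, exemplars, dialog history, and the current query into a limited prompt; the sketch below shows one naive way to do that under a crude character budget. The format and budgeting rule are assumptions for illustration, not the paper's method.

```python
# A minimal sketch of "frugal" prompt assembly: keep the instruction,
# a few exemplars, and only as much recent dialog history as a crude
# character budget allows. Budget and format are illustrative assumptions.

def build_prompt(instruction, exemplars, history, query, budget=1000):
    """Compose a compact prompt, dropping the oldest history first."""
    fixed = instruction + "\n\n" + "\n".join(exemplars) + "\n\n"
    tail = "\nUser: " + query + "\nAssistant:"
    remaining = budget - len(fixed) - len(tail)

    # Keep the most recent turns that fit in the remaining budget.
    kept = []
    for turn in reversed(history):
        if len(turn) + 1 > remaining:
            break
        kept.append(turn)
        remaining -= len(turn) + 1
    return fixed + "\n".join(reversed(kept)) + tail

prompt = build_prompt(
    instruction="Answer briefly and politely.",
    exemplars=["User: Hi\nAssistant: Hello! How can I help?"],
    history=["User: What's the weather?", "Assistant: Sunny, 22C."],
    query="Should I bring an umbrella?",
)
print(prompt)
```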
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
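To illustrate the triplet-based memory the entry describes, here is a toy read-write store; the dict-based design is an assumption for illustration, not RET-LLM's implementation.

```python
# A toy triplet-style read-write memory in the spirit of the entry
# above: knowledge is saved as (subject, relation, object) triplets
# and looked up later. This dict-based store is an illustrative
# assumption, not the paper's implementation.
from collections import defaultdict

class TripletMemory:
    def __init__(self):
        self._store = defaultdict(list)

    def write(self, subject, relation, obj):
        """Save one (subject, relation, object) fact."""
        self._store[(subject, relation)].append(obj)

    def read(self, subject, relation):
        """Return all objects recorded for a subject and relation."""
        return self._store.get((subject, relation), [])

memory = TripletMemory()
memory.write("Alice", "works_at", "Acme Corp")
memory.write("Alice", "works_at", "Globex")  # facts accumulate over time
print(memory.read("Alice", "works_at"))  # ['Acme Corp', 'Globex']
```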
- Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance [60.40541387785977]
Small foundational models can display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data.
In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following.
Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks.
arXiv Detail & Related papers (2023-05-22T16:56:44Z)
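As a concrete example of the kind of instruction-driven data such fine-tuning consumes, the sketch below renders writing-focused records with an Alpaca-style template; the template and field names are common conventions, not necessarily the exact format this paper used.

```python
# A minimal sketch of formatting writing-focused instruction data for
# fine-tuning, using an Alpaca-style template. The template and fields
# are common conventions, assumed here for illustration.
examples = [
    {
        "instruction": "Paraphrase the sentence in a formal tone.",
        "input": "We can't make the meeting, sorry!",
        "output": "Unfortunately, we are unable to attend the meeting.",
    },
]

def format_example(ex):
    """Render one instruction/input/output record as a training string."""
    return (
        "### Instruction:\n" + ex["instruction"] + "\n\n"
        "### Input:\n" + ex["input"] + "\n\n"
        "### Response:\n" + ex["output"]
    )

for ex in examples:
    print(format_example(ex))
```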
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.