"Can You Tell Me?": Designing Copilots to Support Human Judgement in Online Information Seeking
- URL: http://arxiv.org/abs/2601.11284v1
- Date: Fri, 16 Jan 2026 13:33:54 GMT
- Title: "Can You Tell Me?": Designing Copilots to Support Human Judgement in Online Information Seeking
- Authors: Markus Bink, Marten Risius, Udo Kruschwitz, David Elsweiler
- Abstract summary: This paper introduces an LLM-based conversational copilot designed to scaffold information evaluation. Our mixed-methods analysis reveals that users engaged deeply with the copilot, demonstrating metacognitive reflection. The copilot did not significantly improve answer correctness or search engagement, largely due to a "time-on-chat vs. exploration" trade-off.
- Score: 2.901725877154321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI (GenAI) tools are transforming information seeking, but their fluent, authoritative responses risk overreliance and discourage independent verification and reasoning. Rather than replacing the cognitive work of users, GenAI systems should be designed to support and scaffold it. Therefore, this paper introduces an LLM-based conversational copilot designed to scaffold information evaluation rather than provide answers and foster digital literacy skills. In a pre-registered, randomised controlled trial (N=261) examining three interface conditions including a chat-based copilot, our mixed-methods analysis reveals that users engaged deeply with the copilot, demonstrating metacognitive reflection. However, the copilot did not significantly improve answer correctness or search engagement, largely due to a "time-on-chat vs. exploration" trade-off and users' bias toward positive information. Qualitative findings reveal tension between the copilot's Socratic approach and users' desire for efficiency. These results highlight both the promise and pitfalls of pedagogical copilots, and we outline design pathways to reconcile literacy goals with efficiency demands.
Related papers
- That's Deprecated! Understanding, Detecting, and Steering Knowledge Conflicts in Language Models for Code Generation [55.78914774437411]
We study how large language models (LLMs) behave when faced with discrepancies between their parametric knowledge and conflicting information contained in a prompt. We propose a domain-agnostic framework for constructing and interpreting such conflicts. We show that activation-level steering can achieve up to a 12.6% improvement in steering success over a random baseline.
arXiv Detail & Related papers (2025-10-21T22:27:56Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector based on a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - A Human Centric Requirements Engineering Framework for Assessing Github Copilot Output [0.0]
GitHub Copilot introduces new challenges in how these software tools address human needs. I analyzed GitHub Copilot's interaction with users through its chat interface. I established a human-centered requirements framework with clear metrics to evaluate these qualities.
arXiv Detail & Related papers (2025-08-05T21:33:23Z) - Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot-style and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z) - A Qualitative Study of User Perception of M365 AI Copilot [11.684396657620981]
We present results from a six-month trial of M365 Copilot conducted at our organisation in 2024. The study explored user perceptions of M365 Copilot's effectiveness, productivity impact, evolving expectations, ethical concerns, and overall satisfaction. While M365 Copilot demonstrated value in specific operational areas, its broader impact remained constrained by usability limitations and the need for human oversight.
arXiv Detail & Related papers (2025-03-22T06:11:10Z) - Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance object Navigation (CoIN) is a new task setting where the agent actively resolves uncertainties about the target instance. We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA). First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description. An Interaction Trigger module then determines whether to ask the human a question, continue, or halt navigation.
arXiv Detail & Related papers (2024-12-02T08:16:38Z) - Design and evaluation of AI copilots -- case studies of retail copilot templates [2.7274834772504954]
Building a successful AI copilot requires a systematic approach.
This paper is divided into two sections, covering the design and evaluation of a copilot respectively.
arXiv Detail & Related papers (2024-06-17T17:31:33Z) - Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions [11.620351603683496]
GitHub's Copilot for Pull Requests (PRs) is a promising service aiming to automate various developer tasks related to PRs.
In this study, we examine 18,256 PRs in which parts of the descriptions were crafted by generative AI.
Our findings indicate that Copilot for PRs, though in its infancy, is seeing a marked uptick in adoption.
arXiv Detail & Related papers (2024-02-14T06:20:57Z) - A User-centered Security Evaluation of Copilot [12.350130201627186]
We evaluate GitHub's Copilot to better understand its strengths and weaknesses with respect to code security.
We find that access to Copilot is associated with more secure solutions when tackling harder problems.
arXiv Detail & Related papers (2023-08-12T14:49:46Z) - Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news.
We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms.
For a deeper understanding of Explainable AI systems, we discuss interactions between user engagement, mental model, trust, and performance measures in the process of explaining.
arXiv Detail & Related papers (2020-07-24T05:42:29Z) - IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems [80.0781718687327]
We analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model, "IART".
IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture.
arXiv Detail & Related papers (2020-02-03T05:59:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.