Extracting Self-Consistent Causal Insights from Users Feedback with LLMs
and In-context Learning
- URL: http://arxiv.org/abs/2312.06820v1
- Date: Mon, 11 Dec 2023 20:12:46 GMT
- Title: Extracting Self-Consistent Causal Insights from Users Feedback with LLMs
and In-context Learning
- Authors: Sara Abdali, Anjali Parikh, Steve Lim, Emre Kiciman
- Abstract summary: Microsoft Windows Feedback Hub is designed to receive customer feedback on a wide variety of subjects including critical topics such as power and battery.
To better understand and triage issues, we leverage Double Machine Learning (DML) to associate users' feedback with telemetry signals.
Our approach is able to extract previously known issues, uncover new bugs, and identify sequences of events that lead to a bug.
- Score: 11.609805521822878
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Microsoft Windows Feedback Hub is designed to receive customer feedback on a
wide variety of subjects including critical topics such as power and battery.
Feedback is one of the most effective ways to understand users' experience
with Windows and its ecosystem. However, the sheer volume of feedback received
by Feedback Hub makes it immensely challenging to diagnose the actual cause of
reported issues. To better understand and triage issues, we leverage Double
Machine Learning (DML) to associate users' feedback with telemetry signals. One
of the main challenges we face in the DML pipeline is the necessity of domain
knowledge for model design (e.g., causal graph), which sometimes is either not
available or hard to obtain. In this work, we take advantage of reasoning
capabilities of Large Language Models (LLMs) to generate a prior model that,
to some extent, compensates for the lack of domain knowledge and can be used
as a heuristic for measuring feedback informativeness. Our LLM-based
approach is able to extract previously known issues, uncover new bugs, and
identify sequences of events that lead to a bug, while minimizing out-of-domain
outputs.
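To make the estimation step concrete, below is a minimal sketch of partially linear DML with cross-fitting. The data, variable roles, and nuisance models are illustrative assumptions, not the paper's actual pipeline; in the paper's setting, the LLM-generated prior causal graph would determine which telemetry variables enter as confounders, whereas here that choice is hard-coded on toy data.

```python
# Minimal sketch of partially linear Double Machine Learning (DML) with
# cross-fitting, in the spirit of the pipeline described in the abstract.
# All variables below are toy stand-ins: W = confounders (e.g., telemetry
# context selected via an LLM-suggested prior graph), T = treatment (e.g.,
# a feedback-topic flag), Y = outcome signal (e.g., battery drain).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 5))
T = (W[:, 0] + rng.normal(size=n) > 0).astype(float)
Y = 1.5 * T + W[:, 0] - 0.5 * W[:, 1] + rng.normal(size=n)

# Cross-fitted residuals: predict Y and T from W on held-out folds so the
# nuisance models never score the samples they were trained on.
y_res, t_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(W):
    y_res[test] = Y[test] - GradientBoostingRegressor().fit(W[train], Y[train]).predict(W[test])
    t_res[test] = T[test] - GradientBoostingRegressor().fit(W[train], T[train]).predict(W[test])

# Final stage: regress outcome residuals on treatment residuals.
theta = (t_res @ y_res) / (t_res @ t_res)
print(f"Estimated treatment effect: {theta:.3f}")  # close to the true 1.5
```

Cross-fitting is what makes the final-stage estimate robust to small errors in the flexible nuisance models; without it, overfitted predictions would leak into the residuals and bias the effect estimate.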
Related papers
- On the Automated Processing of User Feedback [7.229732269884235]
User feedback is an increasingly important source of information for requirements engineering, user interface design, and software engineering.
To tap the full potential of feedback, there are two main challenges that need to be solved.
First, vendors must cope with a large quantity of feedback data, which is hard to manage manually.
Second, vendors must also cope with a varying quality of feedback as some items might be uninformative, repetitive, or simply wrong.
arXiv Detail & Related papers (2024-07-22T10:13:13Z)
- Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? [3.7399138244928145]
We study the capabilities of large language models to generate feedback for open-ended math questions.
We find that open-source and proprietary models both show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors.
arXiv Detail & Related papers (2024-05-10T11:53:53Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective [9.633811630889237]
We propose a causal framework to interpret the biases in Visual Question Answering problems.
Motivated by the causal graph, we introduce a novel MORE dataset, consisting of 12,000 VQA instances.
We propose two strategies to enhance MLLMs' reasoning capabilities, including a Decompose-Verify-Answer framework.
arXiv Detail & Related papers (2024-03-27T08:38:49Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show the combination of system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)