Extracting Self-Consistent Causal Insights from Users Feedback with LLMs
and In-context Learning
- URL: http://arxiv.org/abs/2312.06820v1
- Date: Mon, 11 Dec 2023 20:12:46 GMT
- Title: Extracting Self-Consistent Causal Insights from Users Feedback with LLMs
and In-context Learning
- Authors: Sara Abdali, Anjali Parikh, Steve Lim, Emre Kiciman
- Abstract summary: Microsoft Windows Feedback Hub is designed to receive customer feedback on a wide variety of subjects including critical topics such as power and battery.
To better understand and triage issues, we leverage Double Machine Learning (DML) to associate users' feedback with telemetry signals.
Our approach is able to extract previously known issues, uncover new bugs, and identify sequences of events that lead to a bug.
- Score: 11.609805521822878
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Microsoft Windows Feedback Hub is designed to receive customer feedback on a
wide variety of subjects including critical topics such as power and battery.
Feedback is one of the most effective ways to have a grasp of users' experience
with Windows and its ecosystem. However, the sheer volume of feedback received
by Feedback Hub makes it immensely challenging to diagnose the actual cause of
reported issues. To better understand and triage issues, we leverage Double
Machine Learning (DML) to associate users' feedback with telemetry signals. One
of the main challenges we face in the DML pipeline is the necessity of domain
knowledge for model design (e.g., causal graph), which sometimes is either not
available or hard to obtain. In this work, we take advantage of reasoning
capabilities of Large Language Models (LLMs) to generate a prior model that, to
some extent, compensates for the lack of domain knowledge and could be
used as a heuristic for measuring feedback informativeness. Our LLM-based
approach is able to extract previously known issues, uncover new bugs, and
identify sequences of events that lead to a bug, while minimizing out-of-domain
outputs.
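
The abstract describes the recipe (an LLM-suggested causal prior feeding a Double Machine Learning estimator) but includes no code. The sketch below is a minimal, hypothetical illustration using EconML's LinearDML; the column names, the query_llm() helper, the prompt, and the file path are assumptions for illustration, not the authors' implementation.

```python
import json
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from econml.dml import LinearDML


def query_llm(prompt: str) -> str:
    """Placeholder for an in-context-learning call to an LLM that returns a
    JSON causal prior such as {"treatment": "...", "confounders": [...]}."""
    raise NotImplementedError("wire up an LLM client of your choice here")


# Hypothetical table: one row per device, telemetry columns plus a binary
# label derived from Feedback Hub reports (e.g., battery-drain complaints).
telemetry = pd.read_csv("telemetry_with_feedback_labels.csv")

prompt = (
    "Given the telemetry columns "
    f"{list(telemetry.columns)} and the outcome 'battery_drain_reported', "
    "propose a causal prior: which single column is the treatment and which "
    "columns are confounders? Answer as JSON with keys 'treatment' and "
    "'confounders'."
)
prior = json.loads(query_llm(prompt))

Y = telemetry["battery_drain_reported"]   # outcome derived from user feedback
T = telemetry[prior["treatment"]]         # treatment proposed by the LLM prior
W = telemetry[prior["confounders"]]       # confounders proposed by the LLM prior

# Double Machine Learning: flexible nuisance models residualize Y and T on W,
# then a linear final stage estimates the treatment effect.
est = LinearDML(
    model_y=GradientBoostingRegressor(),
    model_t=GradientBoostingClassifier(),
    discrete_treatment=True,
)
est.fit(Y, T, X=None, W=W)
print("Estimated average effect:", est.const_marginal_effect())
```

In this sketch the LLM prior only selects treatment and confounder columns; as the abstract notes, such a prior could also serve as a heuristic for judging how informative a given piece of feedback is before running the estimation.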
Related papers
- LFOSum: Summarizing Long-form Opinions with Large Language Models [7.839083566878183]
This paper introduces (1) a new dataset of long-form user reviews, each entity comprising over a thousand reviews, (2) two training-free LLM-based summarization approaches that scale to long inputs, and (3) automatic evaluation metrics.
Our dataset of user reviews is paired with in-depth and unbiased critical summaries by domain experts, serving as a reference for evaluation.
Our evaluation reveals that LLMs still face challenges in balancing sentiment and format adherence in long-form summaries, though open-source models can narrow the gap when relevant information is retrieved in a focused manner.
arXiv Detail & Related papers (2024-10-16T20:52:39Z)
- On the Automated Processing of User Feedback [7.229732269884235]
User feedback is an increasingly important source of information for requirements engineering, user interface design, and software engineering.
To tap the full potential of feedback, there are two main challenges that need to be solved.
First, vendors must cope with a large quantity of feedback data, which is hard to manage manually.
Second, vendors must also cope with a varying quality of feedback as some items might be uninformative, repetitive, or simply wrong.
arXiv Detail & Related papers (2024-07-22T10:13:13Z)
- Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? [3.7399138244928145]
We study the capabilities of large language models to generate feedback for open-ended math questions.
We find that open-source and proprietary models both show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors.
arXiv Detail & Related papers (2024-05-10T11:53:53Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show the combination of system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.