Automating App Review Response Generation
- URL: http://arxiv.org/abs/2002.03552v1
- Date: Mon, 10 Feb 2020 05:23:38 GMT
- Title: Automating App Review Response Generation
- Authors: Cuiyun Gao, Jichuan Zeng, Xin Xia, David Lo, Michael R. Lyu, Irwin King
- Abstract summary: We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
- Score: 67.58267006314415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies showed that replying to a user review usually has a positive
effect on the rating that is given by the user to the app. For example, Hassan
et al. found that responding to a review increases the chances of a user
updating their given rating by up to six times compared to not responding. To
alleviate the labor burden in replying to the bulk of user reviews, developers
usually adopt a template-based strategy where the templates can express
appreciation for using the app or mention the company email address for users
to follow up. However, reading a large number of user reviews every day is not
an easy task for developers. Thus, there is a need for more automation to help
developers respond to user reviews.
Addressing the aforementioned need, in this work we propose a novel approach
RRGen that automatically generates review responses by learning knowledge
relations between reviews and their responses. RRGen explicitly incorporates
review attributes, such as user rating and review length, and learns the
relations between reviews and corresponding responses in a supervised way from
the available training data. Experiments on 58 apps and 309,246 review-response
pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms
of BLEU-4 (an accuracy measure that is widely used to evaluate dialogue
response generation systems). Qualitative analysis also confirms the
effectiveness of RRGen in generating relevant and accurate responses.
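As described above, RRGen is a supervised encoder-decoder that explicitly conditions response generation on review attributes such as the user rating and review length. Below is a minimal PyTorch sketch of one way such attribute conditioning can be wired in; the layer types, sizes, and fusion scheme are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of an attribute-conditioned encoder-decoder in the
# spirit of RRGen. Layer types, sizes, and the fusion scheme are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class AttrSeq2Seq(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256,
                 num_ratings=5, attr_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Review attributes: the star rating (categorical) and the review
        # length (scalar) are mapped to small vectors and fused into the
        # decoder's initial state -- one plausible reading of "explicitly
        # incorporates review attributes".
        self.rating_emb = nn.Embedding(num_ratings, attr_dim)
        self.len_proj = nn.Linear(1, attr_dim)
        self.fuse = nn.Linear(hid_dim + 2 * attr_dim, hid_dim)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, review_ids, rating, review_len, response_ids):
        # review_ids: (B, T) token ids; rating: (B,) ints; review_len: (B,)
        # floats; response_ids: (B, T') teacher-forced decoder input.
        _, h = self.encoder(self.embed(review_ids))            # h: (1, B, H)
        attrs = torch.cat([self.rating_emb(rating),
                           self.len_proj(review_len.unsqueeze(-1))], dim=-1)
        h0 = torch.tanh(self.fuse(torch.cat([h[-1], attrs], dim=-1)))
        dec_out, _ = self.decoder(self.embed(response_ids), h0.unsqueeze(0))
        return self.out(dec_out)  # per-step vocabulary logits, (B, T', V)
```

Training would then be standard cross-entropy against the gold response tokens, i.e., supervised learning from review-response pairs as the abstract states. Since BLEU-4 is the headline metric, here is a generic way to compute it with NLTK's `corpus_bleu` (not necessarily the authors' scoring script):

```python
# Generic BLEU-4 scoring with NLTK's corpus_bleu; the example tokens
# are invented for illustration.
from nltk.translate.bleu_score import corpus_bleu

refs = [[["thank", "you", "for", "the", "feedback", "!"]]]  # reference list per example
hyps = [["thanks", "for", "the", "feedback", "!"]]
print(corpus_bleu(refs, hyps, weights=(0.25, 0.25, 0.25, 0.25)))  # BLEU-4
```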
Related papers
- Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations [85.81295563405433]
Language model users often issue under-specified queries, where the context in which a query was issued is not made explicit.
We present contextualized evaluations, a protocol that synthetically constructs context surrounding an under-specified query and provides it during evaluation.
We find that the presence of context can 1) alter conclusions drawn from evaluation, even flipping win rates between model pairs, 2) nudge evaluators to make fewer judgments based on surface-level criteria, like style, and 3) provide new insights about model behavior across diverse contexts.
arXiv Detail & Related papers (2024-11-11T18:58:38Z)
- Prompt Optimization with Human Feedback [69.95991134172282]
We study the problem of prompt optimization with human feedback (POHF).
We introduce an algorithm named automated POHF (APOHF).
The results demonstrate that APOHF can efficiently find a good prompt using only a small number of preference-feedback instances.
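As a rough illustration of the POHF setting, the sketch below selects among candidate prompts using only pairwise human preferences; APOHF itself uses a more principled selection strategy, so the uniform pair sampling and win-rate estimate here are stand-in assumptions.

```python
# Toy preference-feedback loop for prompt selection. APOHF uses a more
# principled strategy; this is illustrative only.
import random
from collections import defaultdict

def select_prompt(candidates, ask_preference, budget=20):
    wins, plays = defaultdict(int), defaultdict(int)
    for _ in range(budget):
        a, b = random.sample(candidates, 2)   # pick a pair of prompts
        winner = ask_preference(a, b)         # human says which did better
        wins[winner] += 1
        plays[a] += 1
        plays[b] += 1
    # Return the candidate with the best empirical win rate.
    return max(candidates, key=lambda p: wins[p] / max(plays[p], 1))
```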
arXiv Detail & Related papers (2024-05-27T16:49:29Z)
- Self-Improving Customer Review Response Generation Based on LLMs [1.9274286238176854]
SCRABLE is an adaptive customer-review response automation system that improves itself through self-optimizing prompts.
We introduce an automatic scoring mechanism that mimics the role of a human evaluator to assess the quality of responses generated in customer review domains.
arXiv Detail & Related papers (2024-05-06T20:50:17Z)
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models [17.782410287625645]
This paper proposes a benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing.
The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation.
arXiv Detail & Related papers (2024-02-21T01:39:56Z)
- Proactive Prioritization of App Issues via Contrastive Learning [2.6763498831034043]
We propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews.
PPrior employs a pre-trained T5 model and works in three phases.
Phase one adapts the pre-trained T5 model to the user-review data in a self-supervised fashion.
Phase two leverages contrastive training to learn a generic, task-independent representation of user reviews.
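The contrastive phase could be realized with a standard InfoNCE-style objective over pairs of review embeddings, as sketched below; the paper's exact loss and pairing scheme may differ.

```python
# InfoNCE-style contrastive loss over review embeddings, one standard
# way to realize "contrastive training"; PPrior's exact objective and
# pairing scheme may differ.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    # anchor, positive: (B, D) embeddings of two views of the same review;
    # for each row, the matching row in `positive` is the positive pair
    # and all other rows act as in-batch negatives.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature                   # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)  # diagonal is positive
    return F.cross_entropy(logits, labels)
```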
arXiv Detail & Related papers (2023-03-12T06:23:10Z)
- Meaningful Answer Generation of E-Commerce Question-Answering [77.89755281215079]
In e-commerce portals, generating answers for product-related questions has become a crucial task.
In this paper, we propose a novel generative neural model, called the Meaningful Product Answer Generator (MPAG).
MPAG alleviates the safe answer problem by taking product reviews, product attributes, and a prototype answer into consideration.
arXiv Detail & Related papers (2020-11-14T14:05:30Z)
- E-commerce Query-based Generation based on User Review [1.484852576248587]
We propose a novel seq2seq-based text generation model to generate answers to a user's question based on reviews posted by previous users.
Given a user question and/or target sentiment polarity, we extract aspects of interest and generate an answer that summarizes previous relevant user reviews.
arXiv Detail & Related papers (2020-11-11T04:58:31Z)
- App-Aware Response Synthesis for User Reviews [7.466973484411213]
AARSynth is an app-aware response synthesis system.
It retrieves the top-K most relevant app reviews and the most relevant snippet from the app description.
A fused machine learning model integrates the seq2seq model with a machine reading comprehension model.
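The retrieval step described here can be approximated with simple TF-IDF similarity, as in the hedged sketch below; AARSynth's actual retriever and fused seq2seq/reading-comprehension model are more sophisticated.

```python
# Plausible stand-in for the top-K retrieval step: rank past reviews by
# TF-IDF cosine similarity to the incoming review. Not AARSynth's
# actual retriever.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_reviews(query, past_reviews, k=5):
    vec = TfidfVectorizer().fit(past_reviews + [query])
    sims = cosine_similarity(vec.transform([query]),
                             vec.transform(past_reviews))[0]
    return [past_reviews[i] for i in sims.argsort()[::-1][:k]]
```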
arXiv Detail & Related papers (2020-07-31T01:28:02Z)
- Beyond User Self-Reported Likert Scale Ratings: A Comparison Model for Automatic Dialog Evaluation [69.03658685761538]
Open-domain dialog system evaluation is one of the most important challenges in dialog research.
We propose an automatic evaluation model CMADE that automatically cleans self-reported user ratings as it trains on them.
Our experiments show that CMADE achieves 89.2% accuracy in the dialog comparison task.
arXiv Detail & Related papers (2020-05-21T15:14:49Z)