App-Aware Response Synthesis for User Reviews
- URL: http://arxiv.org/abs/2007.15793v3
- Date: Tue, 10 Nov 2020 19:58:25 GMT
- Title: App-Aware Response Synthesis for User Reviews
- Authors: Umar Farooq, A.B. Siddique, Fuad Jamour, Zhijia Zhao, Vagelis
Hristidis
- Abstract summary: AARSynth is an app-aware response synthesis system.
It retrieves the top-K most relevant app reviews and the most relevant snippet from the app description.
A fused machine learning model integrates the seq2seq model with a machine reading comprehension model.
- Score: 7.466973484411213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Responding to user reviews promptly and satisfactorily improves application
ratings, which is key to application popularity and success. The proliferation
of such reviews makes it virtually impossible for developers to keep up with
responding manually. To address this challenge, recent work has shown the
possibility of automatic response generation. However, because the training
review-response pairs are aggregated from many different apps, it remains
challenging for such models to generate app-specific responses, which, on the
other hand, are often desirable as apps have different features and concerns.
Solving the challenge by simply building a model per app (i.e., training with
review-response pairs of a single app) may be insufficient because individual
apps have limited review-response pairs, and such pairs typically lack the
relevant information needed to respond to a new review. To enable app-specific
response generation, this work proposes AARSynth: an app-aware response
synthesis system. The key idea behind AARSynth is to augment the seq2seq model
with information specific to a given app. Given a new user review, it first
retrieves the top-K most relevant app reviews and the most relevant snippet
from the app description. The retrieved information and the new user review are
then fed into a fused machine learning model that integrates the seq2seq model
with a machine reading comprehension model. The latter helps digest the
retrieved reviews and app description. Finally, the fused model generates a
response that is customized to the given app. We evaluated AARSynth using a
large corpus of reviews and responses from Google Play. The results show that
AARSynth outperforms the state-of-the-art system by 22.2% on BLEU-4 score.
Furthermore, our human study shows that AARSynth produces a statistically
significant improvement in response quality compared to the state-of-the-art
system.
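To make the described flow concrete, here is a minimal sketch of the retrieve-then-generate pipeline in Python. The bag-of-words relevance scorer and the `fused_model.generate` interface are illustrative assumptions, not the paper's implementation, which couples a trained seq2seq generator with a machine reading comprehension (MRC) component.

```python
# Minimal sketch of an AARSynth-style pipeline. The relevance scorer and the
# fused-model interface below are illustrative stand-ins, not the paper's code.
from collections import Counter
import math

def relevance(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words counts (stand-in retriever)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(c * d[t] for t, c in q.items())
    norm = (math.sqrt(sum(c * c for c in q.values()))
            * math.sqrt(sum(c * c for c in d.values())))
    return dot / norm if norm else 0.0

def retrieve_context(new_review, app_reviews, description_snippets, k=5):
    """Top-K most relevant past reviews plus the best description snippet."""
    top_reviews = sorted(app_reviews, key=lambda r: relevance(new_review, r),
                         reverse=True)[:k]
    best_snippet = max(description_snippets,
                       key=lambda s: relevance(new_review, s))
    return top_reviews, best_snippet

def synthesize_response(new_review, app_reviews, description_snippets,
                        fused_model, k=5):
    """Feed the review and retrieved context into the fused seq2seq/MRC model."""
    top_reviews, snippet = retrieve_context(new_review, app_reviews,
                                            description_snippets, k)
    # In the paper, the MRC component digests the retrieved reviews and
    # snippet before the seq2seq decoder produces the app-specific response.
    return fused_model.generate(review=new_review, context_reviews=top_reviews,
                                description_snippet=snippet)
```

A production system would swap `relevance` for the paper's retriever and pass the trained fused model in place of the hypothetical `fused_model` stub.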
Related papers
- Transformer-based Model for ASR N-Best Rescoring and Rewriting [4.906869033128613]
We propose a novel Transformer-based model capable of rescoring and rewriting by exploring the full context of the N-best hypotheses in parallel.
We show that our Rescore+Rewrite model outperforms the Rescore-only baseline, and achieves up to an average 8.6% relative Word Error Rate (WER) reduction over the ASR system by itself.
arXiv Detail & Related papers (2024-06-12T13:39:44Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback that aligns well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
- RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models [17.782410287625645]
This paper proposes a benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing.
The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation.
arXiv Detail & Related papers (2024-02-21T01:39:56Z)
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation)
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
- Pre-Trained Neural Language Models for Automatic Mobile App User Feedback Answer Generation [9.105367401167129]
Studies show that developers' answers to mobile app users' feedback on app stores can increase the apps' star ratings.
To help app developers generate answers related to users' issues, recent studies have developed models that generate answers automatically.
In this paper, we evaluate pre-trained neural language models (PTMs) for generating replies to mobile app user feedback.
arXiv Detail & Related papers (2022-02-04T18:26:55Z)
- Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT.
Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version.
Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Asking and Answering Questions to Evaluate the Factual Consistency of Summaries [80.65186293015135]
We propose an automatic evaluation protocol called QAGS (pronounced "kags") to identify factual inconsistencies in a generated summary.
QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source.
We believe QAGS is a promising tool in automatically generating usable and factually consistent text; a minimal sketch of this ask-and-compare loop appears after this list.
arXiv Detail & Related papers (2020-04-08T20:01:09Z)
- Automating App Review Response Generation [67.58267006314415]
We propose RRGen, a novel approach that automatically generates review responses by learning knowledge relations between reviews and their responses.
Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)
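The QAGS entry above describes an ask-and-compare protocol that lends itself to a compact sketch. Assuming hypothetical question-generation and question-answering callables (`question_gen` and `qa_model` are illustrative stubs, not the paper's released code), the loop looks roughly like this:

```python
# Illustrative sketch of the QAGS idea: questions asked of a summary and of
# its source should receive similar answers when the summary is factually
# consistent. `question_gen` and `qa_model` are hypothetical stubs.
def token_f1(pred: str, gold: str) -> float:
    """Token-level F1, the standard answer-overlap measure in extractive QA."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if not common:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def qags_score(source: str, summary: str, question_gen, qa_model,
               k: int = 10) -> float:
    """Average answer agreement between summary-grounded and source-grounded QA."""
    questions = question_gen(summary, num_questions=k)  # questions about the summary
    scores = [token_f1(qa_model(question=q, context=summary),
                       qa_model(question=q, context=source))
              for q in questions]
    return sum(scores) / len(scores) if scores else 0.0
```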