Double-Barreled Question Detection at Momentive
- URL: http://arxiv.org/abs/2203.03545v1
- Date: Sat, 12 Feb 2022 00:04:24 GMT
- Title: Double-Barreled Question Detection at Momentive
- Authors: Peng Jiang, Krishna Sumanth Muppalla, Qing Wei, Chidambara Natarajan
Gopal, Chun Wang
- Abstract summary: A double-barreled question (DBQ) is a common type of biased question that asks about two aspects in one question.
Momentive aims to detect DBQs and recommends that survey creators revise them so as to gather high-quality, unbiased survey data.
We present an end-to-end machine learning approach to DBQ classification in this work.
- Score: 6.783610970053343
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Momentive offers solutions in market research, customer experience, and
enterprise feedback. The technology is gleaned from the billions of real
responses to questions asked on the platform. However, people may create biased
questions. A double-barreled question (DBQ) is a common type of biased question
that asks about two aspects in one question. For example, "Do you agree with the
statement: The food is yummy, and the service is great." This DBQ confuses
survey respondents because it packs two questions into one. DBQs impact both
the survey respondents and the survey owners. Momentive aims to detect DBQs and
recommends that survey creators revise them so as to gather high-quality,
unbiased survey data. Previous research has suggested detecting DBQs by
checking for the presence of grammatical conjunctions. While simple, this
rule-based method is error-prone because conjunctions can also
appear in properly constructed questions. We present an end-to-end machine
learning approach for DBQ classification in this work. We handled the
imbalanced data using active learning, and compared state-of-the-art embedding
algorithms for transforming text data into vectors. Furthermore, we proposed a
model interpretation technique that propagates the vector-level SHAP values to a
SHAP value for each word in a question. We concluded that the word2vec
subword embedding with maximum pooling is the optimal word embedding
representation in terms of precision and running time in the offline
experiments using the survey data at Momentive. The A/B test and production
metrics indicate that this model brings a positive change to the business. To
the best of our knowledge, this is the first machine learning framework for DBQ
detection, and it successfully differentiates Momentive from its competitors.
We hope our work sheds light on machine learning approaches for biased-question
detection.
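The rule-based baseline the abstract critiques — flagging any question that contains a grammatical conjunction — can be sketched as follows. This is a minimal illustration, not the prior work's actual implementation; the conjunction list and tokenization are assumptions.

```python
import re

# Naive rule-based DBQ detector: flag a question if it contains a
# coordinating conjunction. The conjunction list here is an assumption
# for illustration, not the rule used in prior work.
CONJUNCTIONS = {"and", "or", "but"}

def is_dbq_rule_based(question: str) -> bool:
    """Return True if the question contains a coordinating conjunction."""
    tokens = re.findall(r"[a-z']+", question.lower())
    return any(tok in CONJUNCTIONS for tok in tokens)

# The DBQ from the abstract is flagged, as intended:
print(is_dbq_rule_based(
    "Do you agree with the statement: The food is yummy, "
    "and the service is great."))  # True

# But a properly constructed question is also flagged, a false
# positive that illustrates why the rule is error-prone:
print(is_dbq_rule_based("Do you own a cat or a dog?"))  # True
```

The second example asks a single thing ("which pet do you own?") yet trips the rule, which is exactly the failure mode that motivates the paper's learned classifier.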
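The interpretation technique described in the abstract — propagating vector-level SHAP values to a SHAP value per word — can be sketched for the max-pooling case: each dimension of the pooled question vector comes from exactly one word's embedding, so that dimension's SHAP value can be credited to that word. The vectors, SHAP values, and exact propagation rule below are illustrative assumptions, not the paper's data or precise method.

```python
import numpy as np

def word_shap_from_max_pool(word_vecs: np.ndarray,
                            vec_shap: np.ndarray) -> np.ndarray:
    """Propagate vector-level SHAP values to words under max pooling.

    word_vecs: (n_words, dim) per-word embedding vectors.
    vec_shap:  (dim,) SHAP value for each pooled dimension.

    With max pooling, dimension d of the question vector equals
    word_vecs[argmax, d], so dimension d's SHAP value is credited to
    the word that supplied the maximum. Returns (n_words,) per-word
    SHAP values.
    """
    winners = word_vecs.argmax(axis=0)        # winning word per dimension
    word_shap = np.zeros(word_vecs.shape[0])
    np.add.at(word_shap, winners, vec_shap)   # accumulate credit per word
    return word_shap

# Hypothetical 3-word question with 3-dimensional embeddings:
vecs = np.array([[0.9, 0.1, 0.3],   # "food"
                 [0.2, 0.8, 0.4],   # "and"
                 [0.1, 0.2, 0.7]])  # "service"
shap_per_dim = np.array([0.05, 0.30, 0.15])
print(word_shap_from_max_pool(vecs, shap_per_dim))
```

Here each word wins one dimension, so the word-level attribution is simply a reshuffling of the dimension-level SHAP values; in general a word can win several dimensions and accumulate their credit.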
Related papers
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.
The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.
We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z) - Enriching Social Science Research via Survey Item Linking [11.902701975866595]
We model a task called Survey Item Linking (SIL) in two stages: mention detection and entity disambiguation.
To this end, we create a high-quality and richly annotated dataset consisting of 20,454 English and German sentences.
We demonstrate that the task is feasible, but observe that errors propagate from the first stage, leading to a lower overall task performance.
arXiv Detail & Related papers (2024-12-20T12:14:33Z) - Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics, among strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z) - QUADRo: Dataset and Models for QUestion-Answer Database Retrieval [97.84448420852854]
Given a database (DB) of question/answer (q/a) pairs, it is possible to answer a target question by scanning the DB for similar questions.
We build a large scale DB of 6.3M q/a pairs, using public questions, and design a new system based on neural IR and a q/a pair reranker.
We show that our DB-based approach is competitive with Web-based methods, i.e., a QA system built on top of the Bing search engine.
arXiv Detail & Related papers (2023-03-30T00:42:07Z) - Learning to Retrieve Engaging Follow-Up Queries [12.380514998172199]
We present a retrieval based system and associated dataset for predicting the next questions that the user might have.
Such a system can proactively assist users in knowledge exploration leading to a more engaging dialog.
arXiv Detail & Related papers (2023-02-21T20:26:23Z) - Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z) - Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z) - A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering [15.355557454305776]
We show that question rewriting (QR) of the conversational context allows to shed more light on this phenomenon.
We present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets.
arXiv Detail & Related papers (2020-10-13T06:29:51Z) - Match$^2$: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question, or different questions sharing similar expressions.
Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.