Providing Insights for Open-Response Surveys via End-to-End
Context-Aware Clustering
- URL: http://arxiv.org/abs/2203.01294v1
- Date: Wed, 2 Mar 2022 18:24:10 GMT
- Title: Providing Insights for Open-Response Surveys via End-to-End
Context-Aware Clustering
- Authors: Soheil Esmaeilzadeh, Brian Williams, Davood Shamsi, Onar Vikingstad
- Abstract summary: In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data.
Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors.
Our framework reduces costs at scale by automating the process of extracting the most insightful information pieces from survey data.
- Score: 2.6094411360258185
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Teachers often conduct surveys in order to collect data from a predefined
group of students to gain insights into topics of interest. When analyzing
surveys with open-ended textual responses, it is extremely time-consuming,
labor-intensive, and difficult to manually process all the responses into an
insightful and comprehensive report. In the analysis step, traditionally, the
teacher has to read each of the responses and decide on how to group them in
order to extract insightful information. Even though it is possible to group
the responses only using certain keywords, such an approach would be limited
since it not only fails to account for embedded contexts but also cannot detect
polysemous words or phrases and semantics that are not expressible in single
words. In this work, we present a novel end-to-end context-aware framework that
extracts, aggregates, and abbreviates embedded semantic patterns in
open-response survey data. Our framework relies on a pre-trained natural
language model in order to encode the textual data into semantic vectors. The
encoded vectors then get clustered either into an optimally tuned number of
groups or into a set of groups with pre-specified titles. In the former case,
the clusters are then further analyzed to extract a representative set of
keywords or summary sentences that serve as the labels of the clusters. In our
framework, for the designated clusters, we finally provide context-aware
wordclouds that demonstrate the semantically prominent keywords within each
group. Honoring user privacy, we have successfully built the on-device
implementation of our framework suitable for real-time analysis on mobile
devices and have tested it on a synthetic dataset. Our framework reduces
costs at scale by automating the process of extracting the most insightful
information pieces from survey data.
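The pipeline described in the abstract (encode responses, cluster them, then label each cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a normalised bag-of-words vector stands in for the pre-trained language model, a small deterministic spherical k-means with silhouette-based selection of the number of groups stands in for the "optimally tuned" clustering, and nearest-title assignment stands in for the pre-specified-titles mode. All function names, the stopword list, and the sample responses are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stopword list for this toy example only.
STOPWORDS = {"the", "is", "were", "and", "every", "each"}

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def embed(text, vocab):
    # ASSUMPTION: bag-of-words stand-in for a pre-trained language model encoder.
    counts = Counter(tokenize(text))
    return normalize([float(counts[w]) for w in vocab])

def dot(a, b):
    # Cosine similarity when both inputs are unit vectors.
    return sum(x * y for x, y in zip(a, b))

def nearest(vec, centers):
    return max(range(len(centers)), key=lambda j: dot(vec, centers[j]))

def kmeans(vectors, k, iters=10):
    # Deterministic farthest-first seeding, then spherical k-means updates.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(dot(v, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(v, centers) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return [nearest(v, centers) for v in vectors]

def silhouette(vectors, labels):
    # Mean silhouette score using cosine distance (1 - similarity).
    scores = []
    for i, v in enumerate(vectors):
        dists = {}
        for j, u in enumerate(vectors):
            if j != i:
                dists.setdefault(labels[j], []).append(1.0 - dot(v, u))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical synthetic survey responses.
responses = [
    "the homework load is heavy",
    "heavy homework load every week",
    "homework load each week",
    "the lab sessions were fun",
    "fun lab sessions and experiments",
    "lab experiments were fun",
]
vocab = sorted({w for r in responses for w in tokenize(r)})
vectors = [embed(r, vocab) for r in responses]

# Mode 1: tune the number of groups, then label each cluster with keywords.
best_k = max(range(2, 5), key=lambda k: silhouette(vectors, kmeans(vectors, k)))
labels = kmeans(vectors, best_k)
keywords = {}
for c in sorted(set(labels)):
    counts = Counter(w for r, l in zip(responses, labels) if l == c for w in tokenize(r))
    keywords[c] = [w for w, _ in counts.most_common(2)]

# Mode 2: assign each response to the closest pre-specified title.
titles = ["homework load", "lab sessions"]
title_vecs = [embed(t, vocab) for t in titles]
assigned = [titles[nearest(v, title_vecs)] for v in vectors]

print(best_k, labels, keywords)
print(assigned)
```

In a real setting the `embed` function would be replaced by a pre-trained sentence encoder, which is what lets the grouping capture context beyond shared keywords; the clustering and labeling steps would otherwise keep the same shape.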
Related papers
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Goal-Driven Explainable Clustering via Language Descriptions [50.980832345025334]
We propose a new task formulation, "Goal-Driven Clustering with Explanations" (GoalEx).
GoalEx represents both the goal and the explanations as free-form language descriptions.
Our method produces more accurate and goal-related explanations than prior methods.
arXiv Detail & Related papers (2023-05-23T07:05:50Z)
- A combined approach to the analysis of speech conversations in a contact center domain [2.575030923243061]
We describe an experiment with a speech analytics process for an Italian contact center that deals with call recordings extracted from inbound or outbound flows.
First, we illustrate in detail the development of an in-house speech-to-text solution based on the Kaldi framework.
Then, we evaluate and compare different approaches to the semantic tagging of call transcripts.
Finally, a decision tree inducer, called J48S, is applied to the problem of tagging.
arXiv Detail & Related papers (2022-03-12T10:03:20Z)
- Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often fall more closely into aspects, broad topics in a dataset the user is interested in rather than specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find this approach yields focused summaries, better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z)
- Classification of Consumer Belief Statements From Social Media [0.0]
We study how complex expert annotations can be leveraged successfully for classification.
We find that automated class abstraction approaches perform remarkably well against a domain expert baseline on text classification tasks.
arXiv Detail & Related papers (2021-06-29T15:25:33Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Word Embedding-based Text Processing for Comprehensive Summarization and Distinct Information Extraction [1.552282932199974]
We propose two automated text processing frameworks specifically designed to analyze online reviews.
The first framework summarizes the review dataset by extracting essential sentences.
The second framework is based on a question-answering neural network model trained to extract answers to multiple different questions.
arXiv Detail & Related papers (2020-04-21T02:43:31Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)