Not All Comments are Equal: Insights into Comment Moderation from a
Topic-Aware Model
- URL: http://arxiv.org/abs/2109.10033v1
- Date: Tue, 21 Sep 2021 08:57:17 GMT
- Title: Not All Comments are Equal: Insights into Comment Moderation from a
Topic-Aware Model
- Authors: Elaine Zosa, Ravi Shekhar, Mladen Karan, Matthew Purver
- Abstract summary: We make our models topic-aware, incorporating semantic features from a topic model into the classification decision.
Our results show that topic information improves the performance of the model, increases its confidence in correct outputs, and helps us understand the model's outputs.
- Score: 8.28576076054666
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Moderation of reader comments is a significant problem for online news
platforms. Here, we experiment with models for automatic moderation, using a
dataset of comments from a popular Croatian newspaper. Our analysis shows that
while comments that violate the moderation rules mostly share common linguistic
and thematic features, their content varies across the different sections of
the newspaper. We therefore make our models topic-aware, incorporating semantic
features from a topic model into the classification decision. Our results show
that topic information improves the performance of the model, increases its
confidence in correct outputs, and helps us understand the model's outputs.
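The abstract's core idea, incorporating semantic features from a topic model into the classification decision, can be sketched minimally as feature concatenation. This is a hedged illustration, not the authors' implementation: the function name, dimensions, and vectors below are hypothetical stand-ins for a document's text representation and its topic-model distribution.

```python
import numpy as np

def topic_aware_features(text_vec: np.ndarray, topic_dist: np.ndarray) -> np.ndarray:
    """Concatenate a document's text representation with its topic
    distribution so a downstream classifier can condition on both."""
    # A topic distribution is a probability vector over topics.
    assert np.isclose(topic_dist.sum(), 1.0), "topic distribution must sum to 1"
    return np.concatenate([text_vec, topic_dist])

# Hypothetical example: a 4-dim text embedding and a 3-topic distribution.
text_vec = np.array([0.2, 0.0, 0.5, 0.3])
topic_dist = np.array([0.7, 0.2, 0.1])
features = topic_aware_features(text_vec, topic_dist)
print(features.shape)  # (7,)
```

Any standard classifier can then be trained on the combined vector; the paper itself studies how such topic information changes performance and model confidence on comment moderation.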
Related papers
- VHELM: A Holistic Evaluation of Vision Language Models [75.88987277686914]
We present the Holistic Evaluation of Vision Language Models (VHELM).

VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety.
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast.
arXiv Detail & Related papers (2024-10-09T17:46:34Z)
- LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models [12.500091504010067]
We propose WALM (Words Agreement with Language Model), a new evaluation method for topic modeling.
With extensive experiments involving different types of topic models, WALM is shown to align with human judgment.
arXiv Detail & Related papers (2024-06-13T11:19:50Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
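The debiasing operation described here, projecting out a biased direction from a text embedding, amounts to an orthogonal projection. The sketch below is an assumption-laden toy (2-D vectors, a single hypothetical bias direction), not the paper's calibrated projection matrix.

```python
import numpy as np

def project_out(embedding: np.ndarray, bias_dir: np.ndarray) -> np.ndarray:
    """Remove the component of `embedding` along `bias_dir`:
    e - (e . b_hat) * b_hat, where b_hat is the unit bias direction."""
    b_hat = bias_dir / np.linalg.norm(bias_dir)
    return embedding - np.dot(embedding, b_hat) * b_hat

# Hypothetical embedding and bias axis.
emb = np.array([3.0, 4.0])
bias = np.array([1.0, 0.0])
debiased = project_out(emb, bias)
# The debiased vector is orthogonal to the bias direction.
```

The paper generalizes this to a calibrated projection matrix applied only to the text embedding, which they show is enough to obtain robust classifiers.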
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Dynamic Review-based Recommenders [1.5427245397603195]
We leverage the known power of reviews to enhance rating predictions in a way that respects the causality of review generation.
Our representations are time-interval aware and thus yield a continuous-time representation of the dynamics.
arXiv Detail & Related papers (2021-10-27T20:17:47Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Unsupervised Graph-based Topic Modeling from Video Transcriptions [5.210353244951637]
We develop a topic extractor on video transcriptions using neural word embeddings and a graph-based clustering method.
Experimental results on the real-life multimodal data set MuSe-CaR demonstrate that our approach extracts coherent and meaningful topics.
arXiv Detail & Related papers (2021-05-04T12:48:17Z)
- Generating Diversified Comments via Reader-Aware Topic Modeling and Saliency Detection [25.16392119801612]
We propose a reader-aware topic modeling and saliency information detection framework to enhance the quality of generated comments.
For reader-aware topic modeling, we design a variational generative clustering algorithm for latent semantic learning and topic mining from reader comments.
For saliency information detection, we introduce Bernoulli distribution estimating on news content to select saliency information.
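The saliency-selection step, estimating per-unit inclusion probabilities over news content, can be illustrated with a simple filter. This is a hedged sketch only: the real method estimates Bernoulli parameters within the model, while the toy below uses a fixed threshold over hypothetical, hand-picked probabilities.

```python
def select_salient(sentences: list[str], saliency_probs: list[float],
                   threshold: float = 0.5) -> list[str]:
    """Keep sentences whose estimated saliency probability exceeds a
    threshold (a deterministic stand-in for sampling each Bernoulli)."""
    return [s for s, p in zip(sentences, saliency_probs) if p > threshold]

# Hypothetical sentences and saliency estimates.
sentences = ["intro", "key fact", "aside"]
probs = [0.2, 0.9, 0.4]
salient = select_salient(sentences, probs)
print(salient)  # ['key fact']
```

In the paper, the selected salient content then conditions the comment generator alongside the reader-aware topics.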
arXiv Detail & Related papers (2021-02-13T03:50:31Z)
- A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews [35.802290746473524]
We propose a neural topic model combined with adversarial training to disentangle opinion topics from plot and neutral ones.
We conduct an experimental assessment introducing a new collection of movie and book reviews paired with their plots.
Our model shows improved coherence and variety of topics, a consistent disentanglement rate, and sentiment classification performance superior to other supervised topic models.
arXiv Detail & Related papers (2020-10-22T02:15:13Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.