Related papers: Stay on Topic, Please: Aligning User Comments to the Content of a News Article

Stay on Topic, Please: Aligning User Comments to the Content of a News Article

URL: http://arxiv.org/abs/2103.06130v1
Date: Wed, 3 Mar 2021 18:29:00 GMT
Title: Stay on Topic, Please: Aligning User Comments to the Content of a News Article
Authors: Jumanah Alshehri, Marija Stanojevic, Eduard Dragut, Zoran Obradovic
Abstract summary: We propose a classification algorithm to categorize user comments posted to a new article base don their alignment to its content. The alignment seek to match user comments to an article based on similarity off content, entities in discussion, and topic. We conduct a user study to evaluate human labeling performance to understand the difficulty of the classification task.
Score: 7.3203631241415055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Social scientists have shown that up to 50% if the content posted to a news article have no relation to its journalistic content. In this study we propose a classification algorithm to categorize user comments posted to a new article base don their alignment to its content. The alignment seek to match user comments to an article based on similarity off content, entities in discussion, and topic. We proposed a BERTAC, BAERT-based approach that learn jointly article-comment embeddings and infers the relevance class of comments. We introduce an ordinal classification loss that penalizes the difference between the predicted and true label. We conduct a thorough study to show influence of the proposed loss on the learning process. The results on five representative news outlets show that our approach can learn the comment class with up to 36% average accuracy improvement compering to the baselines, and up to 25% compering to the BA-BC model. BA-BC is out approach that consists of two models aimed to capture dis-jointly the formal language of news articles and the informal language of comments. We also conduct a user study to evaluate human labeling performance to understand the difficulty of the classification task. The user agreement on comment-article alignment is "moderate" per Krippendorff's alpha score, which suggests that the classification task is difficult.

Related papers

LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback [16.57980268646285]
This paper studies how inappropriate language in arguments can be computationally mitigated. We propose a reinforcement learning-based rewriting approach that balances content preservation and appropriateness. We evaluate different weighting schemes for the reward function in both absolute and relative human assessment studies.
arXiv Detail & Related papers (2024-06-05T15:18:08Z)
Explore Spurious Correlations at the Concept Level in Language Models for Text Classification [28.832684088975622]
Language models (LMs) have achieved notable success in numerous NLP tasks. They face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars. This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data. Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations.
arXiv Detail & Related papers (2023-11-15T01:58:54Z)
JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning [8.530934084017966]
We introduce emphCrowdOpinion, an unsupervised learning approach that uses language features and label distributions to pool similar items into larger samples of label distributions. We use five publicly available benchmark datasets (with varying levels of annotator disagreements) from social media. We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself by users reacting to posts.
arXiv Detail & Related papers (2023-07-07T22:09:46Z)
Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor. LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification. We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages. Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation. Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three) We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss [51.448615489097236]
Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms. We propose a novel dual-view model that jointly improves the performance of these two tasks. Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2020-06-02T13:34:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.