Lex2Sent: A bagging approach to unsupervised sentiment analysis
- URL: http://arxiv.org/abs/2209.13023v2
- Date: Tue, 22 Oct 2024 15:18:55 GMT
- Title: Lex2Sent: A bagging approach to unsupervised sentiment analysis
- Authors: Kai-Robin Lange, Jonas Rieger, Carsten Jentsch
- Abstract summary: In this paper, we propose an alternative approach to classifying texts: Lex2Sent.
To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of a suitable lexicon.
We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
- Score: 0.628122931748758
- Abstract: Unsupervised text classification, most commonly in the form of sentiment analysis, used to be performed by counting the words of a text that appear in a lexicon, which assigns each word either to one class or to a neutral category. In recent years, these lexicon-based methods fell out of favor and were replaced by computationally demanding fine-tuning techniques for encoder-only models such as BERT and by zero-shot classification using decoder-only models such as GPT-4. In this paper, we propose an alternative approach: Lex2Sent, which improves on classic lexicon methods but does not require a GPU or external hardware. To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of the parts of a suitable lexicon. We employ resampling, which results in a bagging effect that boosts classification performance. We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
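The abstract outlines the full pipeline: train an embedding model on resampled data, embed the two parts of the lexicon, compare distances, and bag the resulting decisions. The following is a minimal sketch of that idea in Python using gensim's Doc2Vec. It is not the authors' reference implementation; the toy lexicon, the token-level bootstrap, the mean-vector lexicon embeddings, and all hyperparameters are assumptions made for this example.

```python
# Minimal, illustrative sketch of the Lex2Sent idea -- NOT the authors'
# reference implementation. Lexicon, bootstrap scheme, decision rule, and
# hyperparameters are assumptions made for this example.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Assumed toy lexicon, split into its two "parts" (positive vs. negative).
POS_WORDS = ["good", "great", "excellent", "love", "wonderful"]
NEG_WORDS = ["bad", "terrible", "awful", "hate", "poor"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def lex2sent_predict(token_lists, n_resamples=10, seed=0):
    """Label each tokenized document 1 (positive) or 0 (negative) by
    bagging Doc2Vec models trained on bootstrap-resampled documents."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(token_lists))
    for _ in range(n_resamples):
        # Bootstrap the tokens of each document -> the bagging effect.
        corpus = [TaggedDocument(list(rng.choice(toks, size=len(toks))), [i])
                  for i, toks in enumerate(token_lists)]
        model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=20)
        # Embed each lexicon part as the mean of its word vectors.
        pos = np.mean([model.wv[w] for w in POS_WORDS if w in model.wv], axis=0)
        neg = np.mean([model.wv[w] for w in NEG_WORDS if w in model.wv], axis=0)
        for i in range(len(token_lists)):
            # Vote positive if the document embedding is closer to the
            # positive lexicon part than to the negative one.
            votes[i] += float(cosine(model.dv[i], pos) > cosine(model.dv[i], neg))
    return (votes / n_resamples > 0.5).astype(int)

docs = [["love", "this", "wonderful", "movie"],
        ["terrible", "plot", "awful", "acting"]]
print(lex2sent_predict(docs))  # toy corpus, so the output is illustrative only
```

On a real corpus, averaging many noisy distance-based decisions across resamples is what produces the bagging effect described in the abstract; no GPU is needed since Doc2Vec trains on CPU.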
Related papers
- Lexical Substitution is not Synonym Substitution: On the Importance of Producing Contextually Relevant Word Substitutes [5.065947993017158]
We introduce ConCat, a simple augmented approach which utilizes the original sentence to bolster contextual information sent to the model.
Our study includes a quantitative evaluation, measured via sentence similarity and task performance.
We also conduct a qualitative human analysis to validate that users prefer the substitutions proposed by our method, as opposed to previous methods.
arXiv Detail & Related papers (2025-02-06T16:05:50Z)
- Extract Free Dense Misalignment from CLIP [7.0247398611254175]
This work proposes a novel approach, dubbed CLIP4DM, for detecting dense misalignments from pre-trained CLIP.
We revamp the gradient-based attribution computation method, enabling the negative gradients of individual text tokens to indicate misalignment.
Our method demonstrates state-of-the-art performance among zero-shot models and competitive performance with fine-tuned models.
arXiv Detail & Related papers (2024-12-24T12:51:05Z)
- Token-Level Graphs for Short Text Classification [1.6819960041696331]
We propose an approach which constructs text graphs entirely based on tokens obtained through pre-trained language models (PLMs).
Our method captures contextual and semantic information, overcomes vocabulary constraints, and allows for context-dependent word meanings.
Experimental results demonstrate how our method consistently achieves higher scores or on-par performance with existing methods.
arXiv Detail & Related papers (2024-12-17T10:19:44Z)
- Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework.
Label templates are embedded into input sentences to fully utilize the potential value of class labels.
Supervised contrastive learning is used to model the interaction between support samples and query samples.
arXiv Detail & Related papers (2024-12-13T12:51:50Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- LeQua@CLEF2022: Learning to Quantify [76.22817970624875]
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets.
The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting.
arXiv Detail & Related papers (2021-11-22T14:54:20Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on the recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Text Detoxification using Large Pre-trained Neural Models [57.72086777177844]
We present two novel unsupervised methods for eliminating toxicity in text.
The first method combines guidance of the generation process with small style-conditional language models.
The second method uses BERT to replace toxic words with their non-offensive synonyms.
arXiv Detail & Related papers (2021-09-18T11:55:32Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
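The general recipe described in the LexSubCon entry above (contextual substitute candidates combined with knowledge from a structured lexical resource) can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the authors' method: it uses a BERT fill-mask pipeline for candidates and a simple WordNet-synonym bonus for re-ranking, with a scoring weight chosen arbitrarily.

```python
# Illustrative sketch of LexSubCon-style lexical substitution -- NOT the
# authors' implementation. The model choice, the WordNet re-ranking
# heuristic, and the bonus weight are assumptions for this example.
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def substitutes(sentence, target, top_k=10):
    """Propose substitutes for `target` in `sentence`: contextual candidates
    from a masked LM, boosted when WordNet lists them as synonyms."""
    masked = sentence.replace(target, fill_mask.tokenizer.mask_token, 1)
    synonyms = {l.name().lower()
                for s in wordnet.synsets(target) for l in s.lemmas()}
    scored = []
    for cand in fill_mask(masked, top_k=top_k):
        word, score = cand["token_str"].strip(), cand["score"]
        if word == target:
            continue
        # Assumed heuristic: fixed bonus for lexicon-confirmed synonyms.
        scored.append((score + (0.5 if word in synonyms else 0.0), word))
    return [w for _, w in sorted(scored, reverse=True)]

print(substitutes("The movie was great fun.", "great"))
```

Combining the contextual score with the lexical-resource signal is the key design point: the masked LM alone often proposes words that fit the context but shift the meaning, while the lexicon alone ignores context entirely.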
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.