T-BERT -- Model for Sentiment Analysis of Micro-blogs Integrating Topic
Model and BERT
- URL: http://arxiv.org/abs/2106.01097v1
- Date: Wed, 2 Jun 2021 12:01:47 GMT
- Title: T-BERT -- Model for Sentiment Analysis of Micro-blogs Integrating Topic
Model and BERT
- Authors: Sarojadevi Palani, Prabhu Rajagopal, Sidharth Pancholi
- Abstract summary: The effectiveness of BERT (Bidirectional Encoder Representations from Transformers) in sentiment classification tasks on a raw live dataset is demonstrated.
A novel T-BERT framework is proposed to show the enhanced performance obtainable by combining latent topics with contextual BERT embeddings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis (SA) has become an extensive research area in recent years,
impacting diverse fields including e-commerce, consumer business, and politics,
driven by the increasing adoption and usage of social media platforms. It is
challenging to extract topics and sentiments from unsupervised short texts
emerging in such contexts, as they may contain figurative words, noisy data,
and many co-existing meanings for a single word or phrase, all of which can
lead to incorrect topics. Most prior research is based on a specific
theme/rhetoric/focused content on a clean dataset. In the work reported here,
the effectiveness of BERT (Bidirectional Encoder Representations from
Transformers) in sentiment classification tasks on a raw live dataset taken
from a popular microblogging platform is demonstrated. A novel T-BERT
framework is proposed to show the enhanced performance obtainable by combining
latent topics with contextual BERT embeddings. Numerical experiments were
conducted on an ensemble dataset of about 42,000 samples using the NimbleBox.ai
platform, with a hardware configuration consisting of an Nvidia Tesla K80
(CUDA), a 4-core CPU, and 15 GB RAM running on an isolated Google Cloud
Platform instance. The empirical results show that the model's performance
improves when topics are added to BERT, reaching an accuracy of 90.81% on
sentiment classification with the proposed approach.
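The fusion step described in the abstract (combining latent topic proportions with a contextual BERT embedding before classification) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the 768-dimensional sentence embedding, the 10-topic Dirichlet posterior, and the linear 3-class head are all illustrative assumptions standing in for real BERT and LDA outputs.

```python
import numpy as np

def fuse_features(bert_cls, topic_dist):
    """Concatenate a contextual sentence embedding with a latent-topic
    distribution to form the joint feature used for classification."""
    return np.concatenate([bert_cls, topic_dist])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
bert_cls = rng.standard_normal(768)      # stand-in for a BERT [CLS] embedding
topic_dist = rng.dirichlet(np.ones(10))  # stand-in for a 10-topic LDA posterior

x = fuse_features(bert_cls, topic_dist)  # 778-dim joint feature vector
W = rng.standard_normal((3, x.size)) * 0.01  # hypothetical 3-class sentiment head
probs = softmax(W @ x)                   # class probabilities
```

In practice the concatenated vector would feed a trainable classification layer fine-tuned jointly with BERT; the sketch only shows how the two feature spaces are joined.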
Related papers
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z) - Constructing Colloquial Dataset for Persian Sentiment Analysis of Social
Microblogs [0.0]
This paper first constructs a user-opinion dataset called ITRC-Opinion, built collaboratively in an insourced manner.
Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram.
Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Open-vocabulary Panoptic Segmentation with Embedding Modulation [71.15502078615587]
Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world.
Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results.
We propose OPSNet, an omnipotent and data-efficient framework for open-vocabulary panoptic segmentation.
arXiv Detail & Related papers (2023-03-20T17:58:48Z) - Transferring BERT-like Transformers' Knowledge for Authorship
Verification [8.443350618722562]
We study the effectiveness of several BERT-like transformers for the task of authorship verification.
We provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors.
We show that those splits can enhance the models' capability to transfer knowledge over a new, significantly different dataset.
arXiv Detail & Related papers (2021-12-09T18:57:29Z) - Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to
Corpus Exploration [25.159601117722936]
We propose a contrastive fine-tuning objective that enables BERT to produce more powerful phrase embeddings.
Our approach relies on a dataset of diverse phrasal paraphrases, which is automatically generated using a paraphrase generation model.
As a case study, we show that Phrase-BERT embeddings can be easily integrated with a simple autoencoder to build a phrase-based neural topic model.
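The contrastive fine-tuning objective summarized above can be illustrated with an in-batch, InfoNCE-style loss over phrase/paraphrase embedding pairs: each phrase embedding should score highest against its own paraphrase among all candidates in the batch. A minimal NumPy sketch, assuming precomputed embeddings; the temperature value, batch size, and synthetic embeddings are illustrative assumptions, not details from the paper.

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.07):
    """InfoNCE-style in-batch loss: matched (anchor, paraphrase) pairs lie
    on the diagonal of the similarity matrix and should dominate each row."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                  # cosine similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))        # cross-entropy on the diagonal

rng = np.random.default_rng(1)
phrases = rng.standard_normal((8, 32))                       # 8 phrase embeddings
paraphrases = phrases + 0.05 * rng.standard_normal((8, 32))  # near-duplicate paraphrases

aligned = contrastive_loss(phrases, paraphrases)             # correct pairing: low loss
shuffled = contrastive_loss(phrases, paraphrases[::-1].copy())  # wrong pairing: high loss
```

Minimizing this loss pulls paraphrase embeddings together and pushes unrelated phrases apart, which is the general mechanism behind contrastive phrase-embedding objectives of this kind.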
arXiv Detail & Related papers (2021-09-13T20:31:57Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z) - BET: A Backtranslation Approach for Easy Data Augmentation in
Transformer-based Paraphrase Identification Context [0.0]
We call this approach BET by which we analyze the backtranslation data augmentation on the transformer-based architectures.
Our findings suggest that BET improves paraphrase identification performance on the Microsoft Research Paraphrase Corpus by more than 3% in both accuracy and F1 score.
arXiv Detail & Related papers (2020-09-25T22:06:06Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - BERT-based Ensembles for Modeling Disclosure and Support in
Conversational Social Media Text [9.475039534437332]
We introduce a predictive ensemble model exploiting the finetuned contextualized word embeddings, RoBERTa and ALBERT.
We show that our model outperforms the base models in all considered metrics, achieving an improvement of 3% in F1 score.
arXiv Detail & Related papers (2020-06-01T19:52:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.