Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using
Paragraph Vector
- URL: http://arxiv.org/abs/2009.05720v1
- Date: Sat, 12 Sep 2020 03:43:30 GMT
- Title: Improving Bi-LSTM Performance for Indonesian Sentiment Analysis Using
Paragraph Vector
- Authors: Ayu Purwarianti (1), Ida Ayu Putu Ari Crisdayanti (1) ((1) Institut
Teknologi Bandung)
- Abstract summary: The Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising performance in the sentiment classification task.
We propose using an existing document representation method called the paragraph vector as an additional input feature for the Bi-LSTM.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Bidirectional Long Short-Term Memory Network (Bi-LSTM) has shown promising
performance in the sentiment classification task. It processes its input as a sequence
of information. Because of this behavior, sentiment predictions by a Bi-LSTM are
influenced by word order, and the first or last phrases of a text tend to
yield stronger features than the other phrases. Meanwhile, in the problem scope of
Indonesian sentiment analysis, the phrases that express the sentiment of a document
may not appear in the first or last part of the document, which can lead to
incorrect sentiment classification. To this end, we propose using an
existing document representation method, the paragraph vector, as an additional
input feature for the Bi-LSTM. This vector provides document-level context
at each step of the sequence processing. The paragraph vector is simply
concatenated to each word vector of the document. This representation also
helps to differentiate ambiguous Indonesian words. Bi-LSTM and the paragraph vector
were previously used as separate methods. Combining the two methods has shown a
significant performance improvement of the Indonesian sentiment analysis model.
Several case studies on the test data showed that the proposed method can handle
the sentiment phrase position problem encountered by the Bi-LSTM.
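
Below is a minimal, illustrative sketch (not the authors' implementation) of the core idea in the abstract: the document-level paragraph vector is concatenated to every word vector before the sequence is fed to a Bi-LSTM sentiment classifier. All dimensions, the pooling choice, and the toy inputs are assumptions made for the example; in practice the paragraph vector would come from a doc2vec-style model.

```python
# Sketch only: Bi-LSTM whose per-timestep input is [word vector ; paragraph vector].
# Dimensions, pooling, and the random toy inputs are illustrative assumptions.
import torch
import torch.nn as nn

class ParagraphVectorBiLSTM(nn.Module):
    def __init__(self, vocab_size, word_dim=100, para_dim=100,
                 hidden_dim=128, num_classes=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        # The Bi-LSTM sees the word vector concatenated with the paragraph vector
        # at every timestep, so document context is available throughout the sequence.
        self.bilstm = nn.LSTM(word_dim + para_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids, paragraph_vec):
        # token_ids: (batch, seq_len); paragraph_vec: (batch, para_dim)
        words = self.word_emb(token_ids)                       # (B, T, word_dim)
        para = paragraph_vec.unsqueeze(1).expand(-1, words.size(1), -1)
        x = torch.cat([words, para], dim=-1)                   # (B, T, word_dim + para_dim)
        out, _ = self.bilstm(x)                                # (B, T, 2 * hidden_dim)
        doc_repr = out.mean(dim=1)                             # simple mean pooling over timesteps
        return self.classifier(doc_repr)

# Toy usage: two documents of five tokens each, with 100-dimensional paragraph
# vectors (in practice these would come from a doc2vec-style model such as
# gensim's Doc2Vec, inferred on the same documents).
model = ParagraphVectorBiLSTM(vocab_size=5000)
tokens = torch.randint(1, 5000, (2, 5))
para_vecs = torch.randn(2, 100)
logits = model(tokens, para_vecs)   # shape: (2, num_classes)
```

The design choice to repeat the paragraph vector at every timestep mirrors the abstract's description of giving the Bi-LSTM document context at each step of sequence processing, rather than only at the first or last position.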
Related papers
- Out of Length Text Recognition with Sub-String Matching [54.63761108308825]
In this paper, we term this task Out of Length (OOL) text recognition.
We propose a novel method called OOL Text Recognition with sub-String Matching (SMTR).
SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features.
arXiv Detail & Related papers (2024-07-17T05:02:17Z) - TocBERT: Medical Document Structure Extraction Using Bidirectional Transformers [1.2343981093497332]
TocBERT represents a supervised solution trained on the detection of titles and sub-titles from semantic representations.
The solution has been applied to a medical text segmentation use case in which the Bio-ClinicalBERT model is fine-tuned to segment discharge summaries from the MIMIC-III dataset.
It achieved an F1-score of 84.6% when evaluated on a linear text segmentation problem and 72.8% on a hierarchical text segmentation problem.
arXiv Detail & Related papers (2024-06-27T20:56:57Z) - SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z) - Integrating Bidirectional Long Short-Term Memory with Subword Embedding
for Authorship Attribution [2.3429306644730854]
Manifold word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution.
The proposed method was experimentally evaluated against numerous state-of-the-art methods on the public corpora CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z) - Supplementary Features of BiLSTM for Enhanced Sequence Labeling [1.6255202259274413]
The capacity of BiLSTM to produce sentence representations for sequence labeling tasks is inherently limited.
We devised a global context mechanism to integrate entire future and past sentence representations into each cell's sentence representation.
We noted significant improvements in F1 scores and accuracy across all examined datasets.
arXiv Detail & Related papers (2023-05-31T15:05:25Z) - Textual Entailment Recognition with Semantic Features from Empirical
Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth value of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z) - Example-Based Machine Translation from Text to a Hierarchical
Representation of Sign Language [1.3999481573773074]
This article presents an original method for Text-to-Sign Translation.
It compensates data scarcity using a domain-specific parallel corpus of alignments between text and hierarchical formal descriptions of Sign Language videos in AZee.
Based on the detection of similarities present in the source text, the proposed algorithm exploits matches and substitutions of aligned segments to build multiple candidate translations.
The resulting translations are in the form of AZee expressions, designed to be used as input to avatar systems.
arXiv Detail & Related papers (2022-05-06T15:48:43Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Generalized Funnelling: Ensemble Learning and Heterogeneous Document
Embeddings for Cross-Lingual Text Classification [78.83284164605473]
Funnelling (Fun) is a recently proposed method for cross-lingual text classification.
We describe Generalized Funnelling (gFun) as a generalization of Fun.
We show that gFun substantially improves over Fun and over state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-17T23:33:04Z) - Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix
Capture [2.7528170226206443]
We propose two novel approaches to feature extraction: one to reduce overlap ambiguity and the other to improve the prediction of unknown words containing suffixes.
Our proposed method obtained a better F1-score than the prior state-of-the-art methods UETsegmenter and RDRsegmenter.
arXiv Detail & Related papers (2020-06-14T05:19:46Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content of the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.