Multilingual Sentiment Analysis of Summarized Texts: A Cross-Language Study of Text Shortening Effects
- URL: http://arxiv.org/abs/2504.00265v1
- Date: Mon, 31 Mar 2025 22:16:04 GMT
- Title: Multilingual Sentiment Analysis of Summarized Texts: A Cross-Language Study of Text Shortening Effects
- Authors: Mikhail Krasitskii, Grigori Sidorov, Olga Kolesnikova, Liliana Chanona Hernandez, Alexander Gelbukh
- Abstract summary: Summarization significantly impacts sentiment analysis across languages with diverse morphologies. This study examines extractive and abstractive summarization effects on sentiment classification in English, German, French, Spanish, Italian, Finnish, Hungarian, and Arabic.
- Score: 42.90274643419224
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Summarization significantly impacts sentiment analysis across languages with diverse morphologies. This study examines extractive and abstractive summarization effects on sentiment classification in English, German, French, Spanish, Italian, Finnish, Hungarian, and Arabic. We assess sentiment shifts post-summarization using multilingual transformers (mBERT, XLM-RoBERTa, T5, and BART) and language-specific models (FinBERT, AraBERT). Results show extractive summarization better preserves sentiment, especially in morphologically complex languages, while abstractive summarization improves readability but introduces sentiment distortion, affecting sentiment accuracy. Languages with rich inflectional morphology, such as Finnish, Hungarian, and Arabic, experience greater accuracy drops than English or German. Findings emphasize the need for language-specific adaptations in sentiment analysis and propose a hybrid summarization approach balancing readability and sentiment preservation. These insights benefit multilingual sentiment applications, including social media monitoring, market analysis, and cross-lingual opinion mining.
Related papers
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER, a collection of multi-labeled datasets in 28 different languages. We describe the data collection and annotation processes and the challenges of building these datasets. We show that BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z)
- Comparative Approaches to Sentiment Analysis Using Datasets in Major European and Arabic Languages [42.90274643419224]
This study explores transformer-based models such as BERT, mBERT, and XLM-R for multilingual sentiment analysis. Key contributions include the identification of XLM-R's superior adaptability in morphologically complex languages, achieving accuracy levels above 88%.
arXiv Detail & Related papers (2025-01-21T23:11:16Z)
- Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English [0.0]
This paper examines the performance of transformer models on sentiment analysis tasks across multilingual datasets and machine-translated text.
By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages.
arXiv Detail & Related papers (2024-05-05T10:52:09Z)
- Exploring Tokenization Strategies and Vocabulary Sizes for Enhanced Arabic Language Models [0.0]
This paper examines the impact of tokenization strategies and vocabulary sizes on the performance of Arabic language models.
Our study uncovers limited impacts of vocabulary size on model performance while keeping the model size unchanged.
The paper's recommendations include refining tokenization strategies to address dialect challenges, enhancing model robustness across diverse linguistic contexts, and expanding datasets to cover Arabic's rich dialectal variation.
arXiv Detail & Related papers (2024-03-17T07:44:44Z)
- Ensemble Language Models for Multilingual Sentiment Analysis [0.0]
We explore sentiment analysis on tweet texts from SemEval-17 and the Arabic Sentiment Tweet dataset.
Our findings show that monolingual models exhibit superior performance and that ensemble models outperform the baseline.
arXiv Detail & Related papers (2024-03-10T01:39:10Z)
- USA: Universal Sentiment Analysis Model & Construction of Japanese Sentiment Text Classification and Part of Speech Dataset [0.0]
This paper proposes enhancing performance by leveraging the Mutual Reinforcement Effect (MRE) between individual words and the overall text.
To support our research, we annotated four novel Sentiment Text Classification and Part of Speech (SCPOS) datasets.
Furthermore, we developed a Universal Sentiment Analysis (USA) model with 7 billion parameters.
arXiv Detail & Related papers (2023-09-07T15:35:00Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Pragmatic information in translation: a corpus-based study of tense and mood in English and German [70.3497683558609]
Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical, and neural machine translation.
arXiv Detail & Related papers (2020-07-10T08:15:59Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.