Fugu-MT Paper Translation (Abstract): A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts

Paper overview: A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts

arxiv url: http://arxiv.org/abs/2509.14922v1
Date: Thu, 18 Sep 2025 12:59:07 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.221926
Title: A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts
Title (reference translation): A comparative evaluation of large language models for Persian sentiment analysis and emotion detection in social media texts
Authors: Kian Tohidi, Kia Dashtipour, Simone Rebora, Sevda Pourfaramarz
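Abstract summary: This study presents a comparative evaluation of four large language models (LLMs) for sentiment analysis and emotion detection in Persian social media texts. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models indicates no significant differences among them. The findings indicate that emotion detection is more challenging than sentiment analysis for all models, and that the misclassification patterns may reflect some challenges in Persian-language texts.
Reference score (independently calculated attention score): 2.820011731460364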
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study presents a comprehensive comparative evaluation of four state-of-the-art Large Language Models (LLMs)--Claude 3.7 Sonnet, DeepSeek-V3, Gemini 2.0 Flash, and GPT-4o--for sentiment analysis and emotion detection in Persian social media texts. Comparative analysis among LLMs has witnessed a significant rise in recent years; however, most of these analyses have been conducted on English-language tasks, creating gaps in understanding cross-linguistic performance patterns. This research addresses these gaps through rigorous experimental design using balanced Persian datasets containing 900 texts for sentiment analysis (positive, negative, neutral) and 1,800 texts for emotion detection (anger, fear, happiness, hate, sadness, surprise). The main focus was to allow for a direct and fair comparison among different models by using consistent prompts and uniform processing parameters, and by analyzing performance metrics such as precision, recall, and F1-scores, along with misclassification patterns. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models indicates no significant differences among them. However, GPT-4o demonstrated a marginally higher raw accuracy value for both tasks, while Gemini 2.0 Flash proved to be the most cost-efficient. The findings indicate that the emotion detection task is more challenging for all models compared to the sentiment analysis task, and the misclassification patterns can represent some challenges in Persian language texts. These findings establish performance benchmarks for Persian NLP applications and offer practical guidance for model selection based on accuracy, efficiency, and cost considerations, while revealing cultural and linguistic challenges that require consideration in multilingual AI system deployment.
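Abstract (reference translation): This study presents a comprehensive comparative evaluation of four state-of-the-art large language models (LLMs), Claude 3.7 Sonnet, DeepSeek-V3, Gemini 2.0 Flash, and GPT-4o, for sentiment analysis and emotion detection in Persian social media texts. Comparative analyses of LLMs have increased markedly in recent years, but most have been conducted on English tasks, leaving gaps in the understanding of cross-linguistic performance patterns. This research addresses these gaps through a rigorous experimental design using balanced Persian datasets containing 900 texts for sentiment analysis (positive, negative, neutral) and 1,800 texts for emotion detection (anger, fear, happiness, hate, sadness, surprise). The main focus is to enable a direct and fair comparison among the models by using consistent prompts and uniform processing parameters and by analyzing performance metrics such as precision, recall, and F1-score together with misclassification patterns. The results show that all models reach an acceptable level of performance, and a statistical comparison of the best three models shows no significant differences among them. However, GPT-4o showed marginally higher raw accuracy on both tasks, while Gemini 2.0 Flash proved the most cost-efficient. The findings indicate that emotion detection is more challenging than sentiment analysis for all models, and that the misclassification patterns may reflect some challenges in Persian-language texts. These results establish performance benchmarks for Persian NLP applications and offer practical guidance for model selection based on accuracy, efficiency, and cost, while revealing cultural and linguistic challenges that should be considered when deploying multilingual AI systems.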
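
Illustrative sketch (not from the paper): the abstract names precision, recall, F1-scores, and misclassification patterns as the basis for comparison. A minimal way to compute these from gold and predicted labels might look like the following. The function names, the macro-averaging choice, and the six toy labels are assumptions made for illustration; they are not the authors' code, data, or results.

```python
from collections import Counter


def per_class_metrics(gold, pred, labels):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    pairs = Counter(zip(gold, pred))  # (gold label, predicted label) -> count
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (g, p), n in pairs.items() if p == label and g != label)
        fn = sum(n for (g, p), n in pairs.items() if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_f1(metrics):
    """Unweighted mean of per-class F1 (assumed averaging; suits balanced data)."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


def top_confusions(gold, pred, k=3):
    """Most frequent (gold, predicted) error pairs, i.e. misclassification patterns."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(k)


# Hypothetical toy labels, NOT data from the paper.
LABELS = ["positive", "negative", "neutral"]
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

metrics = per_class_metrics(gold, pred, LABELS)
for label, (p, r, f1) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1: {macro_f1(metrics):.3f}")
print("top confusions:", top_confusions(gold, pred))
```

Running the same functions over each model's predictions on the same texts gives directly comparable per-class scores, and the confusion pairs show which classes a model tends to mix up.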


Related papers