Fugu-MT Paper Translation (Summary): AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

Paper summary: AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

arxiv url: http://arxiv.org/abs/2605.23325v1
Date: Fri, 22 May 2026 07:39:21 GMT
Status: Translation complete
Last updated in system: 2026-05-25 17:29:20.245778
Title: AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse
Authors: Esra'a Sharqawi, Wajdi Zaghouani
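Abstract summary: AraHopeCorpus is an annotated dataset of Arabic hope speech collected from ten thousand YouTube comments. The dataset shows that hopeful language dominates, accounting for more than 64% of all comments. No-hope speech, representing about 13%, reflects despair and disillusionment, while the remaining comments contain neutral or mixed content.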
Reference score (site-computed attention score): 0.6546712656847457
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Social media has become a crucial arena for shaping public narratives during armed conflicts, providing space for both harmful and constructive communication. While hate speech and misinformation have been widely studied, expressions that promote resilience, solidarity, and optimism remain underexplored, particularly in Arabic contexts. This paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech, collected from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. Using a detailed annotation framework, comments were classified into three categories: hope speech, no-hope speech, and neutral or unclear discourse. The dataset shows that hopeful language dominates, accounting for more than sixty-four percent of all comments. These expressions of hope appear mainly as religious encouragement, collective solidarity, and optimism for endurance and justice. No-hope speech, representing about thirteen percent, reflects despair and disillusionment, while the rest of the comments contain neutral or mixed content. Inter-annotator agreement reached substantial levels (Cohen's kappa = 0.71), though dialectal variation, sarcasm, and implicit meaning posed annotation challenges. A comparative analysis between human annotators and ChatGPT revealed that large language models can support annotation but remain limited in handling dialectal and culturally embedded expressions. AraHopeCorpus will be released for research purposes under an open, non-commercial license. It provides a valuable resource for studying constructive digital discourse, enabling further research on hope speech detection, crisis communication, and resilience in Arabic social media.
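The abstract reports inter-annotator agreement as Cohen's kappa over the three labels. The following is a minimal sketch of how that statistic is computed for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label distribution. It assumes plain Python lists of string labels; the label codes and the ten toy annotations are hypothetical and are not drawn from AraHopeCorpus.

```python
from collections import Counter

# Hypothetical short codes for the three categories named in the abstract.
LABELS = ("hope", "no_hope", "neutral")


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    unknown = (set(a) | set(b)) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    n = len(a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement: product of each annotator's marginal label shares.
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in LABELS)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations of ten comments (not real corpus data).
annotator_1 = ["hope", "hope", "hope", "no_hope", "neutral",
               "hope", "no_hope", "neutral", "hope", "hope"]
annotator_2 = ["hope", "hope", "neutral", "no_hope", "neutral",
               "hope", "hope", "neutral", "hope", "hope"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))  # → 0.643
```

Here the annotators agree on 8 of 10 items (p_o = 0.8), but chance agreement from their label shares is already 0.44, so kappa is (0.8 - 0.44) / 0.56 ≈ 0.643. The same function could in principle be applied to human labels versus ChatGPT labels, although the abstract does not state which metric the authors used for that comparison. For larger experiments, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.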

