Fugu-MT Paper Translation (Summary): Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework

Paper summary: Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework

arxiv url: http://arxiv.org/abs/2603.18822v1
Date: Thu, 19 Mar 2026 12:20:04 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:06.136397
Title: Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework
Authors: Maria Milkova, Maksim Rudnev
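Abstract summary: This study proposes a multi-stage classification framework for detecting human values in noisy Russian-language social media. It uses spam and non-personal content filtering, targeted selection of value-relevant and politically relevant posts, LLM-based annotation, and multi-label classification. The model generally agrees with human judgments but systematically overestimates the Openness to Change value domain.
Reference score (in-house attention score): 0.0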
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This study presents a multi-stage classification framework for detecting human values in noisy Russian-language social media, validated on a random sample of 7.5 million public text posts. Drawing on Schwartz's theory of basic human values, we design a multi-stage pipeline that includes spam and nonpersonal content filtering, targeted selection of value-relevant and politically relevant posts, LLM-based annotation, and multi-label classification. Particular attention is given to verifying the quality of LLM annotations and model predictions against human experts. We treat human expert annotations not as ground truth but as an interpretative benchmark with its own uncertainty. To account for annotation subjectivity, we aggregate multiple LLM-generated judgments into soft labels that reflect varying levels of agreement. These labels are then used to train transformer-based models capable of predicting the probability of each of the ten basic values. The best-performing model, XLM-RoBERTa large, achieves an F1 macro of 0.83 and an F1 of 0.71 on held-out test data. By treating value detection as a multi-perspective interpretive task, where expert labels, GPT annotations, and model predictions represent coherent but not identical readings of the same texts, we show that the model generally aligns with human judgments but systematically overestimates the Openness to Change value domain. Empirically, the study reveals distinct patterns of value expression and their co-occurrence in Russian social networks, contributing to a broader research agenda on cultural variation, communicative framing, and value-based interpretation in digital environments. All models are released publicly.
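The abstract states that multiple LLM-generated judgments are aggregated into soft labels reflecting varying levels of agreement, and that these labels train models predicting a probability for each of the ten basic values. The Python sketch below illustrates one plausible way to do this. It assumes that a value's soft label is simply the fraction of LLM runs that assigned it, and it assumes binary cross-entropy with soft targets as the training loss. The abstract specifies neither the aggregation rule nor the loss, and all identifiers here (`SCHWARTZ_VALUES`, `soft_labels`, `soft_bce`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# The ten basic values of Schwartz's theory. These identifiers are
# hypothetical; the paper's own label names may differ.
SCHWARTZ_VALUES: List[str] = [
    "self_direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]


def soft_labels(judgments: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Turn several LLM judgments for one post into per-value soft labels.

    Each judgment is the list of values a single LLM run assigned to the post.
    Assumed rule: a value's soft label is the share of runs that chose it,
    so 0.0 means no run chose it and 1.0 means all runs agreed.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = {v: 0 for v in SCHWARTZ_VALUES}
    for labels in judgments:
        for v in set(labels):  # count each value at most once per run
            if v not in counts:
                raise ValueError(f"unknown value label: {v!r}")
            counts[v] += 1
    return {v: counts[v] / len(judgments) for v in SCHWARTZ_VALUES}


def soft_bce(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and soft targets.

    This loss is an assumption; the abstract does not name the loss function.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


# Toy example: five hypothetical LLM runs over one post (not data from the paper).
runs = [
    ["benevolence", "universalism"],
    ["benevolence"],
    ["benevolence", "security"],
    ["benevolence"],
    ["universalism"],
]
labels = soft_labels(runs)
print(labels["benevolence"], labels["universalism"], labels["security"])  # 0.8 0.4 0.2
```

With soft targets, a post on which the runs split 4 to 1 pushes the model toward a probability of 0.8 rather than a hard 1, which is one way to encode the annotation subjectivity the abstract describes.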
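The reported F1 macro of 0.83 is an unweighted average of per-value F1 scores. The Python sketch below shows how such a macro average can be computed for multi-label predictions. It assumes that predicted probabilities are converted to labels with a fixed 0.5 threshold and that the reference labels are already binary (soft labels would first need to be thresholded as well). The abstract describes neither the paper's thresholding nor its evaluation code, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-value probabilities into 0/1 labels (the threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def per_label_f1(y_true: Matrix, y_pred: Matrix) -> List[float]:
    """F1 for each label column; a label with no gold and no predicted positives scores 0.0 here."""
    n_labels = len(y_true[0])
    scores: List[float] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1, so rare values count as much as common ones."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores) / len(scores)


# Toy example with two labels and three posts (not data from the paper).
gold = [[1, 0], [0, 1], [1, 1]]
pred = binarize([[0.9, 0.2], [0.4, 0.7], [0.3, 0.6]])
print(round(macro_f1(gold, pred), 4))  # 0.8333
```

Because each value is weighted equally, a model that does well on frequent values but poorly on rare ones can score noticeably lower on F1 macro than on a frequency-weighted average.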


Related papers