Fugu-MT Paper Translation (Abstract): An Evaluation of Chat Safety Moderations in Roblox

Paper summary: An Evaluation of Chat Safety Moderations in Roblox

arxiv url: http://arxiv.org/abs/2605.04491v2
Date: Thu, 07 May 2026 18:52:25 GMT
Status: Translation complete
Last updated in system: 2026-05-11 16:31:22.836536
Title: An Evaluation of Chat Safety Moderations in Roblox
Authors: Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman
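Abstract summary: We collected approximately 2 million chat messages from four games across multiple age groups. Our findings reveal a troublesome reality: numerous instances of unsafe chat messages related to grooming and the sexualization of minors.
Reference score (independently computed attention metric): 6.121106657637349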
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are minors, who face greater risk because abusive users may use Roblox's real-time chat interface to make initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, their effectiveness has not yet been independently investigated. To address this gap, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers in accordance with ethical and legal norms and Roblox's terms of service. We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade it. Given the dataset's scale, manual qualitative content analysis of the full corpus is prohibitively expensive, so we adopt a two-step approach. First, we manually labeled messages as safe or unsafe (n=99.8K) and used them as ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, we applied the best-performing LLM to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding until thematic saturation was reached. Overall, our findings reveal a troublesome reality: numerous unsafe chat messages related to grooming, sexualizing minors, bullying and harassment, violence, self-harm, and sharing sensitive information, among others, escaped the current moderation. Our analysis of users whose messages were previously flagged revealed that they continue to send harmful messages by employing a wide range of techniques to evade the moderation system.
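
The first step of the two-step approach described in the abstract (scoring several candidate classifiers against manually labeled ground truth, then keeping the best one) can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not say which metric was used to rank the LLMs, so ranking by F1 on the unsafe class is an assumption, and the model names, labels, and predictions below are hypothetical toy values, not data from the paper.

```python
from typing import Dict, List, Tuple


def prf1(truth: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the unsafe class (label 1)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def pick_best(truth: List[int], predictions: Dict[str, List[int]]) -> str:
    """Pick the model with the highest F1 (assumed ranking criterion)."""
    return max(predictions, key=lambda name: prf1(truth, predictions[name])[2])


# Hypothetical manual labels: 1 = unsafe, 0 = safe.
truth = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical outputs of three candidate models on the same messages.
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
    "model_c": [0, 0, 0, 1, 1, 0, 1, 0],
}

best = pick_best(truth, predictions)
for name, pred in predictions.items():
    p, r, f = prf1(truth, pred)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("best:", best)
```

In a safety setting, recall on the unsafe class may matter more than F1, since missed unsafe messages are the costly error; swapping the index in `pick_best` from `[2]` to `[1]` would rank by recall instead.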
