Fugu-MT Paper Translation (Abstract): ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

Paper summary: ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

arxiv url: http://arxiv.org/abs/2603.14843v1
Date: Mon, 16 Mar 2026 05:42:56 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:36.071141
Title: ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations
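Title (reference translation): ContiGuard: A Continual Toxicity Detection Framework Against Evolving Evasive Perturbations
Authors: Hankun Kang, Xin Miao, Jianhao Chen, Jintao Wen, Mayi Xu, Weiyu Zhang, Wenpeng Lu, Tieyun Qian
Abstract summary: Malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detectors and methods are static over time and are inadequate against these evolving evasion tactics. ContiGuard is the first framework tailored for continual learning of a detector on time-evolving perturbed text.
Reference score (attention level computed independently by this site): 27.41321947366876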
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Toxicity detection mitigates the dissemination of toxic content (e.g., hateful comments, posts, and messages within online social actions) to safeguard a healthy online social environment. However, malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detectors or methods are static over time and are inadequate in addressing these evolving evasion tactics. Thus, continual learning emerges as a logical approach to dynamically update detection ability against evolving perturbations. Nevertheless, disparities across perturbations hinder the detector's continual learning on perturbed text. More importantly, perturbation-induced noises distort semantics to degrade comprehension and also impair critical feature learning to render detection sensitive to perturbations. These amplify the challenge of continual learning against evolving perturbations. In this work, we present ContiGuard, the first framework tailored for continual learning of the detector on time-evolving perturbed text (termed continual toxicity detection) to enable the detector to continually update capability and maintain sustained resilience against evolving perturbations. Specifically, to boost the comprehension, we present an LLM-powered semantic enriching strategy, where we dynamically incorporate possible meaning and toxicity-related clues excavated by LLM into the perturbed text to improve the comprehension. To mitigate non-critical features and amplify critical ones, we propose a discriminability-driven feature learning strategy, where we strengthen discriminative features while suppressing the less-discriminative ones to shape a robust classification boundary for detection...
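Abstract (reference translation): Toxicity detection mitigates the spread of toxic content (e.g., hateful comments, posts, and messages) to safeguard a healthy online social environment. However, malicious users persistently develop evasive perturbations to disguise toxic content and evade detectors. Traditional detectors and methods are static over time and are inadequate against these evolving evasion tactics. Continual learning therefore emerges as a logical approach to dynamically updating detection ability against evolving perturbations. Nevertheless, disparities across perturbations hinder the detector's continual learning on perturbed text. More importantly, perturbation-induced noise distorts semantics and degrades comprehension, and it also impairs the learning of critical features, which makes detection sensitive to perturbations. These factors amplify the challenge of continual learning against evolving perturbations. In this work, we present ContiGuard, the first framework tailored for continual learning of a detector on time-evolving perturbed text (termed continual toxicity detection), enabling the detector to continually update its capability and maintain sustained resilience against evolving perturbations. Specifically, we propose an LLM-powered semantic enriching strategy that dynamically incorporates possible meanings and toxicity-related clues excavated by the LLM into the perturbed text to improve comprehension. To mitigate non-critical features and amplify critical ones, we propose a discriminability-driven feature learning strategy.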
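The abstract does not spell out how discriminative features are strengthened and less-discriminative ones suppressed. As a rough illustration of the general idea only, the sketch below scores each feature dimension with a Fisher score (between-class separation divided by within-class spread) and rescales features by that score. The Fisher score is a stand-in chosen here for illustration, not the method from the paper, and the names `fisher_scores`, `reweight`, and the toy data are all hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(features, labels):
    """Return one Fisher score per feature dimension (assumed binary labels 0/1)."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    scores = []
    for d in range(len(features[0])):
        p = [x[d] for x in pos]
        n = [x[d] for x in neg]
        between = (mean(p) - mean(n)) ** 2           # class separation
        within = pvariance(p) + pvariance(n) + 1e-8  # spread inside each class
        scores.append(between / within)
    return scores


def reweight(features, scores):
    """Scale each dimension by its score, normalized so the best dimension keeps weight 1."""
    top = max(scores) or 1.0  # avoid division by zero when no dimension separates the classes
    weights = [s / top for s in scores]
    return [[w * v for w, v in zip(weights, x)] for x in features]


# Toy data (hypothetical): dimension 0 separates toxic (1) from benign (0),
# dimension 1 is mostly noise.
features = [
    [0.9, 0.2],
    [0.8, 0.7],
    [0.1, 0.3],
    [0.2, 0.8],
]
labels = [1, 1, 0, 0]

scores = fisher_scores(features, labels)
weighted = reweight(features, scores)
print(scores)
print(weighted)
```

On this toy data the first dimension receives a much larger score than the second, so the noisy dimension is scaled close to zero while the separating one is left unchanged. A learned detector would presumably do something like this inside the training objective rather than as a fixed preprocessing step.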

Related papers

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
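This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal that guides real-time model behavior. It shows how uncertainty is leveraged as an active control signal across three frontiers. The survey argues that mastering this new trend in uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
Paper reference translation (metadata) (2026-01-22T06:21:31Z)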
Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy [95.93943805282868]
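Lipschitz-regularized object detection (LROD). This paper proposes Lipschitz-regularized YOLO (LR-YOLO). In experiments on haze and low-light benchmarks, LR-YOLO consistently improves detection stability, optimization smoothness, and overall accuracy.
Paper reference translation (metadata) (2025-10-28T09:41:42Z)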
Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment [48.73836179661632]
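Confidence-guided adversarial alignment adaptively suppresses attack-specific artifacts without erasing discriminative cues. IB-CAAN consistently outperforms the baseline and state-of-the-art methods on many benchmarks.
Paper reference translation (metadata) (2025-09-28T03:48:49Z)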
Diversity Boosts AI-Generated Text Detection [51.56484100374058]
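DivEye is a novel framework that uses preliminary features to capture how unpredictability fluctuates across a text. The proposed method improves existing zero-shot detectors by up to 33.2% and achieves performance competitive with fine-tuned baselines.
Paper reference translation (metadata) (2025-09-23T10:21:22Z)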
Towards Inclusive Toxic Content Moderation: Addressing Vulnerabilities to Adversarial Attacks in Toxicity Classifiers Tackling LLM-generated Content [12.26588825477595]
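This study examines fine-tuned BERT and RoBERTa classifiers on diverse datasets spanning various minority groups. We use adversarial attack techniques to identify vulnerable circuits and improve performance against adversarial attacks. We find that models have heads that are either crucial for performance or vulnerable to attack, and that suppressing the vulnerable heads improves performance on adversarial inputs.
Paper reference translation (metadata) (2025-09-16T04:51:18Z)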
Toxicity Detection towards Adaptability to Changing Perturbations [21.989281174371147]
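This paper introduces a new problem, continual learning of jailbreak perturbation patterns, into the field of toxicity detection. We first construct a new dataset generated by nine types of perturbation patterns, seven of which are summarized from prior work and two of which we developed. We then systematically verify the vulnerability of current methods on this new perturbation-pattern-aware dataset.
Paper reference translation (metadata) (2024-12-17T05:04:57Z)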
ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations [6.360597788845826]
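This study investigates the limitations of current state-of-the-art large language models (LLMs) in identifying offensive content in systematically perturbed data. Our work highlights the urgent need for more advanced techniques in offensive language detection to counter the evolving tactics used to evade detection mechanisms.
Paper reference translation (metadata) (2024-06-18T02:44:56Z)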
Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
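AI text detection has emerged to distinguish human-written from machine-generated content. Recent studies show that these detection systems often lack robustness and struggle to distinguish perturbed text effectively. Our work simulates real-world scenarios in both informal and professional writing and explores the out-of-the-box performance of current detectors.
Paper reference translation (metadata) (2024-06-13T08:37:01Z)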
Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition [133.35968094967626]
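Skeleton-based action recognition has attracted attention for its strong adaptability to dynamic conditions. With the help of deep learning techniques, it has made considerable progress and currently achieves about 90% accuracy in benign environments. The vulnerability of skeleton-based action recognition under different adversarial settings remains unexplored.
Paper reference translation (metadata) (2020-05-14T17:12:52Z)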
Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
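Recent advances in artificial intelligence (AI) have brought new capabilities to behavioral analytics (UEBA) for cybersecurity. This paper proposes a solution that effectively mitigates this attack by improving the detection process and making effective use of human expertise.
Paper reference translation (metadata) (2020-01-13T13:54:36Z)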

The related papers list is generated automatically from the titles and abstracts of papers on this site.
