Fugu-MT Paper Translation (Abstract): TypedCSIP: Typed Counterfactual Pretraining for Chinese Legislative Conflict Classification

Paper abstract: TypedCSIP: Typed Counterfactual Pretraining for Chinese Legislative Conflict Classification

arxiv url: http://arxiv.org/abs/2605.25474v1
Date: Mon, 25 May 2026 06:26:46 GMT
Status: Translation complete
System update date: 2026-05-26 19:50:19.349789
Title: TypedCSIP: Typed Counterfactual Pretraining for Chinese Legislative Conflict Classification
Authors: Yao Liu
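Abstract summary: TypedCSIP is a typed counterfactual pretraining method for the conflict-classification task of the LCR-CN benchmark. It exploits LCR-CN's expert-written minimal revisions as training-time counterfactual supervision.
Reference score (independently computed attention score): 3.5137191090796054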
License: http://creativecommons.org/licenses/by/4.0/
Abstract: TypedCSIP is a typed counterfactual pretraining method for the conflict-classification task of the LCR-CN benchmark (Zhao et al., 2026): given a (superior, subordinate) provision pair, predict whether the pair conflicts and which of four legal-doctrine types (Responsibility, Condition, Sanction, Definition) describes the inconsistency. We exploit LCR-CN's expert-written minimal revisions as training-time counterfactual supervision; at test time the classifier reads only the original pair. Stage 1 pretrains a shared encoder with a typed Counterfactual Selective Intervention Pretraining objective on (superior, subordinate, expert-revised) triplets, treating the expert revision as a counterfactual that the typed factor head must classify as carrying no conflict evidence. Stage 2 transfers the encoder to a five-way classification head. The confirmatory test was registered on the Open Science Framework before observing v6 measurements: 18 seeds, locked rule requiring mean per-seed difference at least 0.8 pp with both seed-bootstrap and Student-t 95% lower bounds above zero. On the 696-record test split, the v2 variant improves macro-F1 over the strongest single-model baseline by +0.916 pp on chinese-roberta-wwm-ext and +1.288 pp on the SAILER cross-backbone replication; both cells pass the rule. A cold-start stratified result on the 244 Unseen-gB records keeps the gain positive on both backbones. A cross-task diagnostic shows the Stage-2 encoder is classification-specialized and does not transfer to LCR-CN's superior-law retrieval task, so we scope the contribution to conflict classification. We release code, 72 pre-registered prediction files, matched-seed and MLM-control auxiliaries, and the OSF pre-registration record.
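The locked decision rule in the abstract (18 seeds, mean per-seed difference of at least 0.8 pp, with both the seed-bootstrap and the Student-t 95% lower bounds above zero) can be sketched as follows. This is a minimal illustration and not the authors' released code. It assumes that the "95% lower bound" is the lower endpoint of a two-sided 95% interval, that the bootstrap is a percentile bootstrap over seeds, and that differences are expressed in percentage points. The function names and the 10,000-resample default are hypothetical.

```python
import random
import statistics
from math import sqrt

# Two-sided 95% Student-t critical value for 17 degrees of freedom,
# i.e. for exactly 18 seeds. Pass a different t_crit for other sample sizes.
T_CRIT_DF17 = 2.110


def t_lower_bound(diffs, t_crit=T_CRIT_DF17):
    """Lower endpoint of the Student-t interval for the mean difference."""
    standard_error = statistics.stdev(diffs) / sqrt(len(diffs))
    return statistics.fmean(diffs) - t_crit * standard_error


def bootstrap_lower_bound(diffs, n_resamples=10_000, seed=0):
    """2.5th percentile of the mean over seed-level resamples (assumed scheme)."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)]


def passes_rule(diffs, min_mean_pp=0.8):
    """True only if all three conditions of the locked rule hold."""
    return (
        statistics.fmean(diffs) >= min_mean_pp
        and t_lower_bound(diffs) > 0.0
        and bootstrap_lower_bound(diffs) > 0.0
    )


# Hypothetical per-seed macro-F1 differences in pp (not values from the paper).
steady_gain = [1.0] * 9 + [0.8] * 9   # mean 0.9 pp, low variance
noisy_gain = [5.0, -3.0] * 9          # mean 1.0 pp, high variance
print(passes_rule(steady_gain), passes_rule(noisy_gain))
```

Requiring both lower bounds to clear zero means that a large mean gain driven by a few lucky seeds does not pass: the noisy example exceeds the 0.8 pp threshold on average but fails the Student-t condition.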

Related papers