Fugu-MT Paper Translation (Abstract): Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

Paper summary: Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

arxiv url: http://arxiv.org/abs/2510.03993v1
Date: Sun, 05 Oct 2025 01:52:19 GMT
Status: Translation complete
System update date: 2025-10-07 16:52:59.377158
Title: Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning
Title (reference translation): Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning
Authors: Yaxin Hou, Bo Han, Yuheng Jia, Hui Liu, Junhui Hou
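Abstract summary: We propose a Controllable Pseudo-label Generation (CPG) framework that expands the labeled dataset with reliable pseudo-labels drawn from the unlabeled dataset. CPG operates through a controllable self-reinforcing optimization cycle. CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy.
Reference score (independently computed attention score): 88.48555005545694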
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.
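Step (ii) relies on logit adjustment. As a point of reference, the commonly used logit-adjusted cross-entropy can be written as below; this is the standard form of the technique and not necessarily the exact objective used in CPG. The notation is assumed rather than taken from the abstract: $f_y(x)$ is the model's logit for class $y$, $n_y$ is the number of samples of class $y$ in the updated labeled set, $\pi_y$ is the resulting class prior, and $\tau$ is a scaling hyperparameter.

```latex
\pi_y = \frac{n_y}{\sum_{y'} n_{y'}}, \qquad
\ell\bigl(y, f(x)\bigr)
  = -\log \frac{\exp\bigl(f_y(x) + \tau \log \pi_y\bigr)}
               {\sum_{y'} \exp\bigl(f_{y'}(x) + \tau \log \pi_{y'}\bigr)}, \qquad
\hat{y}(x) = \arg\max_{y} f_y(x)
```

Because the prior is folded into the loss during training, prediction uses the unadjusted logits. The form needs the class counts $n_y$ to be known, which is the property that the filtering in step (i) is described as preserving.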
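Abstract (reference translation): Current long-tailed semi-supervised learning methods assume that the labeled data exhibit a long-tailed distribution and that the unlabeled data follow a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may be arbitrary. To address this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework that expands the labeled dataset with reliable pseudo-labels progressively identified from the unlabeled dataset and trains the model on the updated labeled dataset, whose distribution is known, so that training is unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, a dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) a Bayes-optimal classifier is then constructed using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier helps identify more reliable pseudo-labels in the next training step. We further prove theoretically that this optimization cycle can significantly reduce the generalization error under some conditions. In addition, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch that maximizes data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.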
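The optimization cycle described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository is the reference for that). A fixed confidence threshold stands in for the dynamic controllable filtering mechanism, whose actual rule the abstract does not specify. The function names `filter_pseudo_labels`, `update_class_counts`, and `logit_adjusted_loss`, the parameters `threshold` and `tau`, and the toy numbers are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_pseudo_labels(unlabeled_logits, threshold=0.95):
    """Step (i), simplified: keep samples whose top probability >= threshold.

    Assumption: a fixed threshold replaces the paper's dynamic filtering rule.
    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for idx, logits in enumerate(unlabeled_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((idx, label))
    return selected


def update_class_counts(class_counts, selected):
    """Track the class counts of the expanded labeled set, so its distribution stays known."""
    counts = list(class_counts)
    for _, label in selected:
        counts[label] += 1
    return counts


def logit_adjusted_loss(logits, target, class_counts, tau=1.0):
    """Step (ii), standard form: cross-entropy on logits shifted by tau * log(prior).

    Assumes every class count is positive, so log(prior) is defined.
    """
    total = sum(class_counts)
    adjusted = [z + tau * math.log(n / total) for z, n in zip(logits, class_counts)]
    return -math.log(softmax(adjusted)[target])


# Toy run: three classes with a long-tailed labeled set.
class_counts = [100, 10, 1]
unlabeled_logits = [[8.0, 0.0, 0.0], [1.0, 1.2, 0.9], [0.0, 0.0, 9.0]]

selected = filter_pseudo_labels(unlabeled_logits)           # [(0, 0), (2, 2)]
class_counts = update_class_counts(class_counts, selected)  # [101, 10, 2]
loss = logit_adjusted_loss([2.0, 1.0, 0.5], target=2, class_counts=class_counts)
plain_loss = logit_adjusted_loss([2.0, 1.0, 0.5], target=2, class_counts=class_counts, tau=0.0)
```

In the toy run, the ambiguous second sample is left in the unlabeled pool, and the adjusted loss on the tail class is larger than the plain cross-entropy (`tau=0.0`), which pushes training to widen the margin for rare classes. Step (iii) would repeat the loop, using the retrained model's logits as `unlabeled_logits`.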

Related papers