Fugu-MT Paper Translation (Summary): Complement Submodular Information Measures for Balanced and Robust Data Selection

Paper summary: Complement Submodular Information Measures for Balanced and Robust Data Selection

arxiv url: http://arxiv.org/abs/2605.24779v1
Date: Sat, 23 May 2026 23:43:24 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.431833
Title: Complement Submodular Information Measures for Balanced and Robust Data Selection
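Authors: Rishabh Iyer
Abstract summary: Complement Submodular Information (CSI) is a new class of complement-aware submodular objectives. CSI objectives are shown to consistently outperform standard submodular objectives in robust subset selection. In particular, CSI objectives significantly improve the preservation of coherent rare/tail semantic structure.
Reference score (in-house attention score): 0.20305676256390934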
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Submodular optimization has become a fundamental paradigm for data selection, retrieval, summarization, and representation learning due to its ability to model coverage, diversity, and representativeness. However, classical submodular objectives optimize only the selected subset and do not explicitly preserve structural information between the selected subset and the remaining data. In many modern machine learning applications, including train/validation/test splitting, benchmark construction, and robust subset selection, the quality of a selection depends critically on preserving balanced structure across both the selected subset and its complement. In this work, we introduce Complement Submodular Information (CSI), a new class of complement-aware submodular objectives that quantify shared structural information between a subset and its complement. Our framework induces complement-aware variants of several classical submodular functions including Facility Location, Graph Cut, LogDet, Saturated Coverage, Set Cover, Probabilistic Set Cover, and Feature Based Functions. We analyze the theoretical properties of CSI objectives and show that they exhibit approximate monotonicity under bounded curvature conditions, leading to near-$(1-1/e)$ greedy approximation guarantees. Empirically, CSI objectives consistently outperform standard submodular objectives on robust hidden-slice-aware subset selection. In particular, CSI objectives significantly improve preservation of coherent rare/tail semantic structure while simultaneously suppressing noisy and isolated outliers, leading to substantially improved downstream predictive performance. Synthetic experiments further illustrate how different CSI instantiations capture complementary notions of representativeness, diversity, connectivity, and balanced neighborhood preservation.
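The abstract does not state the CSI formula. The following is a minimal sketch, assuming CSI is instantiated as the submodular mutual information between a subset A and its complement, I_f(A; V \ A) = f(A) + f(V \ A) - f(V). Here f is a facility-location function over a Gaussian similarity matrix, and the subset is chosen by plain greedy selection. The names gaussian_similarity, facility_location, complement_information, and greedy_select are hypothetical. The paper's actual definition, its other instantiations (Graph Cut, LogDet, Set Cover, and so on), and its greedy variant may differ.

```python
import numpy as np


def gaussian_similarity(X, gamma=1.0):
    """Hypothetical helper: sim[i, j] = exp(-gamma * ||x_i - x_j||^2), so sim[i, i] = 1."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def facility_location(S, sim):
    """Facility location f(S) = sum_i max_{j in S} sim[i, j], with f(empty set) = 0."""
    S = sorted(S)
    if not S:
        return 0.0
    return float(sim[:, S].max(axis=1).sum())


def complement_information(A, sim):
    """Assumed CSI-style score: I_f(A; V \\ A) = f(A) + f(V \\ A) - f(V)."""
    V = set(range(sim.shape[0]))
    A = set(A)
    return (facility_location(A, sim)
            + facility_location(V - A, sim)
            - facility_location(V, sim))


def greedy_select(sim, k):
    """Plain greedy: repeatedly add the element that gives the highest score."""
    selected = []
    for _ in range(k):
        remaining = [j for j in range(sim.shape[0]) if j not in selected]
        best = max(remaining,
                   key=lambda j: complement_information(selected + [j], sim))
        selected.append(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))          # toy 2-D points, not data from the paper
    sim = gaussian_similarity(X)
    chosen = greedy_select(sim, 5)
    print(chosen, complement_information(chosen, sim))
```

Because the diagonal similarities equal 1, this score reduces to two sums. The first measures how well A covers the points outside it. The second measures how well the complement covers the points in A. The score is 0 for both A = ∅ and A = V, so it is symmetric rather than monotone. This illustrates why, under this assumed form, only approximate monotonicity (for example, under the bounded-curvature conditions the abstract mentions) can be expected.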

Related papers