Fugu-MT Paper Translation (Abstract): Data Augmentation: A Fourier Analysis Perspective

Paper overview: Data Augmentation: A Fourier Analysis Perspective

arxiv url: http://arxiv.org/abs/2606.24418v1
Date: Tue, 23 Jun 2026 10:54:14 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.905502
Title: Data Augmentation: A Fourier Analysis Perspective
Authors: Behrooz Tahmasebi, Melanie Weber, Stefanie Jegelka
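Abstract summary: The paper shows that partial data augmentation can achieve the same statistical benefits as full augmentation in terms of generalization and sample complexity, up to an approximation error that vanishes as the subset size increases. The results give a theoretical explanation for why partial augmentation can retain the statistical benefits of full augmentation despite enforcing symmetry only approximately.
Reference score (attention score computed by this site): 52.00584777952869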
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Data augmentation is a simple and model-agnostic approach for exploiting known invariances in learning problems. Given a group acting on the input space, one augments the training set with transformed copies of each sample. Because it exploits symmetries without modifying the underlying learning algorithm, data augmentation can be applied broadly across learning methods. However, this universality comes at a computational cost: when the group is large, full group-sized augmentation quickly becomes computationally infeasible. This raises a fundamental question: Can partial data augmentation achieve the same statistical benefits as full augmentation in terms of generalization and sample complexity? We develop a general framework for investigating this question using Fourier analysis and the representation theory of finite groups. We show that, for a broad class of classical learning problems, partial data augmentation based on a randomly sampled subset of group elements achieves the same minimax rates as full augmentation, up to an approximation error that vanishes as the subset size increases. Our results provide a theoretical explanation for why partial augmentation can retain the statistical benefits of full augmentation despite enforcing symmetry only approximately, and shed light on a recently raised question in learning with symmetries: whether statistically optimal learning under general group invariances can be achieved using computationally scalable methods. Moreover, we prove a complementary impossibility result: enforcing exact invariance via data augmentation requires averaging over the entire group, and cannot be achieved by any strict subset when the hypothesis space is sufficiently expressive. Together, these results provide a unified perspective on full and partial data augmentation, as well as exact and approximate symmetry enforcement.
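The contrast between full and partial augmentation described in the abstract can be illustrated with a small sketch. This is not the paper's method or experiment. It assumes a toy setting chosen here for illustration: the cyclic group of coordinate shifts acting on vectors, a ridge regression learner, and a synthetic shift-invariant target. All names (`augment`, `ridge_fit`, `invariance_gap`) and all parameter values are hypothetical.

```python
import numpy as np

def augment(X, y, shifts):
    """Add one cyclically shifted copy of every sample per group element in `shifts`."""
    X_aug = np.concatenate([np.roll(X, s, axis=1) for s in shifts])
    y_aug = np.tile(y, len(shifts))  # labels are unchanged by the group action
    return X_aug, y_aug

def ridge_fit(X, y, lam=1e-3):
    """Plain ridge regression; the learner itself knows nothing about the symmetry."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def invariance_gap(w):
    """A linear predictor is shift-invariant iff its weights are constant,
    so the spread of the weights measures how far w is from exact invariance."""
    return float(w.max() - w.min())

rng = np.random.default_rng(0)
d, m = 16, 8                                   # assumed toy sizes: group order d, m samples
X = rng.normal(size=(m, d))
y = X.sum(axis=1) + 0.1 * rng.normal(size=m)   # shift-invariant target plus noise

# Full augmentation: every group element is used.
w_full = ridge_fit(*augment(X, y, range(d)))

# Partial augmentation: a random subset of k group elements.
gaps = {}
for k in (1, 4, 8, d):
    subset = rng.choice(d, size=k, replace=False)
    w = ridge_fit(*augment(X, y, subset))
    gaps[k] = invariance_gap(w)
    print(f"k={k:2d}  invariance gap={gaps[k]:.3e}")
```

In this toy case, augmenting with the whole group yields a weight vector that is constant up to floating-point error, while a strict subset generally leaves a nonzero gap. That is only an illustration of the exact-versus-approximate distinction, not a demonstration of the minimax rates or the impossibility theorem stated in the abstract.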

Related papers