Abstract: Recently, the (gradient-based) bilevel programming framework is widely used
in hyperparameter optimization and has achieved excellent performance
empirically. Previous theoretical work mainly focuses on its optimization
properties, while leaving the analysis on generalization largely open. This
paper attempts to address the issue by presenting an expectation bound w.r.t.
the validation set based on uniform stability. Our results can explain some
mysterious behaviours of the bilevel programming in practice, for instance,
overfitting to the validation set. We also present an expectation bound for the
classical cross-validation algorithm. Our results suggest that gradient-based
algorithms can be better than cross-validation under certain conditions in a
theoretical perspective. Furthermore, we prove that regularization terms in
both the outer and inner levels can relieve the overfitting problem in
gradient-based algorithms. In experiments on feature learning and data
reweighting for noisy labels, we corroborate our theoretical findings.