Abstract: Normalization techniques are crucial for stabilizing and accelerating
the training of deep neural networks. However, they are mainly designed for
independent and identically distributed (IID) data and fall short in many
real-world out-of-distribution (OOD) situations. Unlike most prior work,
this paper presents two normalization methods, SelfNorm and CrossNorm, to
promote OOD generalization. SelfNorm uses attention to recalibrate statistics
(channel-wise mean and variance), while CrossNorm exchanges the statistics
between feature maps. Although the two methods use statistics in opposite
directions, SelfNorm and CrossNorm complement each other in OOD generalization.
Extensive experiments across domains (vision and language), tasks
(classification and segmentation), and settings (supervised and
semi-supervised) demonstrate their effectiveness.
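To make the statistics manipulation concrete, here is a minimal PyTorch sketch
of the two operations described above. The helper names (channel_stats,
cross_norm, SelfNormSketch) are hypothetical, and the per-channel single-layer
attention functions f and g are an illustrative assumption, not the paper's
exact architecture.

    import torch
    import torch.nn as nn

    def channel_stats(x, eps=1e-5):
        # Per-sample, per-channel mean and std of a (N, C, H, W) feature map.
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
        return mean, std

    def cross_norm(x_a, x_b):
        # CrossNorm-style swap: renormalize each map with the other's statistics.
        mean_a, std_a = channel_stats(x_a)
        mean_b, std_b = channel_stats(x_b)
        out_a = (x_a - mean_a) / std_a * std_b + mean_b
        out_b = (x_b - mean_b) / std_b * std_a + mean_a
        return out_a, out_b

    class SelfNormSketch(nn.Module):
        # SelfNorm-style recalibration: attention functions f and g map each
        # channel's (mean, std) pair to weights for new statistics. The grouped
        # length-2 convolutions below are a hypothetical minimal choice.
        def __init__(self, channels):
            super().__init__()
            self.f = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)
            self.g = nn.Conv1d(channels, channels, kernel_size=2, groups=channels)

        def forward(self, x):
            mean, std = channel_stats(x)                         # (N, C, 1, 1)
            stats = torch.cat([mean, std], dim=2).squeeze(-1)    # (N, C, 2)
            w_mean = torch.sigmoid(self.f(stats)).unsqueeze(-1)  # (N, C, 1, 1)
            w_std = torch.sigmoid(self.g(stats)).unsqueeze(-1)   # (N, C, 1, 1)
            # Standardize, then restore with recalibrated statistics.
            return (x - mean) / std * (std * w_std) + mean * w_mean

    # Usage: both operations preserve the feature shape.
    x = torch.randn(4, 64, 32, 32)
    y = SelfNormSketch(64)(x)          # recalibrated features
    a, b = cross_norm(x[:2], x[2:])    # statistics swapped pairwise

In this reading, cross_norm serves as a training-time, feature-level
augmentation (it needs a second feature map to swap with), while the SelfNorm
module is learned end to end and can remain active at test time.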