Abstract: In this work, we present a novel mask-guided attention (MGA) method for
fine-grained patchy image classification. The key challenge of fine-grained
patchy image classification is twofold: ultra-fine-grained inter-category
variances among objects and very little data available for training. This
motivates us to employ a more useful supervision signal to train a
discriminative model from limited training samples.
Specifically, the proposed MGA integrates a pre-trained semantic segmentation
model that produces an auxiliary supervision signal, i.e., a patchy attention
mask, enabling discriminative representation learning. The patchy attention mask
drives the classifier to filter out the insignificant parts of images (e.g.,
features common to different categories), which enhances the robustness of
MGA for fine-grained patchy image classification. We verify the
effectiveness of our method on three publicly available patchy image datasets.
Experimental results demonstrate that our MGA method achieves superior
performance on all three datasets compared with state-of-the-art methods. In
addition, our ablation study shows that MGA improves accuracy by 2.25% and
2% on the SoyCultivarVein and BtfPIS datasets, respectively, indicating its
practicality for fine-grained patchy image classification.
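
To make the idea of filtering features with an attention mask concrete, the following is a minimal sketch and not the MGA architecture itself. The abstract does not specify how the patchy attention mask is combined with the classifier's features, so the mask-weighted average pooling used here is an assumption, and the function name `mask_guided_pool` is hypothetical. In practice the mask would come from the pre-trained segmentation model; here it is written by hand.

```python
def mask_guided_pool(features, mask, eps=1e-8):
    """Mask-weighted average pooling over spatial positions (illustrative only).

    features: list of C channels, each an H x W nested list of floats.
    mask:     H x W nested list with values in [0, 1]; 0 suppresses a position.
    Returns a list of C pooled values, one per channel.
    """
    mask_total = sum(sum(row) for row in mask)
    pooled = []
    for channel in features:
        # The mask must have the same spatial size as every channel.
        if len(channel) != len(mask) or any(
            len(frow) != len(mrow) for frow, mrow in zip(channel, mask)
        ):
            raise ValueError("mask must match the spatial size of features")
        # Positions with mask value 0 contribute nothing to the descriptor.
        weighted = sum(
            value * weight
            for frow, mrow in zip(channel, mask)
            for value, weight in zip(frow, mrow)
        )
        # eps avoids division by zero when the mask is empty.
        pooled.append(weighted / (mask_total + eps))
    return pooled


# Two 2x2 channels; the mask keeps only the diagonal positions.
features = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 5.0], [6.0, 7.0]],
]
mask = [[1.0, 0.0], [0.0, 1.0]]
descriptor = mask_guided_pool(features, mask)
```

With this mask, only the diagonal positions of each channel contribute to the pooled descriptor, which mirrors the abstract's description of suppressing image regions that are not discriminative.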