Abstract: The last decade has witnessed rapid advances in machine learning models.
While the black-box nature of these systems allows powerful predictions, those
predictions cannot be directly explained, posing a threat to the continuing
democratization of machine learning technology.
Tackling the challenge of model explainability, research has made significant
progress in demystifying image classification models. In the same spirit, this
paper studies code summarization models. In particular, given an input program
for which a model makes a prediction, our goal is to reveal the key features
that the model uses for predicting the label of the program.
We realize our approach in HouYi, which we use to evaluate four prominent code
summarization models: extreme summarizer, code2vec, code2seq, and sequence GNN.
Results show that all models base their predictions on syntactic and lexical
properties with little to no semantic implication. Based on this finding, we
present a novel approach to explaining the predictions of code summarization
models through the lens of training data.
Our work opens up an exciting new direction of studying what models have
learned from source code.