Abstract: Visual storytelling is the task of generating relevant and interesting stories
for given image sequences. In this work, we aim to increase the diversity of
the generated stories while preserving the informative content from the images.
We propose to foster the diversity and informativeness of a generated story by
using a concept selection module that suggests a set of concept candidates.
Then, we utilize a large-scale pre-trained model to convert concepts and images
into full stories. To enrich the candidate concepts, a commonsense knowledge
graph is created for each image sequence from which the concept candidates are
proposed. To obtain appropriate concepts from the graph, we propose two novel
modules that consider the correlation among candidate concepts and the
image-concept correlation. Extensive automatic and human evaluation results
demonstrate that our model can produce reasonable concepts. This enables our
model to outperform previous models by a large margin on the diversity and
informativeness of the story, while retaining the relevance of the story to the
image sequence.