Abstract: We propose a Concentrated Document Topic Model (CDTM) for unsupervised text
classification, which produces a concentrated and sparse document-topic
distribution. In particular, an exponential entropy penalty is imposed on
the document-topic distribution. Documents with diverse topic
distributions are penalized more, while those with concentrated topics are
penalized less. We apply the model to the benchmark NIPS dataset and observe
more coherent topics and more concentrated and sparse document-topic
distributions than those produced by Latent Dirichlet Allocation (LDA).
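
To illustrate how an entropy-based penalty separates diverse from concentrated documents, the following minimal sketch computes the exponential of the Shannon entropy of a document-topic distribution. The abstract does not state the exact form of the penalty, so the function name `exponential_entropy`, the choice of exp(H(theta)) with natural logarithms, and the example distributions are assumptions made for illustration only, not the model's actual objective.

```python
import math


def exponential_entropy(theta):
    """Return exp(H(theta)), where H is the Shannon entropy in nats.

    Assumed form of the penalty (hypothetical): theta is a document-topic
    distribution given as non-negative weights that sum to 1.
    """
    # Zero-probability topics contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in theta if p > 0)
    return math.exp(entropy)


# Illustrative distributions over four topics (made up for this example).
concentrated = [0.97, 0.01, 0.01, 0.01]
diverse = [0.25, 0.25, 0.25, 0.25]

penalty_concentrated = exponential_entropy(concentrated)
penalty_diverse = exponential_entropy(diverse)

# The diverse document receives the larger penalty.
print(penalty_concentrated < penalty_diverse)
```

Under this assumed form, the penalty ranges from 1 for a document assigned to a single topic up to the number of topics for a uniform distribution, so it can be read as an effective number of topics per document.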