A Knowledge-based Topic Modeling Approach for Automatic Topic Labeling

Mehdi Allahyari, Georgia Southern University
Seyedamin Pouriyeh, University of Georgia
Krys Kochut
Hamid Reza Arabnia, University of Georgia

Abstract

Probabilistic topic models, which aim to discover latent topics in text corpora define each document as a multinomial distributions over topics and each topic as a multinomial distributions over words. Although, humans can infer a proper label for each topic by looking at top representative words of the topic but, it is not applicable for machines. Automatic Topic Labeling techniques try to address the problem. The ultimate goal of topic labeling techniques are to assign interpretable labels for the learned topics. In this paper, we are taking concepts of ontology into consideration instead of words alone to improve the quality of generated labels for each topic. Our work is different in comparison with the previous efforts in this area, where topics are usually represented with a batch of selected words from topics. We have highlighted some aspects of our approach including: 1) we have incorporated ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over the concepts and each concept is represented as a distribution over words; and 2) a topic labeling model according to the meaning of the concepts of the ontology included in the learned topics. The best topic labels are selected with respect to the semantic similarity of the concepts and their ontological categorizations. We demonstrate the effectiveness of considering ontological concepts as richer aspects between topics and words by comprehensive experiments on two different data sets. In another word, representing topics via ontological concepts shows an effective way for generating descriptive and representative labels for the discovered topics.