Latent Variable Discovery and Topic Modeling

Topic modeling is a popular unsupervised latent variable learning problem. Representing each document by its distribution over words, topic modeling is the task of estimating k distributions over the words such that each document's distribution can be written as a convex combination of them. These k distributions, stacked as columns, form the topic matrix. The problem is closely related to Nonnegative Matrix Factorization (NMF). Recently, it has been shown that NMF can be solved efficiently under the so-called separability assumption on the topic matrix, which requires each topic to have a "novel" word that occurs with nonzero probability only under that topic. We aim to provide a suite of computationally and statistically efficient algorithms for this problem.
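
To make the separable model concrete, here is a minimal Python/NumPy sketch of the generative model and of the extreme-point intuition behind novel-word detection. The matrix sizes, the planted novel words, and the random-projection selection step are illustrative assumptions for this toy example, not the published algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
V, k, n = 20, 3, 1000  # vocabulary size, number of topics, number of documents (toy sizes)

# Separable topic matrix A (V x k): each column is a distribution over words,
# and each topic owns a "novel" word (rows 0..k-1 here, planted by construction)
# that has nonzero probability only under that topic.
A = rng.random((V, k))
A[:k, :] = np.eye(k)
A /= A.sum(axis=0, keepdims=True)  # normalize each column to sum to 1

# Document-topic weights W (k x n): each column is a distribution over topics,
# so each document's word distribution is a convex combination of the topics.
W = rng.dirichlet(np.ones(k), size=n).T
X = A @ W  # V x n matrix whose columns are document word distributions

# Row-normalize X: a novel word's row is a rescaled row of W, so the novel-word
# rows are the extreme points of the convex hull of all the normalized rows.
Xr = X / X.sum(axis=1, keepdims=True)

# Toy extreme-point search: project the rows onto random directions and keep the
# maximizers; a linear functional over a convex hull is maximized at a vertex,
# i.e. at a novel word (illustrative heuristic, not the published algorithms).
candidates = set()
for _ in range(50):
    direction = rng.standard_normal(n)
    candidates.add(int(np.argmax(Xr @ direction)))
print(sorted(candidates))  # with this toy data, recovers the planted novel words {0, 1, 2}
```

The geometric fact the sketch exercises is that, after row normalization, every word's row lies in the convex hull of the novel-word rows, and the novel-word rows are its vertices; this is the picture underlying the geometric and projection-based approaches in the papers below.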

  • W. Ding*, M. H. Rohban*, P. Ishwar, V. Saligrama, “Efficient Distributed Topic Modeling with Provable Guarantees,” 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 2014 (accepted; * equal contribution).
  • W. Ding, P. Ishwar, M. H. Rohban, V. Saligrama, “Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models,” NIPS Workshop on Topic Models, 2013, arXiv:1310.7994 [cs.LG].
  • W. Ding*, M. H. Rohban*, P. Ishwar, V. Saligrama, “Topic Discovery Through Data Dependent and Random Projections,” International Conference on Machine Learning (ICML), 2013 (oral presentation; * equal contribution), arXiv:1303.3664 [stat.ML].
  • W. Ding, M. H. Rohban, P. Ishwar, V. Saligrama, “A New Geometric Approach to Latent Topic Modeling and Discovery,” ICASSP, 2013, arXiv:1301.0858 [stat.ML].