Some Methods for Posterior Inference in Topic Models
Abstract. The problem of posterior inference for individual documents is particularly important in topic models, yet it is often intractable in practice. Many existing methods for posterior inference, such as variational Bayes, collapsed variational Bayes, and collapsed Gibbs sampling, offer no guarantee on either the quality or the rate of convergence. The Online Maximum a Posteriori Estimation (OPE) algorithm has more attractive properties than other inference approaches. In this paper, we introduce four algorithms (OPE1, OPE2, OPE3, and OPE4) that improve on OPE by combining two stochastic bounds. Our new algorithms not only preserve the key advantages of OPE but can also sometimes perform significantly better. We employ these algorithms to develop new, effective methods for learning topic models from massive and streaming text collections. Empirical results show that our approaches are often more efficient than state-of-the-art methods.
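To make the kind of per-document inference discussed above concrete, the following is a minimal, hypothetical sketch of an OPE-style stochastic MAP estimator for one document's topic proportions. It assumes an LDA-like setup: a fixed topic-word matrix `beta` (shape K x V), a document given as a word-count vector `doc_counts`, and a symmetric Dirichlet prior `alpha`. The objective has two components, a log-likelihood term and a log-prior term; at each step one component is sampled, their running average serves as a surrogate objective, and a Frank-Wolfe step moves toward the best simplex vertex. All names and the exact update are illustrative, not the authors' implementation.

```python
import numpy as np

def ope_sketch(doc_counts, beta, alpha=0.5, n_iters=100, seed=0):
    """Sketch of OPE-style MAP inference for one document.

    Approximately maximizes, over the topic simplex,
        f(theta) = sum_j d_j * log(theta . beta[:, j])
                 + (alpha - 1) * sum_k log(theta_k)
    by randomly picking one of the two terms each iteration and
    taking a Frank-Wolfe step toward the vertex that maximizes the
    gradient of the averaged surrogate.
    """
    rng = np.random.default_rng(seed)
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)   # start at the simplex centre
    picks = np.zeros(2)           # how often each component was sampled
    for t in range(1, n_iters + 1):
        picks[rng.integers(2)] += 1
        # gradients of the two objective components at the current theta
        g_lik = (beta / (theta @ beta)) @ doc_counts  # likelihood term
        g_pri = (alpha - 1.0) / theta                 # log-prior term
        # gradient of the running average of sampled components
        grad = (picks[0] * g_lik + picks[1] * g_pri) / t
        vertex = np.argmax(grad)
        # Frank-Wolfe step; step size 1/(t+1) keeps theta strictly
        # inside the simplex (every coordinate stays positive)
        theta += (np.eye(K)[vertex] - theta) / (t + 1)
    return theta
```

Because each update is a convex combination of the current point and a simplex vertex, `theta` remains a valid probability vector at every iteration without any projection step.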