Notes on Text Mining and Analytics - 6

Text-based prediction

  • Latent Aspect Rating Analysis (LARA) is a generative model for inferring ratings of latent aspects. It is composed of two stages: aspect segmentation and latent rating regression. The latent aspect weights are not assumed to be equal; they are inferred using maximum likelihood.
  • NetPLSA has an additional term in its objective function that penalizes cases where neighbor nodes are assigned different topic coverage. NetPLSA leverages the power of both the text and the network structure to mine topics.
  • In the objective function of NetPLSA, increasing λ makes neighbor nodes have more similar topic coverage.
  • Contextual Probabilistic Latent Semantic Analysis (CPLSA) can be applied to discover temporal trends of topics in text and to reveal how the coverage of topics in different locations evolves over time.
  • To measure the causality between two time series, the Granger causality test is often used. It checks whether past values of one series improve the prediction of the other beyond what that series' own past values already provide.
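
The Granger test mentioned in the last bullet can be sketched as a comparison of two regressions: one that predicts a series from its own lags only, and one that also uses lags of the candidate cause. The block below is a minimal illustration in plain Python, not a definitive implementation. The function names (`solve`, `rss`, `granger_f`), the single lag, and the synthetic series are all assumptions made for this example. It reports only the F statistic, not a p-value, and a large value indicates predictive usefulness rather than true causation.

```python
import random

def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(rows, target):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

def granger_f(cause, effect, lags=1):
    """F statistic for 'lags of cause help predict effect beyond effect's own lags'."""
    n = len(effect)
    target = effect[lags:]
    # Restricted model: intercept plus the effect series' own lags.
    own = [[1.0] + [effect[t - k] for k in range(1, lags + 1)]
           for t in range(lags, n)]
    # Unrestricted model: the same columns plus lags of the candidate cause.
    full = [row + [cause[t - k] for k in range(1, lags + 1)]
            for row, t in zip(own, range(lags, n))]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic data (an assumption for illustration): y follows x with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 200)]

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
```

In this setup `f_xy` should come out far larger than `f_yx`, because past values of `x` carry information about `y` but not the other way around. In practice the statistic would be compared against an F distribution to obtain a significance level.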

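The effect of λ in NetPLSA described above can be illustrated with a small sketch. It assumes the network-regularized objective takes the form (1 - λ) times the text log-likelihood minus λ times a penalty that sums, over edges, the weighted squared difference between the topic coverage of the two endpoint nodes. The function names, the toy graph, and the fixed log-likelihood value are hypothetical and chosen only for illustration; the sketch does not fit a topic model.

```python
def network_penalty(edges, coverage):
    """Half the sum over edges of weight * squared distance between topic coverages."""
    total = 0.0
    for u, v, w in edges:
        total += w * sum((pu - pv) ** 2
                         for pu, pv in zip(coverage[u], coverage[v]))
    return 0.5 * total

def regularized_objective(log_likelihood, edges, coverage, lam):
    """Assumed form: trade off text fit against smoothness over the network."""
    return (1 - lam) * log_likelihood - lam * network_penalty(edges, coverage)

# Toy network with three nodes and two topics (illustrative values only).
edges = [("a", "b", 1.0), ("b", "c", 1.0)]
similar = {"a": [0.7, 0.3], "b": [0.6, 0.4], "c": [0.65, 0.35]}
different = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.9, 0.1]}

penalty_similar = network_penalty(edges, similar)
penalty_different = network_penalty(edges, different)
```

With the same log-likelihood, the assignment in `similar` scores higher than the one in `different` for any λ greater than 0, and the gap widens as λ grows. That is the sense in which a larger λ pushes neighbor nodes toward similar topic coverage.
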
Course summary


[Figure: Topics Covered in This Course]
[Figure: Key High-Level Take-Away Messages]


[1] C. Zhai and S. Massung. Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining. ACM and Morgan & Claypool Publishers, 2016. Chapters 18 & 19.
[2] Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent aspect rating analysis on review text data: a rating regression approach. In Proceedings of ACM KDD 2010, pp. 783-792, 2010. doi: 10.1145/1835804.1835903
[3] Hongning Wang, Yue Lu, and ChengXiang Zhai. Latent aspect rating analysis without aspect keyword supervision. In Proceedings of ACM KDD 2011, pp. 618-626, 2011. doi: 10.1145/2020408.2020505
[4] ChengXiang Zhai, Atulya Velivelli, and Bei Yu. A cross-collection mixture model for comparative text mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). ACM, New York, NY, USA, pp. 743-748. doi: 10.1145/1014052.1014150
[5] Qiaozhu Mei. Contextual Text Mining. Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2009.
[6] Hyun Duk Kim, Malu Castellanos, Meichun Hsu, ChengXiang Zhai, Thomas Rietz, and Daniel Diermeier. Mining causal topics in text data: Iterative topic modeling with time series feedback. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM 2013). ACM, New York, NY, USA, pp. 885-890. doi: 10.1145/2505515.2505612
[7] Noah Smith. Text-Driven Forecasting. Retrieved on May 31, 2015 from