Python, Gensim, LDA

by C. Ross Jam on September 22, 2010

Link parkin’: Gensim is a Python framework for vector space modeling. That means taking a corpus of text documents (where a "document" is any bag of symbols from a fixed vocabulary), turning each document into a vector representation, and then discovering latent structure within the corpus. Good for unsupervised document analysis.
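
To make the "bag of symbols to vector" step concrete, here is a minimal sketch in plain Python. It does not use Gensim's API; the toy corpus, the `vocab` mapping, and the `to_bow` helper are all made up for illustration.

```python
from collections import Counter

# Toy corpus (hypothetical data): each document is just a list of tokens.
docs = [
    ["human", "computer", "interface", "computer"],
    ["graph", "trees", "graph", "minors"],
    ["human", "graph"],
]

# Fixed vocabulary: map each distinct token to an integer id.
vocab = {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}

def to_bow(doc):
    """Return a sparse vector as a sorted list of (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in doc)
    return sorted(counts.items())

# One sparse bag-of-words vector per document; word order is discarded.
bows = [to_bow(d) for d in docs]
```

Each document ends up as a list of (id, count) pairs, which is the kind of input a topic model then works on.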

I’ve been wondering for a long time how one of these algorithms, Latent Dirichlet Allocation (LDA), is actually implemented. The only LDA implementations I’ve seen so far are in C++ and Java, and I can’t seem to grok how they translate the LDA math into code. Maybe the Python version in Gensim will be a bit more illuminating.
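
For a rough idea of what "LDA math into code" can look like, here is a minimal sketch of one common inference approach, collapsed Gibbs sampling, in plain Python. It is not taken from Gensim and makes no claim about how Gensim does it; the function name `lda_gibbs`, its parameters, and the toy data are assumptions for illustration. Each word's topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the counts exclude the word being resampled.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA. docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                              # n_k
    assign = []                                                 # topic of every token

    # Start from a random topic assignment and fill the count tables.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word

# Hypothetical toy corpus over a 4-word vocabulary (word ids 0..3).
toy_docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
```

The returned count tables are the raw material for the usual estimates: normalizing a row of `doc_topic` (plus alpha) approximates a document's topic mixture, and normalizing a row of `topic_word` (plus beta) approximates a topic's word distribution.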
