Understanding Generative Modelling (LDA) and Bayesian Networks

A Bayesian Network

It was almost 16–17 months back when I first read about topic modelling and the algorithm behind it, called “Latent Dirichlet Allocation”. It was like I was reading Chinese, and the Bayesian networks did not make any sense to me. Today, I am writing this article explaining Latent Dirichlet Allocation. So, you can say I understand Chinese now.

Firstly, what is a generative model? It is a machine learning model which generates an output by considering the prior distribution of some objects. For example, let's say we are building a model which generates the topic for a given text paragraph. This generative model takes into consideration the words in the paragraph and produces a posterior distribution over (selected) words to come up with a topic for that paragraph.

Let’s take up an example to understand topic modelling:

So, here we have a book, The Adventures of Sherlock Holmes, which is 60% detective, 30% adventure and 10% horror. Therefore, a document/book is a distribution over topics. Now there are certain N words which fall under the topic “Detective”, and similarly for the other two topics. Hence, topics are distributions over words.

Here we can see that the topic Sports assigns probability 0.2 to the word Football, 0.05 to Goal, 0.1 to Hockey and 0.01 to Score. Similarly for Economy and Politics.

Therefore, depending on how probable certain words are under each topic, a topic is generated by the model.
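The two-step generative story above (pick a topic from the document's topic distribution, then pick a word from that topic's word distribution) can be sketched in a few lines. The Sports probabilities are from the example above; the Economy and Politics words, and the `generate_word` helper, are made up purely for illustration:

```python
import random

# Topic -> word probabilities. Sports numbers come from the example above;
# Economy and Politics entries are hypothetical placeholders.
topics = {
    "Sports":   {"Football": 0.20, "Goal": 0.05, "Hockey": 0.10, "Score": 0.01},
    "Economy":  {"Market": 0.15, "Tax": 0.08, "Growth": 0.05},
    "Politics": {"Vote": 0.12, "Party": 0.09, "Law": 0.04},
}

# A document is a distribution over topics (like the Sherlock Holmes example).
doc_topic_dist = {"Sports": 0.6, "Economy": 0.3, "Politics": 0.1}

def generate_word(doc_topics, topic_words):
    """Generate one word: first draw a topic, then draw a word from it."""
    topic = random.choices(list(doc_topics), weights=list(doc_topics.values()))[0]
    words = topic_words[topic]
    # random.choices normalises the weights internally, so the word
    # probabilities need not sum to 1 within a topic for this sketch.
    word = random.choices(list(words), weights=list(words.values()))[0]
    return topic, word

random.seed(0)
topic, word = generate_word(doc_topic_dist, topics)
```

Repeating `generate_word` N times would produce a (bag-of-words) document in exactly the sense described above.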

Now the question is: what is Latent Dirichlet Allocation? There are two parts to it. Firstly, Latent: the model uses latent variables to create a Bayesian network and define its parameters. To keep it simple, you can think of a latent variable as a variable we don't observe directly but which still has an impact on the probability distribution, just like a feature which is not present in the data-set but still causes some effect on the output (an inferred feature).

The second part is Dirichlet Allocation. The Dirichlet distribution is a probability density function which is used in topic modelling. In its formula, theta is a vector of parameter values whose components all add up to 1, B(alpha) is a positive normalising constant, and the density can be visualised on a triangle when theta has only 3 components.
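As a sketch of what "a point on the triangle" means, here is the standard Gamma-variate trick for drawing a Dirichlet sample using only the Python standard library (`sample_dirichlet` is a hypothetical helper, not part of any library):

```python
import random

def sample_dirichlet(alphas):
    """Draw one sample from a Dirichlet distribution via Gamma variates.

    Each component is drawn as Gamma(alpha_i, 1); normalising the draws
    yields a vector on the probability simplex (components sum to 1).
    """
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

random.seed(42)
# Three components, so this theta is a point inside the triangle
# mentioned above; in LDA it would be one document's topic distribution.
theta = sample_dirichlet([0.5, 0.5, 0.5])
```

Smaller alpha values push the samples toward the corners of the triangle, which is why LDA documents tend to concentrate on a few topics.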

Since we use latent variables and the Dirichlet distribution to allocate the probabilities, this kind of topic modelling is called Latent Dirichlet Allocation.


Let’s define the data and notations.

D — Total number of documents.

Theta_d — the topic distribution for the dth document.

T — Total number of topics.

Z_dn — the topic assigned to the nth word of the dth document.

W_dn — the nth word of the dth document (the only variable we actually observe).

V — the vocabulary (the total number of distinct words in the corpus).

What you see above is called plate notation, which is used to create a visual representation of the Bayesian network. It is pretty simple to read: given a document's topic distribution theta, and given the topic z drawn for each word, what is the probability of the word w, repeated over the N words in each document and the D documents.

The formula is the joint probability of words, topics and documents, expressed as the product of conditional probabilities on the R.H.S.: the probability of the topic distribution for a particular document d, then the probability of a topic given that document, and then the probability of a particular word given the zth topic. The big pi represents a product over the D documents and the inner pi a product over the N words in each document; because we treat these as independent events, the joint probability is the product of the individual probabilities.
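Written out in the notation defined above (with beta standing for the topic-word distributions), the factorisation the paragraph describes is, in standard LDA form:

```latex
p(W, Z, \Theta \mid \alpha, \beta)
  = \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,
    p(w_{d,n} \mid z_{d,n}, \beta)
```

Each factor matches one arrow in the plate diagram: alpha generates theta_d, theta_d generates each topic assignment z_dn, and z_dn (together with beta) generates the observed word w_dn.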

This is what the above formula is all about.

Generative — now we can infer topics for a particular document, and we can also generate text for a particular topic using the word probability distribution.

Bayesian networks are all about prior and posterior probabilities and finding probability distributions such that the prior is conjugate to the likelihood; conjugacy is what lets us compute the correct posterior in closed form. The optimisation algorithm used for such latent-variable models is called the E-M (Expectation-Maximization) algorithm.

I will come up with the next article on the E-M algorithm, which is the heart of latent-variable probabilistic modelling. Stay tuned!! :)

Homo Bayesian