Wednesday, August 23, 2017

Bayesian Data Analysis : Conjugate Prior with R

In the last blog, I talked about non informative priors. As you might guess, there is one
other type of prior which is informative prior. Informative priors are based on expert
opinion, historical data and so on. After selection of the prior type, we find our distribution with some number of draws and then we get our posterior distribution using the specific model. If our analysis is sequential, in other words if we add some new data in progress, we can use posterior distribution as a prior distribution in next step. We can add some values to our distribution and keep going like that. In this blog I want to use this property in the example.

Conjugate priors are key concept in prior&posterior distributions. A simple definition for conjugate prior is that the family of prior distribution is the same with posterior distribution’ family. For example, if we choose our prior as Beta distribution we will end with Beta distribution in our posterior distribution. This makes our lives easier. 

Here is a simple example. Let’s say we have a coin that we don't know whether it is fair or not. So we are not sure that the probability of getting head is equal to 1/2. We start with Uniform(0, 1) which means the probability of getting head is anywhere between 0 and 1. Uniform(0, 1) is equal to Beta(1, 1). I am going to use Beta since it is conjugate prior. After choosing our prior distribution we toss the coin 6 times and the result is 2 heads and 4 tails. We have two successful outcomes. In the first place, we have Beta(1, 1). If α =1 and β = 1, posterior for Beta will be Beta(α + Σy, β + n - Σy) where Σy is the number of success. Σy is equal to 2 and n is equal to 6. So, our new Beta will be Beta(1 + 2, 1 + 6 - 2) = Beta(3, 5). This is the posterior distribution. We started with Beta and ended with Beta. 

I want to perform a simple Bayesian Data Analysis using the property explained in the example. Let’s say we have a company called Jacob’s Eggs that sells organic eggs. After discussing on how to sell it with marketing department, our CEO comes up with a plan of handing out brochures to houses in specific areas. After handing out to 25 houses we sold 9 subscriptions for our 3-month organic egg subscription. That is 36% sales rate. But our CEO, whose name is Jacob John Blacksea, is not sure about how to interpret those results. Is it successful sales rate? Furthermore, a good friend of Jacob John who is very good sales specialist named Bahadirius told him that the sales rate is often between 20 - 25%. Jacob got confused and wants to make a simple analysis to interpret the results. Here is the analysis,



>n_draws <- 100000
>prior <- runif(n_draws, 0,1)
>hist(prior)
 

>simulated_data <- rep(NA, n_draws)
>model <- function(parameter){
  return(rbinom(1, size = 25, p = parameter)) } 
>#rbinom takes probability from the prior and gives simulated # of success for each iteration 
>for( i in 1:n_draws){
    simulated_data[i] <- model(prior[i])
}
> posterior <- prior[simulated_data == 9]

> hist(posterior)

> abline(v=mean(posterior), col = "red")


What happened above is that we drew 100000 data and generated uniform distribution as prior. Our model function will give us the number of successes in the given size, which is 25 using prior probabilities. For each prior probability we return the number of success to simulated_data vector.  Our posterior is number of successes that are equal to 9 in simulated_data vector. 


> mean(posterior)

[1] 0.3692231

> 9/25
[1] 0.36
> quantile(posterior, c(0.025,0.975))
     2.5%     97.5% 
0.2049356 0.5511299 

95% of the probabilities are distributed between 0.2 and 0.55.

> #95% of the probabilities falls between 0.2 and 0.55
> sum(posterior > 0.3) / length(posterior)
[1] 0.7666064
> sum(posterior > 0.4) / length(posterior)
[1] 0.3528043
> sum(posterior > 0.25) / length(posterior)
[1] 0.9092789

We had 36% probability to sell subscriptions. The probability that our sales rate will be higher that 0.25 is 0.909. We are quite certain that it will be more than 0.25. But in the case of 0.4, we only have 0.35 as our probability. So we can expect that it will be less than 0.4. For now 36% seems like a good portion for sales rate.


In the above analysis we used the prior before Bahadirius shares his information. To add the new information I will select Σy = 4, n = 17. Remember that Uniform(0,1) is equal to Beta(1,1).

>#Beta(1 + 4, 1 + 17 - 4)
>priorB <- rbeta(n_draws, 5, 14) 
>hist(priorB)
This is more informative than uniform distribution. 

> simulated_data2 <- rep(NA,n_draws)
> for(i in 1:n_draws){
+     simulated_data2[i] <- model(priorB[i])
+ }
> posteriorB <- priorB[simulated_data2 == 9]
> hist(posteriorB)
> mean(posteriorB)
[1] 0.3188646

> hist(posteriorB)
> abline(v=mean(posteriorB), col = "red", lwd = 2)
 Here our mean for the probability of getting 9 sales is near 32%. We had 36% in the first place. To interpret it we can do something like

> sum(posteriorB > 0.35) / length(posteriorB)
[1] 0.3180726

The probability that sales rate will be higher than 35% is 0.32. So we can say that our sales people have done a good job by selling 9 subscriptions according to that analysis!

The example in this blog inspired from Rasmus Bååth's Research Blog. In this blog, there is a great series of lectures on Bayesian Data Analysis. If you are looking for an introductory lecture on Bayesian Statistics, Coursera has some effective courses like this one.




No comments:

Post a Comment

Gibbs Sampler

Gibbs Sampler In this blog we are going to discuss Gibbs Sampler with an applied example. But we will not dive into Gibbs Sampler direc...