Friday, March 23, 2018

Bayesian Regression with Normal Prior

Bayesian Simple Linear Regression


Hello, recently I finished the book Introduction to Bayesian Statistics by William Bolstad. I liked the way Bolstad teaches Bayesian statistics: several chapters compare the frequentist and Bayesian approaches and their performance. I recommend it to everyone who wants to learn about Bayesian statistics.

Chapter 13 of the book is dedicated to Bayesian simple regression. This blog post is essentially an application of that chapter.

In simple regression, we have one predictor variable and one response variable. The simple linear regression line has two parameters: the slope and the intercept. In Bayesian statistics these parameters are treated as random variables, whereas in frequentist statistics they are treated as fixed but unknown quantities.
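To make the setup concrete, here is a brief sketch of the model used in Bolstad's chapter, which parameterizes the line around the mean of the predictor and assumes the observation variance $\sigma^2$ is known:

$$y_i = \alpha_{\bar{x}} + \beta\,(x_i - \bar{x}) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$

with independent normal priors $\beta \sim N(m_\beta, s_\beta^2)$ and $\alpha_{\bar{x}} \sim N(m_\alpha, s_\alpha^2)$. Because both the priors and the likelihood are normal, the posteriors of the slope and the intercept are normal as well, which is what the updating steps later in this post rely on.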
In this example, we first generate one response variable and one predictor variable using the rnorm function. Their histograms are shown below.
set.seed(35)
x = rnorm(10000, 3, 2)   # predictor: 10000 draws from a normal with mean 3, sd 2
y = rnorm(10000, 10, 5)  # response: 10000 draws from a normal with mean 10, sd 5
# add more noise and an increasing sequence to y; none of this depends on x
y = y + (3*rnorm(10000,4,1) - sqrt(3) + seq(from = 1, to = 100, by = ((100-1)/(10000-1))))
hist(x)
hist(y)
After generating the data, the sum of squares of x, the sum of squares of y, and the sum of cross products SSxy are calculated as follows.
SSy = sum((y - mean(y))^2)
SSy
## [1] 8520246
SSx = sum((x - mean(x))^2)
SSx
## [1] 40537.42
SSxy = sum((y - mean(y))*(x - mean(x)))
SSxy
## [1] -10540.16
After finding the sum of squares values, let's plot our data, x against y.
plot(x,y)
As can be seen from the plot, there is a large amount of randomness in this data set; we cannot easily say that x is a good predictor of y. This is not the usual regression setting, where the goal is to learn the slope and intercept of a real relationship. Here there is no strong relationship between x and y, but the main point is to update our parameters and to see that, even with priors that were not chosen very wisely, the posterior converges toward the estimates obtained from the data.
Let’s quickly check how much effect x has on y by fitting a linear regression in the frequentist way. R has a built-in function called lm for that.
freq.lm = lm(y~x)
summary(freq.lm)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.293 -24.821   0.035  24.636  64.248 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71.5284     0.5283 135.398   <2e-16 ***
## x            -0.2600     0.1450  -1.794   0.0729 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.19 on 9998 degrees of freedom
## Multiple R-squared:  0.0003217,  Adjusted R-squared:  0.0002217 
## F-statistic: 3.217 on 1 and 9998 DF,  p-value: 0.07291
freq.lm$coefficients
## (Intercept)           x 
##  71.5283801  -0.2600106
Since the relationship between x and y looks quite random, we use a prior mean of 0.1 and a prior standard deviation of 0.5 for the slope. For the intercept, we use a prior mean of 71 and a prior standard deviation of 1.
Using the updating rules for Bayesian simple linear regression (you can find them in more detail in Bolstad's book), the posterior variance and mean of the slope are computed below.
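As a sketch of those rules, with $\sigma^2$ the observation variance that is assumed known (the code below plugs in $\sigma^2 = 4$), the posterior precision and mean of the slope are

$$\frac{1}{(s_\beta')^2} = \frac{1}{s_\beta^2} + \frac{SS_x}{\sigma^2}, \qquad m_\beta' = \frac{1/s_\beta^2}{1/(s_\beta')^2}\, m_\beta + \frac{SS_x/\sigma^2}{1/(s_\beta')^2}\, B,$$

where $B = SS_{xy}/SS_x$ is the least-squares slope. The intercept is updated with the same precision-weighted average, with $SS_x/\sigma^2$ replaced by $n/\sigma^2$ and $B$ replaced by $\bar{y}$.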
posterior.slope.var = 1 / ((1/0.5^2) + (SSx / 4))
posterior.slope.mean = (1 / (0.5)^2) / (1 / posterior.slope.var) * 0.1 + ((SSx/4) / (1 / posterior.slope.var)) * -0.26
posterior.slope.var
## [1] 9.863532e-05
posterior.slope.mean
## [1] -0.259858
The computed values posterior.slope.mean and posterior.slope.var are the posterior mean and variance of the slope. The value -0.26 comes from SSxy / SSx, which is the least-squares slope.
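As a quick consistency check, using the quantities computed above, the least-squares slope from the sums of squares agrees with the coefficient reported by lm:
SSxy / SSx                 # least-squares slope, about -0.26
freq.lm$coefficients["x"]  # slope estimated by lm, also about -0.26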
The next step is the computation of the posterior values for the intercept.
posterior.int.var = 1 / ((1 / 1) + (10000/4))
posterior.int.mean = (1 / (1 / posterior.int.var)) * 71 + ((10000/4) / (1/posterior.int.var)) * mean(y)
posterior.int.var
## [1] 0.0003998401
posterior.int.mean
## [1] 70.73872
The computed values posterior.int.mean and posterior.int.var are the posterior mean and variance of the intercept. The value 71 is the prior mean of the intercept.
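Since both posteriors are normal, credible intervals follow directly. As a small addition here, using only the posterior means and variances computed above, 95% credible intervals for the two parameters can be obtained with qnorm:
qnorm(c(0.025, 0.975), posterior.slope.mean, sqrt(posterior.slope.var))  # 95% interval for the slope
qnorm(c(0.025, 0.975), posterior.int.mean, sqrt(posterior.int.var))      # 95% interval for the intercept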

Let’s have a look at their distributions by drawing 10000 samples from each.
Slope = rnorm(10000,posterior.slope.mean, sqrt(posterior.slope.var))
Intercept = rnorm(10000, posterior.int.mean, sqrt(posterior.int.var))
hist(Slope)
hist(Intercept)
Now, let’s build a bivariate normal distribution from these two marginal distributions. To do that, we need their covariance matrix.
library(MASS)              # for mvrnorm and kde2d
s = cov(Slope, Intercept)  # sample covariance of the two sets of draws
Sigma = matrix(c(posterior.slope.var,s,s,posterior.int.var),2,2)
bivarite.dist = mvrnorm(n = 10000, mu = c(posterior.slope.mean, posterior.int.mean), Sigma = Sigma, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
bivn.kde <- kde2d(bivarite.dist[,1], bivarite.dist[,2], n = 50)  # 2D kernel density estimate of the joint posterior
contour(bivn.kde)
persp(bivn.kde, phi = 45, theta = 30, shade = .1, border = NA)
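To get a feel for what the joint posterior implies for the fitted line, we can overlay a few sampled lines on the scatter plot. This is a minimal sketch, assuming the intercept is interpreted at the mean of x (as in Bolstad's parameterization), so each sampled line is y = intercept + slope * (x - mean(x)):
plot(x, y, col = "grey")
idx = sample(nrow(bivarite.dist), 20)  # pick 20 posterior draws
for (i in idx) {
  b = bivarite.dist[i, 1]              # sampled slope
  a = bivarite.dist[i, 2]              # sampled intercept at x = mean(x)
  abline(a = a - b * mean(x), b = b, col = "red")
}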
