Friday, March 23, 2018

Bayesian Regression with Normal Prior

Bayesian Simple Linear Regression


Hello, recently I finished the book Introduction to Bayesian Statistics by William Bolstad. I liked the way Bolstad teaches Bayesian statistics: several chapters compare the frequentist and Bayesian approaches and their performance. I recommend it to everyone who wants to learn about Bayesian statistics.

Chapter 13 of the book is dedicated to Bayesian simple regression. This blog post is essentially an application of that chapter.

In simple regression, we have one predictor variable and one response variable. The simple linear regression line has two parameters: the slope and the intercept. In Bayesian statistics these parameters are treated as random variables, whereas in frequentist statistics they are treated as fixed but unknown quantities.
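To make the setup concrete, here is a brief sketch of the model used in Bolstad's chapter, which parameterizes the line around the mean of the predictor and assumes the observation variance $\sigma^2$ is known:

$$y_i = \alpha_{\bar{x}} + \beta\,(x_i - \bar{x}) + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$

with independent normal priors $\beta \sim N(m_\beta, s_\beta^2)$ and $\alpha_{\bar{x}} \sim N(m_\alpha, s_\alpha^2)$. Because both the priors and the likelihood are normal, the posteriors of the slope and the intercept are normal as well, which is what the updating steps later in this post rely on.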
In this example, we first generate one response variable and one predictor variable using the rnorm function. Their histograms are shown below.
set.seed(35)
x = rnorm(10000, 3, 2)   # predictor: 10000 draws from a normal with mean 3, sd 2
y = rnorm(10000, 10, 5)  # response: 10000 draws from a normal with mean 10, sd 5
# add more noise and an increasing sequence to y; none of this depends on x
y = y + (3*rnorm(10000,4,1) - sqrt(3) + seq(from = 1, to = 100, by = ((100-1)/(10000-1))))
hist(x)
hist(y)
After generating the data, the sum of squares of x, the sum of squares of y, and the sum of cross products SSxy are calculated as follows.
SSy = sum((y - mean(y))^2)
SSy
## [1] 8520246
SSx = sum((x - mean(x))^2)
SSx
## [1] 40537.42
SSxy = sum((y - mean(y))*(x - mean(x)))
SSxy
## [1] -10540.16
After finding the sum of squares values, let's plot our data, x against y.
plot(x,y)
As can be seen from the plot, there is a large amount of randomness in this data set; we cannot easily say that x is a good predictor of y. This is not the usual regression setting, where the goal is to learn the slope and intercept of a real relationship. Here there is no strong relationship between x and y, but the main point is to update our parameters and to see that, even with priors that were not chosen very wisely, the posterior converges toward the estimates obtained from the data.
Let’s quickly check how much effect x has on y by fitting a linear regression in the frequentist way. R has a built-in function called lm for that.
freq.lm = lm(y~x)
summary(freq.lm)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -63.293 -24.821   0.035  24.636  64.248 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  71.5284     0.5283 135.398   <2e-16 ***
## x            -0.2600     0.1450  -1.794   0.0729 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.19 on 9998 degrees of freedom
## Multiple R-squared:  0.0003217,  Adjusted R-squared:  0.0002217 
## F-statistic: 3.217 on 1 and 9998 DF,  p-value: 0.07291
freq.lm$coefficients
## (Intercept)           x 
##  71.5283801  -0.2600106
Since the relationship between x and y looks quite random, we use a prior mean of 0.1 and a prior standard deviation of 0.5 for the slope. For the intercept, we use a prior mean of 71 and a prior standard deviation of 1.
Using the updating rules for Bayesian simple linear regression (you can find them in more detail in Bolstad's book), the posterior variance and mean of the slope are computed below.
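As a sketch of those rules, with $\sigma^2$ the observation variance that is assumed known (the code below plugs in $\sigma^2 = 4$), the posterior precision and mean of the slope are

$$\frac{1}{(s_\beta')^2} = \frac{1}{s_\beta^2} + \frac{SS_x}{\sigma^2}, \qquad m_\beta' = \frac{1/s_\beta^2}{1/(s_\beta')^2}\, m_\beta + \frac{SS_x/\sigma^2}{1/(s_\beta')^2}\, B,$$

where $B = SS_{xy}/SS_x$ is the least-squares slope. The intercept is updated with the same precision-weighted average, with $SS_x/\sigma^2$ replaced by $n/\sigma^2$ and $B$ replaced by $\bar{y}$.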
posterior.slope.var = 1 / ((1/0.5^2) + (SSx / 4))
posterior.slope.mean = (1 / (0.5)^2) / (1 / posterior.slope.var) * 0.1 + ((SSx/4) / (1 / posterior.slope.var)) * -0.26
posterior.slope.var
## [1] 9.863532e-05
posterior.slope.mean
## [1] -0.259858
The computed values posterior.slope.mean and posterior.slope.var are the posterior mean and variance of the slope. The value -0.26 comes from SSxy / SSx, which is the least-squares slope.
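As a quick consistency check, using the quantities computed above, the least-squares slope from the sums of squares agrees with the coefficient reported by lm:
SSxy / SSx                 # least-squares slope, about -0.26
freq.lm$coefficients["x"]  # slope estimated by lm, also about -0.26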
The next step is the computation of the posterior values for the intercept.
posterior.int.var = 1 / ((1 / 1) + (10000/4))
posterior.int.mean = (1 / (1 / posterior.int.var)) * 71 + ((10000/4) / (1/posterior.int.var)) * mean(y)
posterior.int.var
## [1] 0.0003998401
posterior.int.mean
## [1] 70.73872
The computed values posterior.int.mean and posterior.int.var are the posterior mean and variance of the intercept. The value 71 is the prior mean of the intercept.
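Since both posteriors are normal, credible intervals follow directly. As a small addition here, using only the posterior means and variances computed above, 95% credible intervals for the two parameters can be obtained with qnorm:
qnorm(c(0.025, 0.975), posterior.slope.mean, sqrt(posterior.slope.var))  # 95% interval for the slope
qnorm(c(0.025, 0.975), posterior.int.mean, sqrt(posterior.int.var))      # 95% interval for the intercept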

Let’s have a look at their distributions by drawing 10000 samples from each.
Slope = rnorm(10000,posterior.slope.mean, sqrt(posterior.slope.var))
Intercept = rnorm(10000, posterior.int.mean, sqrt(posterior.int.var))
hist(Slope)
hist(Intercept)
Now, let’s build a bivariate normal distribution from these two marginal distributions. To do that, we need their covariance matrix.
library(MASS)              # for mvrnorm and kde2d
s = cov(Slope, Intercept)  # sample covariance of the two sets of draws
Sigma = matrix(c(posterior.slope.var,s,s,posterior.int.var),2,2)
bivarite.dist = mvrnorm(n = 10000, mu = c(posterior.slope.mean, posterior.int.mean), Sigma = Sigma, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
bivn.kde <- kde2d(bivarite.dist[,1], bivarite.dist[,2], n = 50)  # 2D kernel density estimate of the joint posterior
contour(bivn.kde)
persp(bivn.kde, phi = 45, theta = 30, shade = .1, border = NA)
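To get a feel for what the joint posterior implies for the fitted line, we can overlay a few sampled lines on the scatter plot. This is a minimal sketch, assuming the intercept is interpreted at the mean of x (as in Bolstad's parameterization), so each sampled line is y = intercept + slope * (x - mean(x)):
plot(x, y, col = "grey")
idx = sample(nrow(bivarite.dist), 20)  # pick 20 posterior draws
for (i in idx) {
  b = bivarite.dist[i, 1]              # sampled slope
  a = bivarite.dist[i, 2]              # sampled intercept at x = mean(x)
  abline(a = a - b * mean(x), b = b, col = "red")
}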
