Friday, July 21, 2017

Customer Analysis : RFM Scoring with R



Hello! In this blog, I want to write about some basic topics on Customer Analysis. I am currently reading The Power of Habit from Charles Duhigg and I read a great example of customer analysis on that book. It is about the company 'Target' and its statisticians ability to detect pregnant customers so that they can send coupons to them accordingly. They are doing this by analyzing people's shopping patterns. And once they recognize who is pregnant and the approximate the time when they are going to give birth, they can win those customers and make them buy their stuff from Target. I found that fantastic. I cannot do such analysis by now but it made me curious about the topic. Later on I learned about RFM scoring and some other basic types of analysis. I want to do an example of RFM in this blog.

Let's load our data. It can be downloaded from here.

> customerDf <- read.table("CDNOW_sample.txt")
> customerDf <- customerDf[,-2]
#Second column is their id in sample data set. we can delete it.
> colnames(customerDf) <- c("ID", "Date", "Unit", "Total Amount")
> class(customerDf$Date)
[1] "integer"
#We need to make it Date

> customerDf[,2] <- as.Date(as.character(customerDf[,2]),"%Y%m%d")

Here is our data. What we can do with those parameters? One of the possible answers is RFM scoring. RFM stands for Recency, Frequency and Monetary. Recency refers to How recently did the customer purchase? Once we decide on a reference date, recency is equal to the day difference between our reference day and last time of shopping from given customer. Frequency is simply How often do they purchase in that interval. Monetary means How much do they spend?

In RFM scoring, we compute those RFM values and then we score them. We score according to our
expectancies and average sales we are making. This scoring is usually in interval of [1,5], 5 is the highest score. So 555 scored customer will be our best customer. In other words, it is a customer value metric. I will give some further sources at the end of the blog.


Let's start with computing our Recency value.


> refDay  <- c("19980101")
> refDay <-as.Date(refDay,"%Y%m%d")
> df <- customerDf[customerDf$Date<refDay,]
> min(customerDf$Date)
[1] "1997-01-01"

The time interval is decided. Since very first data from customers come from 1997-01-01, it will be from 1997-01-01 to 1998-01-01.

> uniqCust <- unique(df$ID)
> dateDf <- rep(NA,length(uniqCust))
> for(i in 1:length(uniqCust)){
+     dateDf[i] <- max(df[df$ID==uniqCust[i],2])
+ }

> head(dateDf)
[1] 10207  9874  9862  9862  9862 10201
#Here we have customers' last shopping days in numeric type.
#We can compute the recency by simply subtracting it from our reference day.
> recency = as.numeric(refDay) - dateDf
> head(recency)

[120 353 365 365 365  26

Frequency is just how many times they shopped. 


> frequency <- data.frame(table(customerDf$ID))
> colnames(frequency) <- c("ID", "Frequency")
> head(frequency)
ID Frequency
1  4          4
2 18         1
3 21         2
4 50         1
5 60         1
6 71         1

Monetary value;

> revenue <- rep(0, length(uniqCust))
> for(j in 1:length(uniqCust)){
  +     for(i in 1:length(df$ID)){
    +         if(uniqCust[j] == df$ID[i]){
      +             revenue[j] = revenue[j] + df$`Total Amount`[i]
      +         }
    +     }
  + }
> head(revenue)
[1] 100.50  75.11   6.79  13.97  23.94 714.12

We have computed the necessary values. Now we are ready for scoring.


> rfmTable <- cbind.data.frame(uniqCust,recency,frequency$Frequency,revenue)
> head(rfmTable)
 uniqCust recency frequency$Frequency revenue
1        4        20                   4                    100.50
2       21     353                   1                      75.11
3       50     365                   2                        6.79
4       71     365                   1                      13.97
5       86     365                   1                      23.94
6      111      26                    1                    714.12

Let's start scoring. Last shopping from 5 days before the reference day will be given the best score, 5. It will continue like 28, 84, 168, 366.


> rankR <- cut(as.numeric(rfmTable$recency), breaks = c(0,5,28,84,168,366))
> levels(rankR) <- c(5,4,3,2,1)

In case of frequency, 0 to 3 will have the worst score, 1. (3,4] will have 2, (4,7] will have 3, (7,9] will have 4 and more than 9 will have 5 points.

> rankF <- cut(rfmTable$`frequency$Frequency`, breaks = c(0,3,4,7,9,10000))
> levels(rankF) <- c(1,2,3,4,5)

Lastly, for monetary value, intervals will be like (0, 99], (99, 299], (299,599], (599,1000], (1000,1000000].

> rankM <- cut(rfmTable$revenue, breaks = c(0,99,299,599,1000,1000000))
> levels(rankM) <- c(1,2,3,4,5)

Now we are ready to construct our RFM score matrix.

> rfmScores <- cbind(uniqCust, rankR, rankF, rankM)
> head(rfmScores)
uniqCust rankR rankF rankM
[1,]        4     2     2     2
[2,]       21     5     1     1
[3,]       50     5     1     1
[4,]       71     5     1     1
[5,]       86     5     1     1
[6,]      111     2     1     4

Now since we have such scores, we know our customers better. We can take action accordingly. For instance, if we have 353 scored customers it would be wise to send them some coupons since they shopped from us very frequently but not that recently.

No comments:

Post a Comment

Gibbs Sampler

Gibbs Sampler In this blog we are going to discuss Gibbs Sampler with an applied example. But we will not dive into Gibbs Sampler direc...