
Predicting Customer Churn in the Telecommunications Industry: An Application of Predictive Modeling and Survival Analysis

 

By: Srisai Sivakumar

Motivation

According to a recent study, the worldwide telecom industry is set to be valued at close to 1.4 trillion Euros by 2017. The largest market segment is Asia/Pacific, a trend driven largely by two emerging markets, India and China. India’s telecommunication network is the second largest in the world, based on the total number of telephone users (both fixed and mobile). It also has some of the lowest call tariffs in the world, largely due to stiff competition among service providers. With so many providers and so many incentives to switch, customers are spoilt for choice. The Indian telecommunications industry experiences an average annual churn rate of 10-12 percent, and it costs much more to gain a new customer than to retain an existing one. Customer retention has therefore become at least as important as, if not more important than, customer acquisition. For many incumbent operators, retaining highly profitable customers is a major challenge.

Objective

The objectives of this study are twofold. The first is to use learning methods to predict whether a customer is likely to stop using a provider's services, and to understand the factors that drive that decision. The second is to estimate the customer survival function in order to understand how churn evolves over a customer's tenure. This will also help identify the customers who are at high risk of churn and the likely timing of their churn. In other words, we are trying to find out why and when churn will occur.

The data

A sample of 3333 customers was randomly selected from the entire customer base of the state-owned telecommunications company. Based on an earlier study, the company had concluded that customer demographic and socio-economic factors were not significant predictors of churn. At the company's request, this study therefore focuses only on the ‘call related’ parameters of fixed-line phones, such as call duration at various times of the day, call costs and the number of calls to customer care. The study assumes that the data, as provided by the company, is clean and reliable. Since this was the first service provider in the country, some customers may have been with it for an unusually long time by telecom standards.

The analysis

Exploratory Analysis

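Before starting any analysis, it's important to understand the current prevalence of churn. The first table below gives the number of customers who stayed (False) and who churned (True); the second gives the same split as proportions.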

## 
## False  True 
##  2850   483
## 
## False  True 
##  0.86  0.14

So for every 100 customers in the sample, 14 switch to other providers and 86 stay with the company. Any machine learning model we develop should therefore achieve an accuracy above 86%, and should capture the churning customers well. Let's begin by looking at call duration at various times of the day.
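The 86% figure is the accuracy of a "no-information" classifier that predicts the majority class (no churn) for every customer. A minimal Python sketch that rebuilds the label vector from the counts in the table above; the function name is illustrative, not part of the original analysis:

```python
from collections import Counter

def majority_class_baseline(labels):
    """Return (majority label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)

# Label vector rebuilt from the tabulated counts: 2850 False, 483 True.
churn = [False] * 2850 + [True] * 483
label, acc = majority_class_baseline(churn)
print(label, round(acc, 3))  # False 0.855
```

The exact baseline is about 85.5%, which the proportion table rounds to 0.86.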

plot of chunk exp_plot_1

There are a few things to note in this plot. For the blue points (non-churn customers), there is almost no correlation between total day minutes and total evening minutes. Magenta dots (churn customers) are concentrated towards the top-right of the plot, which suggests that the more total day minutes a customer has, the higher the likelihood of churn. This raises questions such as:

  1. Why do the night minutes not seem to impact churn?
  2. What is it about daytime calls that could push customers to switch providers?

One factor that could shed light on this is the number of calls made to customer care. Let's create a similar plot of total day and night minutes, conditioned on the number of calls to customer care, to see whether a trend emerges.

plot of chunk exp_plot_2

plot of chunk exp_plot_2

Sure enough, we start to see signs of the cause. Again, the magenta dots bunch in the top-right corner of the plots, repeating the trend observed previously. These plots also reveal that call duration decreases as the number of customer care calls increases. In reality, the causation most likely runs the other way: the customer is unable to hold long calls because of ‘some’ problem, and so calls customer care. The top 4 sub-plots of both plots correspond to 6 or more calls to customer care. That is a high number of calls, and it happens relatively infrequently, so these sub-plots should not be used to infer anything meaningful. To confirm this, let's tabulate the number of calls to customer care.

## 
##    0    1    2    3    4    5    6    7    8    9 
##  697 1181  759  429  166   66   22    9    2    2
## 
##    0    1    2    3    4    5    6    7    8    9 
## 0.21 0.35 0.23 0.13 0.05 0.02 0.01 0.00 0.00 0.00

With very few customers making more than 5 customer care calls, the top 4 sub-plots of the previous plots ought to be ignored. Let's now look at the evening and night minutes.
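To put "very few samples" in numbers, the counts tabulated above can be pooled into a single "6 or more calls" level. The dictionary below simply copies that table. `pool_tail` is a hypothetical helper written for this illustration:

```python
def pool_tail(counts, threshold):
    """Merge all levels >= threshold into one bucket keyed by the threshold."""
    pooled = {k: v for k, v in counts.items() if k < threshold}
    pooled[threshold] = sum(v for k, v in counts.items() if k >= threshold)
    return pooled

# Number of customer care calls -> number of customers (from the table above).
calls = {0: 697, 1: 1181, 2: 759, 3: 429, 4: 166,
         5: 66, 6: 22, 7: 9, 8: 2, 9: 2}
total = sum(calls.values())
pooled = pool_tail(calls, 6)
share_6_plus = pooled[6] / total
print(total, pooled[6], round(100 * share_6_plus, 2))  # 3333 35 1.05
```

Only 35 of the 3333 customers (about 1%) made six or more calls. They are spread over four panels, so each of those panels rests on at most 22 customers.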

plot of chunk exp_plot_4

It's interesting to observe that only the daytime call pattern shows a noticeable difference between churn and non-churn customers. This suggests that day minutes are likely a significant predictor, while evening and night minutes may be weaker predictors. A look at the quantiles of the day, evening and night minutes reveals that daytime has the lowest total call duration.

## [1] "Total day minutes summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   143.7   179.4   179.8   216.4   350.8
## [1] "Total evening minutes summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   166.6   201.4   201.0   235.3   363.7
## [1] "Total night minutes summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    23.2   167.0   201.2   200.9   235.3   395.0
## [1] "Call Cost summary at various times of the day"
## [1] "Total day charges summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   24.43   30.50   30.56   36.79   59.64
## [1] "Total evening charges summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   14.16   17.12   17.08   20.00   30.91
## [1] "Total night charges summary"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.040   7.520   9.050   9.039  10.590  17.770

It's clear that total daytime call duration is lower than that of the evening and night, while daytime call charges are higher. This might explain why total day minutes is an important factor for churn: customers may be incurring high charges and be unwilling to put up with them. Although its total call duration is lower than that of the evening and night, the daytime call service seems to be contributing to churn. This could be because of poor call quality during the day or higher tariffs.
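The summaries above can be used to check the higher-tariff explanation directly. Dividing the mean charge by the mean minutes gives an implied average per-minute rate for each period. This back-of-the-envelope sketch assumes that charges scale roughly linearly with minutes. The summaries are consistent with that assumption but do not prove it, and the currency unit is not stated in the data.

```python
# Mean minutes and mean charges copied from the summaries above.
means = {
    "day":     {"minutes": 179.8, "charge": 30.56},
    "evening": {"minutes": 201.0, "charge": 17.08},
    "night":   {"minutes": 200.9, "charge": 9.039},
}

# Implied average charge per minute for each time of day.
rates = {period: m["charge"] / m["minutes"] for period, m in means.items()}

for period, rate in rates.items():
    print(f"{period:8s} {rate:.3f} per minute")
```

The implied rates come out at roughly 0.17, 0.085 and 0.045 per minute. Daytime minutes therefore cost about twice as much as evening minutes and nearly four times as much as night minutes.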

Let's turn our attention to international minutes and international plans.

plot of chunk exp_plot_5

As expected, customers with an international plan have more international minutes. Interestingly, though, the longer they spoke, the higher the chances of churn. Let's look at this in a bit more detail by adding the number of calls to customer care.

plot of chunk exp_plot_6

Clearly, the international plan and international minutes have an effect on churn. The table below gives each combination as a proportion of all customers.

##        international.plan
## churn     no  yes
##   False 0.80 0.06
##   True  0.10 0.04

The table shows that about 11% of customers without an international plan leave (0.10 out of 0.90). Among customers with an international plan, roughly 40% leave (0.04 out of 0.10). So it is quite possible that the international plan is not attractive to customers who make long overseas calls. Alternatively, the international call service may be poor, driving customers away.

Let's take a similar quick look at the voice mail plan.

##        voice.mail.plan
## churn     no  yes
##   False 0.60 0.25
##   True  0.12 0.02

This shows that the voice mail plan seems to be popular with the customers who have it. Only about 7% of them leave (0.02 out of 0.27), compared with close to 17% of customers without the plan (0.12 out of 0.72). This suggests that the company provides a reasonably good voice mail service, but for some reason a large proportion of customers have not opted for it. Perhaps customers are unaware of the voice mail plans, or the plan is too expensive to attract them.
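Both plan tables report joint proportions: each cell is a share of all 3333 customers. The churn rate within a plan group is therefore the "True" cell divided by that column's total. A minimal Python sketch of that conditioning step follows. It uses the rounded proportions printed above, so the results are approximate, and the function name is illustrative.

```python
def churn_rate_by_column(table):
    """Given {column: {"False": p, "True": p}} joint proportions,
    return P(churn = True | column) for each column."""
    return {col: cells["True"] / (cells["False"] + cells["True"])
            for col, cells in table.items()}

international_plan = {"no": {"False": 0.80, "True": 0.10},
                      "yes": {"False": 0.06, "True": 0.04}}
voice_mail_plan = {"no": {"False": 0.60, "True": 0.12},
                   "yes": {"False": 0.25, "True": 0.02}}

intl = churn_rate_by_column(international_plan)
vmail = churn_rate_by_column(voice_mail_plan)
print({k: round(v, 2) for k, v in intl.items()})   # {'no': 0.11, 'yes': 0.4}
print({k: round(v, 2) for k, v in vmail.items()})  # {'no': 0.17, 'yes': 0.07}
```

Exact counts rather than two-decimal proportions would shift these figures slightly. The contrast would remain: about 11% versus 40% for the international plan, and about 17% versus 7% for the voice mail plan.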

plot of chunk exp_plot_9

It's clear that the account duration does not have much effect on churn. This could suggest that customers aren't hesitant to switch providers, indicating that the company lacks customer loyalty. Loyalty schemes might boost the chances of retaining existing customers.

A quick summary of the findings of the exploratory analysis:
1. Daytime call duration is an important factor affecting churn.
2. Daytime call charges could be high, leading to churn.
3. The number of customer care calls is also important to churn.
4. Evening and night calls may not be significant predictors of churn.
5. The international call plan may not be attractive to customers, leading to churn.
6. The voice mail plan seems to be popular with the customers.
7. A customer’s duration of association with the company doesn't prevent churn, potentially due to a lack of reward or loyalty plans.

Predictive Models

It has already been noted that 86% of the customers in the sampled data remain with the company, while the remaining 14% switch to other providers. If we naively assume that all customers remain with the company and there is no churn, the accuracy is 86%. Any model we develop should beat this by a good margin. We split the data 70/30 into training and test sets, build the model on the training set, and evaluate it on the test set. We use Stochastic Gradient Boosting as the learning algorithm for this study.
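The post does not say how the 70/30 split was drawn. One common choice is a stratified split, which preserves the churn ratio in both the training and test sets. The plain-Python sketch below implements that approach. `stratified_split` is a hypothetical helper, not the function used in the original analysis.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=42):
    """Return (train_idx, test_idx) so that each class is split in train_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within each class
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

churn = [False] * 2850 + [True] * 483
train_idx, test_idx = stratified_split(churn)
train_rate = sum(churn[i] for i in train_idx) / len(train_idx)
test_rate = sum(churn[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

In the original run, the test set's no-information rate is 0.8779, which means about 12% of the test set churned. The full sample has about 14.5% churners. This suggests the original split was not stratified on churn, which slightly changes the baseline the model is compared against.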

Below are the resampling summary of the model (bootstrapped on the training set) and its performance on the test set:

## Stochastic Gradient Boosting 
## 
## 2334 samples
##   17 predictor
##    2 classes: 'False', 'True' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 2334, 2334, 2334, 2334, 2334, 2334, ... 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  Accuracy   Kappa      Accuracy SD  Kappa SD  
##   1                   50      0.8622710  0.2095872  0.009571098  0.03675977
##   1                  100      0.8666280  0.2900729  0.010766040  0.03619469
##   1                  150      0.8689696  0.3234093  0.011562494  0.04290857
##   2                   50      0.9037362  0.5123320  0.010804340  0.04611554
##   2                  100      0.9201596  0.6256160  0.009323278  0.03814560
##   2                  150      0.9249685  0.6547530  0.007327744  0.03286057
##   3                   50      0.9341107  0.6923798  0.009643524  0.04268671
##   3                  100      0.9400275  0.7288910  0.006678648  0.03084071
##   3                  150      0.9413447  0.7375729  0.006525473  0.03130478
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using  the largest value.
## The final values used for the model were n.trees = 150,
##  interaction.depth = 3, shrinkage = 0.1 and n.minobsinnode = 10.
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction False True
##      False   845   10
##      True     32  112
##                                           
##                Accuracy : 0.958           
##                  95% CI : (0.9436, 0.9695)
##     No Information Rate : 0.8779          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.818           
##  Mcnemar's Test P-Value : 0.001194        
##                                           
##             Sensitivity : 0.9635          
##             Specificity : 0.9180          
##          Pos Pred Value : 0.9883          
##          Neg Pred Value : 0.7778          
##              Prevalence : 0.8779          
##          Detection Rate : 0.8458          
##    Detection Prevalence : 0.8559          
##       Balanced Accuracy : 0.9408          
##                                           
##        'Positive' Class : False           
## 
## Accuracy 
##     0.96
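caret reports these statistics with "False" (no churn) as the positive class, so the sensitivity shown is the hit rate for customers who stay. For retention work, the churn class is usually the one of interest. The Python sketch below recomputes the headline numbers from the four cells of the confusion matrix above. It also shows recall and precision with churn treated as the positive class. The function name is illustrative.

```python
def confusion_stats(tp, fn, fp, tn):
    """Metrics for a 2x2 confusion matrix; tp/fn/fp/tn are counts
    relative to whichever class is treated as positive."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # recall of the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # precision of the positive class
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return accuracy, sensitivity, specificity, ppv, kappa

# Cells read off the caret output above (rows = prediction, cols = reference).
no_churn_positive = confusion_stats(tp=845, fn=32, fp=10, tn=112)
churn_positive = confusion_stats(tp=112, fn=10, fp=32, tn=845)
print([round(x, 4) for x in no_churn_positive])
print([round(x, 4) for x in churn_positive])
```

Read with churn as the positive class, the model catches about 92% of churners (112 of 122). About 78% of the customers it flags as churners actually leave (112 of 144).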

plot of chunk caret

This reaffirms the findings of the exploratory analysis. The most important predictor of churn is clearly total daytime minutes. We have already speculated about the probable causes of that, so in this section we shall not explore such variables any further. As expected, the number of calls to customer care, total international minutes, the international plan and the voice mail plan all feature as prominent predictors of churn. The duration of the account with the company has little effect on churn, which again matches what we observed earlier. A notable departure from the exploratory analysis is that the boosting model ranks total evening minutes as an important predictor of churn. In the exploratory section, we had concluded that evening minutes might not be an important or significant predictor.

Survival Analysis

Conventional statistical methods (e.g. logistic regression, decision trees, etc.) can be very successful at identifying the drivers of customer churn. However, these methods cannot predict when customers will churn, or how long they will stay with the company. For that, we use survival analysis, a branch of statistics that deals with the time until one or more events happen, such as death in biological organisms or failure in mechanical systems. In our case, we analyze how long customers stay with the current service provider, and use survival analysis as an efficient and powerful tool to predict customer churn.

Survival data have two common features that are difficult to handle with conventional statistical methods: censoring and time-dependent covariates. The survival function and the hazard function are generally used to describe the status of customer survival during the observation period. The survival function gives the probability of surviving beyond a certain time point t. The hazard function describes the risk of the event (in this case, customer churn) in a short interval after time t, conditional on the customer having survived to time t.
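For reference, here are the standard definitions behind these two functions. T is the (random) tenure at which a customer churns and f(t) is its density. A censored customer contributes only the information that T exceeds their observed tenure.

```latex
S(t) = \Pr(T > t), \qquad
h(t) = \lim_{\Delta t \to 0^{+}}
       \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}, \qquad
S(t) = \exp\!\left(-\int_{0}^{t} h(u)\,du\right)
```

The last identity links the two plots used later: a rising hazard corresponds to a survival curve that falls more steeply.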

For survival analysis, the best observation plan is prospective. We begin observing a set of customers at some well-defined point in time (called the time origin), follow them for a substantial period, and record the times at which customers churn. Not every customer needs to experience churn: customers who have yet to churn are called censored cases, while customers who have already churned are called observed cases. Typically, we want not only to predict the timing of customer churn but also to analyze how time-dependent covariates affect the occurrence and timing of churn. Examples of such covariates are customer calls to service centers, changes of plan type and changes of billing option.
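The post does not state which estimator produced the survival curve in this section. A standard choice for right-censored data is the Kaplan-Meier product-limit estimator, sketched below in plain Python. The function names and the toy tenures are illustrative assumptions, not the original data.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) for right-censored data.

    durations: observed tenure for each customer
    events:    True if the customer churned at that time, False if censored
    Returns a list of (time, S(t)) steps at each distinct churn time.
    """
    churns = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)          # churned + censored leaving at each time
    at_risk = len(durations)
    surv, curve = 1.0, []
    for t in sorted(exits):
        d = churns.get(t, 0)
        if d:
            surv *= 1 - d / at_risk     # product-limit step
            curve.append((t, surv))
        at_risk -= exits[t]             # everyone observed at t leaves the risk set
    return curve

def survival_at(curve, t):
    """Read S(t) off the step function (S = 1 before the first churn)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s

# Toy data: tenure in months and churn flag (False = still active, censored).
durations = [5, 8, 8, 12, 15, 20]
events = [True, False, True, True, False, True]
curve = kaplan_meier(durations, events)
print(curve)
print(survival_at(curve, 10))
```

With the churn data, `durations` would be the account length and `events` the churn flag. `survival_at` reads the curve at a chosen horizon, such as the 150 months used in the recommendations.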

For our purposes, we use the account duration as the time variable. For the covariates, we use the top 10 predictors of churn from the boosting model.

plot of chunk surv1

Let's try to interpret the plot. When the time (account duration) is 0, all customers (100%) survive. As time increases, the survival probability decreases, which is intuitive. To make this plot more informative, it would help to take a variable directly linked to churn and vary it to see its effect on survival. Let's use the number of calls to customer care as that variable and analyze how increasing it affects survival.

Let's evaluate the survival probabilities for customers with at least 1, 2, 3, 4 and 5 calls to customer service.

plot of chunk surv_func

We see the intuitive trend of survival probability decreasing as calls to customer care increase. This shows the effect of the number of customer service calls on survival.
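The post does not state how these curves were produced, so the following is an assumption. A common approach is a Cox proportional hazards model. In that model, each extra customer service call multiplies the hazard by exp(beta), so that S(t | x) = S0(t) ** exp(beta * x). The Python sketch below uses a made-up baseline survival and a hypothetical coefficient `beta`, not estimates from this data. It only illustrates why such curves drop monotonically as the number of calls rises.

```python
import math

def cox_survival(baseline_surv, beta, x):
    """S(t | x) = S0(t) ** exp(beta * x) under proportional hazards."""
    return baseline_surv ** math.exp(beta * x)

# Hypothetical values, NOT estimates from the churn data.
S0_at_150 = 0.9      # survival at t = 150 for the reference level x = 0 calls
beta = 0.5           # hypothetical log hazard ratio per customer service call

for calls in range(1, 6):
    print(calls, round(cox_survival(S0_at_150, beta, calls), 3))
```

Whenever beta is positive, raising x raises the exponent, and the survival probability (a number between 0 and 1) falls at every time point. This produces the fan of curves seen in the plot.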

Let's now look at the hazard plot. Recall that the hazard function describes the risk of customer churn in a short interval after time t, conditional on the customer having survived to time t.

plot of chunk hazard

This shows that the first 50 months with the company are relatively hazard-free. The risk of churn increases gradually after an account duration of 100 months.
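A simple way to see where the hazard rises is a life-table estimate. Tenure is split into intervals, and the number of churns in each interval is divided by the number of customers at risk at its start. The sketch below uses hypothetical toy data and 50-month bins to match the time scale discussed above. It omits the usual actuarial half-interval correction for censored customers, and it is not necessarily how the original hazard plot was produced.

```python
def interval_hazard(durations, events, width):
    """Crude life-table hazard: churns in [a, a+width) / customers at risk at a.

    Censored customers count as at risk for the whole interval they leave in,
    i.e. no actuarial half-interval correction is applied.
    """
    top = max(durations)
    result = []
    start = 0
    while start <= top:
        end = start + width
        at_risk = sum(1 for t in durations if t >= start)
        churned = sum(1 for t, e in zip(durations, events)
                      if e and start <= t < end)
        result.append((start, end, churned / at_risk if at_risk else 0.0))
        start = end
    return result

# Hypothetical toy data: tenure in months and churn flag.
durations = [10, 30, 60, 80, 120, 130, 140, 160, 180, 200]
events = [False, True, False, False, True, True, False, True, True, False]
for start, end, h in interval_hazard(durations, events, 50):
    print(f"[{start}, {end}): {h:.2f}")
```

Narrower bins give a more detailed picture but noisier estimates, because fewer customers remain at risk in the later intervals.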

Results

The objective of the study was to determine why and when customer churn happens. We began with exploratory analysis to visually gather preliminary assessments of each variable's contribution to churn and to identify overall trends. We then developed a learning model to predict whether a customer is likely to leave the service, based predominantly on fixed-line call data. The model achieved close to 96% accuracy on the test set and ranked the following factors as important to churn (in decreasing order of importance):

## gbm variable importance
## 
##                               Overall
## total.day.minutes             221.076
## number.customer.service.calls  92.420
## total.eve.minutes              91.117
## total.intl.calls               62.701
## total.intl.minutes             61.142
## international.plan             59.802
## number.vmail.messages          38.778
## voice.mail.plan                15.085
## total.night.minutes            11.464
## total.night.calls               9.226
## total.day.calls                 7.154
## account.length                  5.537
## total.night.charge              3.652
## total.eve.calls                 3.338
## total.eve.charge                1.825
## total.day.charge                0.000
## total.intl.charge               0.000

We also assessed the survival probability of customers and its sensitivity to a particular variable (the number of calls to customer service). Finally, we examined the hazard (the risk of churn) over time and observed a phase of almost no hazard followed by a phase of relatively high risk.

Conclusion

Based on all the results, the following recommendations can be made.

  • There is a strong correlation between higher total day minutes and churn. This could be due to a variety of factors, a few of which are given below:
    1. higher daytime call costs compared to the competition. The company could reduce daytime tariffs or provide value added services
    to customers.
    
    2. poor call quality leading to more calls to customer service. The company may choose to decrease the cost to see if it helps, or to
    improve the call quality, which might need (often costly) infrastructure upgrades.
    
    3. unergonomic handsets that make prolonged calls uncomfortable for customers.
    
  • Roughly 40% of the customers with an international plan end up switching to other providers. Again, this could be due to a variety of factors, and the factors suggested above apply here as well.
  • The voice mail plan seems to be popular with the customers, but not ‘enough’ customers seem to have opted for it. Some probable causes are:
    1. unawareness among customers about the plan. Timely and appropriate marketing methods could make a positive difference.
    
    2. high cost resulting in customers not choosing it despite the supposed popularity. Reducing cost or providing value added
    services could help.
    
  • Customer’s account duration doesn’t seem to affect churn. This could be because of:
    1. lack of loyalty schemes (such as the Tesco Clubcard). This can be addressed by introducing loyalty schemes or revamping the
    existing ones.
    
    2. lack of strategically timed, customer-specific marketing and offers. This can be addressed by keeping better track of customers and
    introducing customer-specific offers.
    
    3. lack of overall positive customer sentiment towards the brand. This can be addressed by proper advertising and marketing and, in some
    cases, by going as far as (expensive, but rewarding) philanthropic or social welfare activities.
    
  • The company should strive to limit the number of customer care calls to 1, which the survival analysis associates with an 80% survival probability after 150 months.
  • The risk of churn increases after 100 months with the company. Strategic, customer-specific marketing and promotions in this time frame could help retain customers longer.

Closing remarks

This study illustrates the use of machine learning and survival analysis to identify the factors contributing to customer churn and when churn is likely to occur. Its findings will help the telecommunications company understand customer churn risk and churn hazard over the course of customer tenure. The results would be helpful in customizing marketing communications and customer treatment programs, and in optimally timing marketing efforts.
