At-Risk Customer Profiling: A Tool to Boost Customer Retention

September 10, 2012

Over the last several weeks, I have been writing about how to plan, set up, and execute a one-time or recurring lost customer research project. In this series, I’ve discussed the importance of setting up an automated lost customer survey to track trends in the reasons for these losses. This week, I will share how your company can use this information, along with other customer data tracked in your CRM, for at-risk customer profiling, so that it can identify at-risk customers before they defect.

What is At-Risk Customer Profiling?

At-risk customer profiling is the process of analyzing patterns and trends among lost customers to identify the factors that are most likely to lead to customer defection. You can then use these factors to create profiles of the customer groups that are most at risk. This process is very similar to a customer segmentation project; however, the segmentation is focused on lost customers.

Why is At-Risk Customer Profiling Important?

  • The probability of winning back a customer’s loyalty prior to defection is much higher than the probability of winning back a lost customer.
  • The cost of preemptively winning back a customer’s loyalty is much lower, in terms of the human resource commitment and contract concessions required, than the cost of winning back a lost customer.
  • At a macro-level, these factors should reduce churn, increase average lifetime value of customers, and help improve financial and human resource allocation.

Step-by-Step Guide on How to Do At-Risk Customer Profiling

  • To start, you will want to familiarize yourself with your customer segmentation. Often, the way your company thinks about new customer targets can provide you with some very valuable insights about lost customer segments, as well.
  • Next, you need to compile the data you will be analyzing. You will need the following data:
    • Lost customer data from a set time frame and the corresponding lost customer survey data.
    • Data on customer renewals over the same time frame.
  • In most instances, this data can be collected directly from your CRM.
  • After that, you will need to combine the lost customer and renewal data sets to create a master win-loss data set for a given time frame and then merge in the lost customer survey data. In doing so, you will also need to create a binary variable that indicates wins (renewals) as a 1 and losses as a 0.
  • Now you will want to make a copy of the combined data set and subset it to only the lost customers.
  • The next step is to review the lost customer survey data for trends in the reasons behind the recent customer losses, so you can identify the most common characteristics and factors that were present at the time of the loss. To do so, you could use a segmentation tree to see the way these characteristics and factors group together to categorize lost customers. The goal of this exercise is to find the factors and groups of factors that are most likely to be present with your lost customers.
  • After that, you will want to think about why these factors could lead to customer losses to make sure that these factors truly are determinants of customer losses and not the results of spurious correlation.
  • Depending on the results of the previous step, you may need to remove the factors that are not logical determinants of these losses from the data and iterate on the segmentation tree and make new factor groups to categorize lost customers.
  • Now you will want to determine whether these groups of factors appear together substantially more often in the lost customer group than in the win group. You will also want to look at some variations of the factor groups and the individual factors to test how sensitive the result is. If the factors are more likely to be present in the lost customer group, then you have a lost customer profile hypothesis. You will want to develop a couple of these based on the segmentation tree results.
  • After that, you will want to test how well these factors will predict a customer loss. You can do this by running a logistic regression on the combined win and loss data set, where the dependent variable is the binary customer type variable and the independent variables are all of the variables that you believe are driving the customer loss based on your lost customer profile factors. A logistic regression does not produce a true R-squared, so use a pseudo R-squared (such as McFadden’s) as a rough gauge of how much better the model fits than a model with no factors; it is not the percentage of losses the model predicts. The significance statistic for each variable indicates whether that variable is a statistically significant predictor of customer loss. Because wins are coded as 1, a negative coefficient means the factor is associated with losses.
  • Assuming these factors are all significant in the regression model, then you have your 1st at-risk customer profile.  If not, then you will need to iterate on the model by going back to the segmentation tree results and finding the next best factor group to test and evaluate.
  • Once you have your 1st at-risk profile, you should go back to the segmentation tree and find the 2nd most predictive at-risk customer profile. This process is the same as finding the first at-risk customer profile; you just need to exclude the 1st at-risk customer profile as a possibility. It is okay to have overlapping factors in these profiles. You will want to identify the top two to three profiles.
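
The data preparation and comparison steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the factor names (`support_tickets_high`, `no_exec_sponsor`), and the survey reasons are hypothetical, and a real project would pull these fields from the CRM and the lost customer survey.

```python
# Hypothetical CRM extracts; every field name and value here is illustrative.
lost = [
    {"customer_id": 1, "support_tickets_high": 1, "no_exec_sponsor": 1},
    {"customer_id": 2, "support_tickets_high": 1, "no_exec_sponsor": 0},
    {"customer_id": 3, "support_tickets_high": 1, "no_exec_sponsor": 1},
]
renewed = [
    {"customer_id": 4, "support_tickets_high": 0, "no_exec_sponsor": 0},
    {"customer_id": 5, "support_tickets_high": 1, "no_exec_sponsor": 0},
    {"customer_id": 6, "support_tickets_high": 0, "no_exec_sponsor": 0},
    {"customer_id": 7, "support_tickets_high": 0, "no_exec_sponsor": 1},
]
# Hypothetical survey results: customer_id -> stated reason for leaving.
survey = {1: "price", 2: "support", 3: "support"}


def build_master(lost_rows, renewed_rows, survey_reasons):
    """Combine both groups into one win-loss data set with a binary flag."""
    master_rows = []
    for row in renewed_rows:
        master_rows.append({**row, "won": 1, "loss_reason": None})
    for row in lost_rows:
        reason = survey_reasons.get(row["customer_id"])
        master_rows.append({**row, "won": 0, "loss_reason": reason})
    return master_rows


def prevalence(rows, factors):
    """Share of rows in which every factor in the group is present."""
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if all(r[f] == 1 for f in factors))
    return hits / len(rows)


master = build_master(lost, renewed, survey)
lost_only = [r for r in master if r["won"] == 0]
won_only = [r for r in master if r["won"] == 1]

# Candidate profile: both factors present at the same time.
profile = ["support_tickets_high", "no_exec_sponsor"]
lost_rate = prevalence(lost_only, profile)
won_rate = prevalence(won_only, profile)
print(f"profile present in {lost_rate:.0%} of losses vs {won_rate:.0%} of wins")
```

Calling `prevalence` with single factors or with variations of the group gives a quick sensitivity check before moving on to the regression.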

Taking It One Step Further: How to Build an At-Risk Customer Index

Companies can also take this one step further and set up an index that ranks the probability of customer defection based on the characteristics they determine are the drivers behind customer defection. This can be done using a logistic regression just as you did in validating the lost customer profiles, but this time you will want to include all the factors you believe lead to customer losses in the model, along with any other control factors, and use the customer type variable as the dependent variable again. You will want to run this model on the combined win and loss data set and see how effective it is at predicting a loss (for example, by checking a pseudo R-squared). You will also want to check that all factors are significant in the model and test it for statistical problems like multicollinearity from overlapping variables. If you find insignificant factors or statistical problems, then you will want to remove one or more of the factors and re-run the model. You should continue to do this until you have a model that is good at predicting customer losses, meaning all its independent variables are significant and none of the major statistical problems are present.
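
To make the regression step concrete, here is a minimal sketch that fits a logistic regression by plain gradient descent and computes McFadden’s pseudo R-squared, which is the usual stand-in for R-squared in a logistic model. The data, factor names, learning rate, and epoch count are all assumptions chosen for illustration. The sketch does not compute significance statistics or collinearity diagnostics; for those you would normally rely on a statistics package rather than hand-written code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Predicted probability of a win (won = 1); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit by batch gradient descent on the average log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def log_likelihood(probs, y):
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
               for p, yi in zip(probs, y))


def mcfadden_r2(w, X, y):
    """1 - LL(model) / LL(intercept-only); assumes y has both wins and losses."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood([base_rate] * len(y), y)
    ll_model = log_likelihood([predict(w, xi) for xi in X], y)
    return 1.0 - ll_model / ll_model_guard(ll_null)


def ll_model_guard(ll_null):
    """Guard against a degenerate null model (should not occur with mixed outcomes)."""
    if ll_null == 0:
        raise ValueError("null log-likelihood is zero")
    return ll_null


# Hypothetical master data: columns are [support_tickets_high, no_exec_sponsor].
X = [[1, 1], [1, 1], [1, 1], [1, 0], [0, 1], [0, 0],
     [0, 0], [0, 0], [0, 0], [1, 0], [0, 1], [1, 1]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # 1 = won (renewed), 0 = lost

weights = fit_logistic(X, y)
r2 = mcfadden_r2(weights, X, y)
print("intercept and coefficients:", [round(v, 3) for v in weights])
print("McFadden pseudo R-squared:", round(r2, 3))
```

Because wins are coded as 1, negative coefficients mark factors that push a customer toward defection. Note that pseudo R-squared values tend to run much lower than ordinary R-squared values, so they are best used to compare candidate models on the same data rather than read as a percentage.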

You can now take the logistic regression model and use it as the formula for the at-risk customer index. Next you can add an at-risk customer index field into your CRM that will notify the sales and customer service team of the at-risk probability of each customer they interact with.
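
As a rough sketch of how the fitted model could become an index, the example below scores customers with hypothetical coefficients. It assumes the model was fit with wins coded as 1, so the model output is the probability of retention and the at-risk index is one minus that value. The coefficient values, factor names, and customer records are invented for illustration.

```python
import math

# Hypothetical coefficients from a model where won = 1 and lost = 0.
INTERCEPT = 1.1
COEFFS = {"support_tickets_high": -1.1, "no_exec_sponsor": -1.1}


def at_risk_index(customer):
    """Estimated probability of defection for one customer record."""
    z = INTERCEPT + sum(coef * customer.get(name, 0)
                        for name, coef in COEFFS.items())
    p_retained = 1.0 / (1.0 + math.exp(-z))
    return 1.0 - p_retained


# Hypothetical current customers pulled from the CRM.
customers = [
    {"customer_id": 101, "support_tickets_high": 0, "no_exec_sponsor": 0},
    {"customer_id": 102, "support_tickets_high": 1, "no_exec_sponsor": 1},
    {"customer_id": 103, "support_tickets_high": 1, "no_exec_sponsor": 0},
]

# Highest-risk customers first, ready to write back to a CRM field.
ranked = sorted(customers, key=at_risk_index, reverse=True)
for c in ranked:
    print(c["customer_id"], round(at_risk_index(c), 2))
```

Storing the index as a probability between 0 and 1 keeps it easy to bucket later (for example into low, medium, and high risk) if the sales and customer service teams prefer labels to numbers.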

Now you are ready to create at-risk customer profiles and/or an index to predict the likelihood of defection so that you can factor this information into your team’s sales and customer support resource allocation decisions. You will want to test your model on a regular basis to make sure that it is still a strong predictor of customer losses. I recommend checking its predictive power every three or six months. If the predictive power drops significantly from one review to the next, then it is worth updating the model.
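
One way to run this periodic check is to compare the index against actual outcomes for each review period. The sketch below uses AUC (the chance that a randomly chosen lost customer was scored as riskier than a randomly chosen retained one) as the measure of predictive power; that metric, the 0.2 tolerance, and the quarterly data are my assumptions, not something prescribed above. Note that the flag here is 1 for a lost customer, the reverse of the win flag used in the regression.

```python
def auc(risk_scores, lost_flags):
    """Pairwise AUC: share of (lost, retained) pairs ranked correctly; ties count half."""
    lost = [s for s, f in zip(risk_scores, lost_flags) if f == 1]
    kept = [s for s, f in zip(risk_scores, lost_flags) if f == 0]
    if not lost or not kept:
        raise ValueError("need at least one lost and one retained customer")
    correct = sum(1.0 if a > b else 0.5 if a == b else 0.0
                  for a in lost for b in kept)
    return correct / (len(lost) * len(kept))


MAX_DROP = 0.2  # hypothetical tolerance before the model is revisited

# Hypothetical review periods: (name, at-risk scores, 1 = customer was lost).
periods = [
    ("Q1", [0.9, 0.8, 0.3, 0.2, 0.1], [1, 1, 0, 0, 0]),
    ("Q2", [0.9, 0.4, 0.6, 0.2, 0.1], [1, 1, 0, 0, 0]),
    ("Q3", [0.5, 0.2, 0.6, 0.7, 0.1], [1, 1, 0, 0, 0]),
]

history = [(name, auc(scores, flags)) for name, scores, flags in periods]

# Flag any period whose AUC fell by more than MAX_DROP versus the prior period.
flagged = [name
           for (_, prev), (name, cur) in zip(history, history[1:])
           if prev - cur > MAX_DROP]

for name, value in history:
    print(name, round(value, 3))
print("periods needing a model update:", flagged)
```

An AUC near 0.5 means the index ranks customers no better than chance, which is a stronger signal to rebuild the model than a modest period-to-period dip.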

If you are not familiar with customer segmentation, I would recommend reviewing the OpenView Customer Segmentation eBook as it will walk you through the process of using segmentation trees.


Brandon Hickie is Marketing Manager, Pricing Strategy at LinkedIn (https://www.linkedin.com/). He previously worked at OpenView as Marketing Insights Manager. Prior to OpenView, Brandon was an Associate in the competition practice at Charles River Associates, where he focused on merger strategy, merger regulatory review, and antitrust litigation.