Published August 16, 2022

Using data science for customer retention

Customer retention is a key priority for any business. Multiple factors drive customer churn, and understanding these factors enables proactive management of it. A combination of data processing and statistics can help uncover the likely reasons for churn and identify customers who are at risk.

For this example, we have used a telecom dataset from the IBM community website (…) … through a complex decision-making process before subscribing to any one of the numerous telecom services.

The goal of the predictive model in this blog is to identify the customers who have a high probability of unsubscribing from the service. For this model, we are using personal details, demographic information, pricing, and plan information. We will also identify the independent variables that are associated with customers unsubscribing from the service.

Data Description

  •  The dataset has 7,043 rows with 21 features.
  •  Independent variables considered for this exercise:
      •  Customer demographics (Age, Gender, Marital Status, Location, etc.)
      •  Billing information (monthly and yearly payment)
      •  Voice and data services (Phone service, Multiple lines, Internet service, Online security, Device protection, Tech support)
      •  Contract type
      •  Bill payment mode
  •  Response/dependent variable considered for the model:
      •  Value ‘1’ indicates UNSUBSCRIBED customers
      •  Value ‘0’ indicates ACTIVE customers
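The data preparation described above can be sketched in plain Python. This is a minimal illustration, not the code used for this post: it assumes column names from the public IBM Telco churn CSV (`Contract`, `MonthlyCharges`, and `Churn` with `Yes`/`No` values), uses three made-up rows, and the helper `encode_rows` is hypothetical. It codes the response as 1 (unsubscribed) or 0 (active) and one-hot encodes one categorical predictor, dropping the first level as a reference category.

```python
import csv
import io

# Hypothetical rows in the layout of the public IBM Telco churn CSV
# (only 4 of its 21 columns; values are made up for illustration).
SAMPLE_CSV = """customerID,Contract,MonthlyCharges,Churn
0001-A,Month-to-month,70.35,Yes
0002-B,One year,56.95,No
0003-C,Two year,42.30,No
"""

def encode_rows(text, category_col="Contract", target_col="Churn"):
    """Return (features, labels, levels): target coded 1/0, one categorical
    column one-hot encoded with its first level dropped as the reference."""
    rows = list(csv.DictReader(io.StringIO(text)))
    levels = sorted({r[category_col] for r in rows})
    features, labels = [], []
    for r in rows:
        # Response variable: 1 = UNSUBSCRIBED, 0 = ACTIVE.
        labels.append(1 if r[target_col] == "Yes" else 0)
        # Indicator for every level except levels[0] (the reference category).
        dummies = [1.0 if r[category_col] == lvl else 0.0 for lvl in levels[1:]]
        # Numeric predictors are kept as floats.
        features.append(dummies + [float(r["MonthlyCharges"])])
    return features, labels, levels

X, y, levels = encode_rows(SAMPLE_CSV)
print(levels, y)
print(X)
```

Dropping one level avoids perfect collinearity with the model's intercept. On the full dataset, every categorical column would be encoded the same way; pandas' `get_dummies` with `drop_first=True` does this in one call.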

Source Code

Predictive Model

Logistic Regression:

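For this exercise, we are using the logistic regression algorithm. Logistic regression models the relationship between a binary outcome (here, unsubscribed vs. active) and a group of continuous and/or categorical predictor variables, estimating the probability of the outcome for each customer. Its coefficients indicate how strongly each independent variable is associated with churn. Unlike linear regression, it does not report a true percentage of variance explained; pseudo-R² measures such as McFadden's R² serve as a rough analogue.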
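As a rough sketch of how logistic regression turns predictors into a churn probability, the code below fits the model by batch gradient descent on the log-loss in plain Python. The toy data (one hypothetical feature, 1 for a month-to-month contract) is invented for illustration, and the function names, learning rate, and 0.5 risk threshold are assumptions, not settings from this post. It also computes McFadden's pseudo-R², 1 − LL(model)/LL(intercept-only).

```python
import math

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(b, w, x):
    # Estimated probability that a customer unsubscribes (label 1).
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit an intercept b and weights w by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_proba(b, w, xi) - yi  # d(log-loss)/d(linear score)
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def log_likelihood(y, p):
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def mcfadden_r2(y, p):
    """McFadden's pseudo-R^2: 1 - LL(fitted model) / LL(intercept-only model)."""
    base_rate = sum(y) / len(y)
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, [base_rate] * len(y))

# Hypothetical toy data: feature = 1 if the customer is on a month-to-month contract.
X = [[0.0], [0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

b, w = fit_logistic(X, y)
probs = [predict_proba(b, w, x) for x in X]
odds_ratio = math.exp(w[0])  # multiplicative change in the odds of churning
r2 = mcfadden_r2(y, probs)
at_risk = [i for i, p in enumerate(probs) if p >= 0.5]  # assumed cut-off
print(round(odds_ratio, 2), round(r2, 3), at_risk)
```

On this toy data the fitted intercept and weight approach ln(1/3) and 2·ln 3, so the odds ratio is about 9: customers with the feature have roughly nine times the odds of churning. On the full dataset one would more likely use scikit-learn's `LogisticRegression` (which applies L2 regularization by default) or statsmodels' `Logit`, whose results report McFadden's pseudo-R² as `prsquared`.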
