Optimizing Direct Mail Marketing for Tayko Software Cataloger through Predictive Analytics and Data Mining
For the code, please visit: github.com/AhmedEAbdou/Maximizing-Direct-Mail-Marketing-Profitability-Using-R
Objective:
The primary objective of this project was to develop data-driven models that would help Tayko Software Cataloger, a software retail firm, optimize its mailing strategy. The aim was to maximize the gross profit generated by its new catalog mailing to 180,000 potential customers drawn from a consortium pool of 5,000,000 names.
Approach:
The dataset provided for this project contained 2,000 records, including 1,000 purchasers and 1,000 non-purchasers, along with various customer attributes. The analysis involved the following steps:
- Estimation of gross profit for a random selection of 180,000 names.
- Development of a binary classification model to predict whether a customer would make a purchase or not.
- Development of a regression model to predict the spending among purchasers.
- Evaluation of the expected profit based on the data mining models.
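As a rough illustration of the first step, the gross profit from a random mailing can be approximated as the expected number of buyers times their average spend, minus mailing costs. The Python sketch below is not the project's R code; the function name, the per-piece mailing cost term, and every input value are assumptions chosen only to show the arithmetic.

```python
def random_mailing_gross_profit(n_mailed, response_rate, avg_spend, cost_per_mail):
    """Expected gross profit when the mailed names are chosen at random."""
    expected_buyers = n_mailed * response_rate
    expected_revenue = expected_buyers * avg_spend
    mailing_cost = n_mailed * cost_per_mail
    return expected_revenue - mailing_cost


# Placeholder inputs for illustration only; they are NOT figures from this project.
baseline_profit = random_mailing_gross_profit(
    n_mailed=180_000,
    response_rate=0.05,   # hypothetical response rate
    avg_spend=100.0,      # hypothetical average spend per purchaser
    cost_per_mail=2.0,    # hypothetical cost per catalog
)
print(f"Baseline gross profit: {baseline_profit:,.0f}")
```

A baseline of this kind gives the model-based mailing something to be compared against.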
Data Pre-processing and Partitioning:
The data was first partitioned into training (800 records), validation (700 records), and test (500 records) sets. Models were fitted on the training set, compared on the validation set, and given a final evaluation on the test set, which helps detect overfitting and gives a more honest picture of how well the models generalize.
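A minimal sketch of such a three-way split is shown below, assuming a simple random shuffle of record indices. It is written in Python for illustration and is not the project's R code; the function name and the seed value are assumptions.

```python
import random


def partition_indices(n_records, n_train, n_valid, n_test, seed=1):
    """Randomly split record indices into training, validation, and test sets."""
    if n_train + n_valid + n_test != n_records:
        raise ValueError("partition sizes must sum to the number of records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


# Sizes taken from the text: 2,000 records split 800 / 700 / 500.
train_idx, valid_idx, test_idx = partition_indices(2000, 800, 700, 500)
print(len(train_idx), len(valid_idx), len(test_idx))
```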
Binary Classification Model:
A stepwise logistic regression model with backward elimination was fitted on the training set to classify customers as purchasers or non-purchasers. Logistic regression was chosen because it provides an estimated "probability of purchase," which was required for the subsequent analysis.
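To show how a fitted logistic regression turns customer attributes into a probability of purchase, the Python sketch below applies the logistic function to a linear combination of attributes. It does not perform the stepwise fitting itself, and the attribute names and coefficient values are hypothetical, not the ones estimated in this project.

```python
import math


def purchase_probability(intercept, coefficients, attributes):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    linear_score = intercept + sum(
        coefficients[name] * attributes[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


# Hypothetical coefficients and one hypothetical customer record.
example_coefficients = {"freq": 0.8, "web_order": 0.5}
example_customer = {"freq": 2, "web_order": 1}
p = purchase_probability(-1.5, example_coefficients, example_customer)
print(f"Estimated probability of purchase: {p:.3f}")
```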
Regression Model for Spending Prediction:
The following models were developed for predicting spending among purchasers:
i. Multiple linear regression (using stepwise regression)
ii. Regression trees
Each model's performance on the validation dataset was compared, and the better-performing model was selected for predicting spending.
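The comparison on the validation set can be sketched as follows. The write-up does not state which error metric was used, so root mean squared error (RMSE) is an assumption here, and the spending values are toy numbers rather than project data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted spending."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def select_best_model(actual, predictions_by_model):
    """Return the name of the model with the lowest validation RMSE, plus all scores."""
    scores = {name: rmse(actual, preds) for name, preds in predictions_by_model.items()}
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Toy validation data for illustration only.
validation_spending = [100.0, 200.0, 300.0]
validation_predictions = {
    "linear_regression": [110.0, 190.0, 310.0],
    "regression_tree": [150.0, 150.0, 350.0],
}
best_model, validation_scores = select_best_model(validation_spending, validation_predictions)
print(best_model, validation_scores)
```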
Score Analysis and Lift Chart:
The test dataset, which included both purchasers and non-purchasers, was used to create a new data frame called "Score Analysis." The following columns were added to this data frame:
a. Predicted scores from the logistic regression model
b. Predicted spending amount from the chosen prediction model
c. Adjusted probability of purchase (predicted probability of purchase multiplied by 0.107 to adjust for oversampling of purchasers)
d. Expected spending (adjusted probability of purchase multiplied by predicted spending)
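The last two columns are simple arithmetic on the first two. The Python sketch below computes them for a single record, using the 0.107 adjustment factor stated above; the function name, dictionary keys, and the example inputs are illustrative assumptions.

```python
OVERSAMPLING_ADJUSTMENT = 0.107  # adjustment factor stated in the text


def score_record(predicted_probability, predicted_spending,
                 adjustment=OVERSAMPLING_ADJUSTMENT):
    """Build one row of the score analysis from the two model outputs."""
    adjusted_probability = predicted_probability * adjustment
    expected_spending = adjusted_probability * predicted_spending
    return {
        "predicted_probability": predicted_probability,
        "predicted_spending": predicted_spending,
        "adjusted_probability": adjusted_probability,
        "expected_spending": expected_spending,
    }


# Hypothetical model outputs for one test record.
row = score_record(predicted_probability=0.5, predicted_spending=200.0)
print(row)
```

For this hypothetical record, a predicted probability of 0.5 becomes an adjusted probability of about 0.0535, and the expected spending is about 10.7.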
A lift chart of expected spending was then plotted and used to estimate the gross profit that would result from mailing the 180,000 names selected by the data mining models.
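One plausible way to read a profit estimate off such a chart is sketched below in Python. It assumes the mailing targets the top-ranked share of names (180,000 of 5,000,000, or 3.6%), takes the mean expected spending of the same share of test records, and scales it to the mailing size. The scaling approach, the function names, the per-piece mailing cost, and the toy scores are all assumptions, not the project's documented method or results.

```python
def cumulative_lift(expected_spending):
    """Cumulative expected spending after ranking records from highest to lowest."""
    ranked = sorted(expected_spending, reverse=True)
    cumulative, running_total = [], 0.0
    for value in ranked:
        running_total += value
        cumulative.append(running_total)
    return cumulative


def estimate_gross_profit(expected_spending, names_to_mail, pool_size, cost_per_mail):
    """Scale the mean expected spending of the top-ranked share up to the mailing size."""
    if not expected_spending:
        raise ValueError("expected_spending must not be empty")
    fraction = names_to_mail / pool_size
    k = max(1, round(len(expected_spending) * fraction))  # records in the top share
    mean_per_name = cumulative_lift(expected_spending)[k - 1] / k
    return names_to_mail * (mean_per_name - cost_per_mail)


# Toy expected-spending scores for four records; illustration only.
toy_scores = [1.0, 5.0, 3.0, 2.0]
print(cumulative_lift(toy_scores))
print(estimate_gross_profit(toy_scores, names_to_mail=2, pool_size=4, cost_per_mail=1.0))
```

Plotting the cumulative values against the number of records mailed gives the lift curve; the gap between that curve and a straight line from the origin to the final total shows the gain over random selection.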
Results:
The analysis produced two predictive models for Tayko's mailing strategy: a binary classification model for predicting the likelihood of purchase and a regression model for predicting spending among purchasers. By ranking names on the expected spending derived from these models, Tayko can target the names most likely to generate revenue instead of mailing at random, and so increase the gross profit generated from the new catalog mailing.
These data-driven models provide valuable insights for Tayko Software Cataloger, allowing the company to make informed decisions in targeting potential customers and increasing profitability. This project demonstrates the power of data mining and predictive analytics in optimizing marketing strategies for businesses.