Skip to content

Customer segmentation

Download notebook View source code

Customer segmentation is a common application of Data Science. We have an example deployment that segments customers based on an RFM analysis. We have two versions:

  • A simple version that takes an Excel file as input, and that returns a CSV file.
  • A more advanced version that reads the data from a Google Sheet, and writes the outcomes to the same Google sheet.

Simple RFM deployment

This deployment takes an excel file file as input, and returns an excel file segmentation. We have put the deployment.py here for your reference:

import os
import pandas as pd


class Deployment:

    def __init__(self, base_directory, context):

        print('Initialising the connection to the google drive')


    def request(self, data):

        print('Getting the requested file')
        # Transforming data into a Pandas DataFrame
        data_df = pd.read_excel(data['file'])

        # RFM analyis
        print('Performing RFM analysis')
        data_df['TotalPrice'] = data_df['Quantity'].astype(int) * data_df['UnitPrice'].astype(float)
        data_df['InvoiceDate'] = pd.to_datetime(data_df['InvoiceDate'])

        rfm= data_df.groupby('CustomerID').agg({'InvoiceDate': lambda date: (date.max() - date.min()).days,
                                                'InvoiceNo': lambda num: len(num),
                                                'TotalPrice': lambda price: price.sum()})

        # Change the name of columns
        rfm.columns=['recency','frequency','monetary']

        # Computing Quantile of RFM values
        rfm['recency'] = rfm['recency'].astype(int)
        rfm['r_quartile'] = pd.qcut(rfm['recency'].rank(method='first'), 4, ['1','2','3','4']).astype(int)
        rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, ['4','3','2','1']).astype(int)
        rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, ['4','3','2','1']).astype(int)

        # Concatenating the quantile numbers to get the RFM score
        rfm['RFM_Score'] = rfm.r_quartile.astype(str)+ rfm.f_quartile.astype(str) + rfm.m_quartile.astype(str)

        # Sort the outcomes
        print('Sorting customers')
        sorted_customers = rfm.sort_values('RFM_Score', ascending=True)
        sorted_customers.to_csv('segmentation_results.csv', index=True)

        return {
            'segmentation': 'segmentation_results.csv'
        }

In the request method we read in the Excel file as a DataFrame and we assign an RFM score to each customer based on how frequently, how recently and for how much money they purchased. The outcome is sorted and written to a csv, which we return as output.

Running this example in UbiOps

To deploy this example model to your own UbiOps environment you can log in to the WebApp and create a new deployment in the deployment tab. You will be prompted to fill in certain parameters, you can use the following:

Deployment configuration
Name rfm-analysis
Description A customer segmentation model based on RFM.
Input fields: name = file, datatype = file
Output fields: name = segmentation, datatype = file
Version name v1
Description leave blank
Environment Python 3.8
Upload code deployment zip do not unzip!
Advanced settings Leave on default settings

After uploading the code and with that creating the deployment version UbiOps will start deploying. Once you're deployment version is available you can make requests to it. For this example, you can use this example dataset from Kaggle.

Advanced RFM deployment

This deployment takes a filename as input, and returns nothing. Instead, the deployment retrieves data from a Google Sheet and writes the results of the analysis to that same sheet.

Google Sheets connection

For this deployment to work, you will need to have this dataset saved in your own Google Drive, and you will need a service account that has access to that sheet. For information on this see Pygsheet documentation.

The deployment.py for this more advanced version of the RFM analysis can be seen below.

import os
import json
from google.oauth2 import service_account
import pygsheets
import pandas as pd


class Deployment:

    def __init__(self, base_directory, context):

        print('Initialising the connection to the google drive')
        self.gc = None

        SCOPES = ('https://googleapis.com/auth/spreadsheets', 'https://googleapis.com/auth/drive')
        service_account_info = json.loads(os.environ['credentials'])
        my_credentials = service_account.Credentials.from_service_account_info(service_account_info, scopes=SCOPES)

        try:
            self.gc = pygsheets.authorize(custom_credentials=my_credentials)
            print('Established succesfull connection')
        except Exception as e:
            print('Connection failed, ', e.__class__, 'occurred.')


    def request(self, data):

        print('Getting the requested file')
        spreadsheet = self.gc.open(data['filename'])
        sheet_data = spreadsheet[0]

        # Transforming data into a Pandas DataFrame
        data_df = sheet_data.get_as_df()

        # RFM analyis
        print('Performing RFM analysis')
        data_df['TotalPrice'] = data_df['Quantity'].astype(int) * data_df['UnitPrice'].astype(float)
        data_df['InvoiceDate'] = pd.to_datetime(data_df['InvoiceDate'])

        rfm= data_df.groupby('CustomerID').agg({'InvoiceDate': lambda date: (date.max() - date.min()).days,
                                                'InvoiceNo': lambda num: len(num),
                                                'TotalPrice': lambda price: price.sum()})

        # Change the name of columns
        rfm.columns=['recency','frequency','monetary']

        # Computing Quantile of RFM values
        rfm['recency'] = rfm['recency'].astype(int)
        rfm['r_quartile'] = pd.qcut(rfm['recency'].rank(method='first'), 4, ['1','2','3','4']).astype(int)
        rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, ['4','3','2','1']).astype(int)
        rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, ['4','3','2','1']).astype(int)

        # Concatenating the quantile numbers to get the RFM score
        rfm['RFM_Score'] = rfm.r_quartile.astype(str)+ rfm.f_quartile.astype(str) + rfm.m_quartile.astype(str)

        # Sort the outcomes
        print('Sorting customers')
        sorted_customers = rfm.sort_values('RFM_Score', ascending=True)

        # Insert data into the google sheet in a separate tab
        print('Inserting data into the google sheet')
        sheet_title = "RFM_results"

        try:
            sh = spreadsheet.worksheet_by_title(sheet_title)
        except:
            print('Worksheet does not exist, adding new sheet')
            spreadsheet.add_worksheet(sheet_title)
            sh = spreadsheet.worksheet_by_title(sheet_title)
        finally:
            sh.set_dataframe(sorted_customers, 'A1', copy_index = True)
            sh.update_value('A1', 'CustomerID')
            print('Data inserted successfully')

        return None

In the __init__ method we set up a connection to the google drive where the sheet resides. In the request method we retrieve the Google Sheet as a DataFrame and we assign an RFM score to each customer based on how frequently, how recently and for how much money they purchased. The outcome is sorted and written to the same Google Sheet, in a separate tab called RFM_results.

Running this example in UbiOps

To deploy this example model to your own UbiOps environment you can log in to the WebApp and create a new deployment in the deployment tab. You will be prompted to fill in certain parameters, you can use the following:

Deployment configuration
Name rfm-analysis
Description A customer segmentation model based on RFM.
Input fields: name = filename, datatype = string
Output fields: leave blank
Version name v1
Description leave blank
Environment Python 3.8
Upload code deployment zip do not unzip!
Advanced settings Expand
Environment variables name = credentials, value = your JSON credential string

After uploading the code and with that creating the deployment version UbiOps will start deploying. Once you're deployment version is available you can make requests to it. For this example, you will need to use the name of your Google Sheet as input (e.g. OnlineRetail). Afterwards you should see a new tab in your Google Sheet with the results.