How to build and implement a recommendation system from scratch (in Python)

17 March 2021 | October 12th, 2021 | Blog

Isn’t it nice that almost any web shop provides suggestions for other products you might like? When I click on a book while shopping at bol.com, other recommendations immediately start filling up the lower part of my screen. These recommendations are often powered by an algorithm running in the background that looks at the previous shopping behavior of you and other customers. We are already so used to seeing these recommendations that it feels almost weird when they are not there, right? In this article we will show you how to create a recommender model using Apriori and integrate it into a WebApp, using UbiOps to serve the model. The WebApp we use is built with React, but the implementation would be similar for other frameworks. You can follow along to try it for yourself.

Link to recommender model source code

Link to WebApp source code

Developing a recommender model

There are different ways of developing a recommender model, but in this article we will use association rule mining. Association rule mining is a technique to identify underlying relations between different items. Take for example a supermarket where customers can buy a variety of items. Usually, there is a pattern in what customers buy. Mothers with babies buy baby products such as milk and diapers, and students might often buy beer and chips and so on. In short, transactions follow a pattern. The process of identifying these associations between products often bought together is called association rule mining.

 

Apriori algorithm

Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm. The Apriori algorithm is based on three key concepts:

 

  • Support
  • Confidence
  • Lift

Support refers to the default popularity of an item and can be calculated by finding the number of transactions containing a particular item divided by the total number of transactions.

Confidence refers to the likelihood that an item B is also bought if item A is bought. It can be calculated by finding the number of transactions where A and B are bought together, divided by the total number of transactions where A is bought.

Lift refers to the increase in the ratio of sale of a product B when another product A is sold. `Lift(A -> B)` can be calculated by dividing `Confidence(A -> B)` by `Support(B)`.
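To make the three measures concrete, here is a small sketch that computes them by hand for five made-up transactions. The transactions and the helper names `support`, `confidence` and `lift` are illustrative assumptions; they are not part of the dataset used later or of any library.

```python
# Toy transactions, invented purely for illustration
transactions = [
    ["milk", "diapers", "bread"],
    ["milk", "diapers"],
    ["beer", "chips"],
    ["milk", "bread"],
    ["beer", "chips", "diapers"],
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(a, b):
    """Support(A and B) / Support(A): how often B is bought when A is bought."""
    return support([a, b]) / support([a])


def lift(a, b):
    """Confidence(A -> B) / Support(B)."""
    return confidence(a, b) / support([b])


print("Support(milk):", support(["milk"]))
print("Confidence(milk -> diapers):", confidence("milk", "diapers"))
print("Lift(milk -> diapers):", lift("milk", "diapers"))
```

In this toy data the lift of milk -> diapers comes out slightly above 1, which means diapers show up a bit more often in baskets with milk than their overall popularity alone would suggest.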

For large sets of data, there can be hundreds of items in hundreds of thousands of transactions. The Apriori algorithm tries to extract rules for each possible combination of items. We can set minimum levels of support, confidence or lift to be reached to count something as a “good rule”. The algorithm will then only return these rules fitting our requirements.

 

Implementing Apriori using Python

That is how it works in theory; now let us take a look at how the Apriori algorithm can be implemented in Python for an actual use case. In this section we will use the Apriori algorithm to find rules that describe associations between different products, given 7500 transactions recorded over the course of a week at a retail store. Using these association rules, we will set up a dictionary that contains, for each item, three recommended items to also look at. The dataset can be found here and the source code here.

There is a Python library called Apyori which we can use to implement Apriori easily, without having to calculate the support, confidence and lift ourselves. You can install it using `pip install apyori`. Please make sure Apyori is installed before proceeding.

 

Requirements

First, import the necessary libraries:

import numpy as np
import pandas as pd
from apyori import apriori
import pickle

 

Loading and preprocessing data

Now let’s load the dataset into a pandas DataFrame and use the pandas `head()` function to see what we’re working with. I also pass `header=None`, since our dataset does not have a header row.

store_data = pd.read_csv('store_data.csv', header=None) 
store_data.head()

 

 

Do not be afraid of all the NaN values we see here; they are a result of how the dataset is constructed. Our dataset contains transaction history, with every row representing one transaction. Transactions are not consistent in size: some people buy 4 items, others 20, and therefore our rows differ in length. The DataFrame corrects for that by taking the length of the biggest transaction as the row size and filling the empty spots with NaNs.

The Apriori library we are using requires our data to be in the form of a list of lists, where the whole dataset is a big list and each transaction in the dataset is a sublist. This looks something like `big_list = [[transaction1_list], [transaction2_list],..]`. To transform our pandas DataFrame into a list of lists we use the following code snippet:

 

df_shape = store_data.shape
n_of_transactions = df_shape[0]
n_of_products = df_shape[1]
# Converting our dataframe into a list of lists for Apriori algorithm
records = []
for i in range(0, n_of_transactions):
    records.append([])
    for j in range(0, n_of_products):
        # Skip the NaN cells that only pad shorter transactions
        if str(store_data.values[i, j]) != 'nan':
            records[i].append(str(store_data.values[i, j]))
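The same conversion can be written more compactly. The following is a sketch of an equivalent approach, assuming a DataFrame padded with NaN like the one described above; the small `demo` frame is made up so that the snippet runs without the dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for store_data: rows are transactions, padded with NaN
demo = pd.DataFrame([
    ["milk", "bread", np.nan],
    ["beer", np.nan, np.nan],
    ["eggs", "milk", "chips"],
])

# Keep every non-missing cell of each row as a string
records_demo = [
    [str(value) for value in row if pd.notna(value)]
    for row in demo.values.tolist()
]

print(records_demo)
```

Using `pd.notna` avoids comparing against the string 'nan', which would also drop a product that happened to be literally named "nan".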

 

Applying the Apriori algorithm

Our data is now in the correct format to be fed to the `apriori` function of the Apyori library. The function takes the following parameters:

 

  • records: our data in list of lists format
  • min_support: minimum support value required for every association rule
  • min_confidence: minimum confidence value required for every association rule
  • min_lift: minimum lift value required for every association rule
  • max_length: maximum number of items you want per rule

We use min_support = 0.0045, min_confidence = 0.2, min_lift = 2, max_length = 5, which we found after some trial and error. Now let’s call apriori:

association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=2, max_length=5)
association_results = list(association_rules)

Apriori will find 142 association rules that fall within our desired ranges. We can take a look at what rules were found by Apriori.

for item in association_results:
    # First index of the inner list contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    # Join the items with arrows, e.g. "Rule: a -> b"
    to_print = "Rule: " + " -> ".join(str(x) for x in items)

    print(to_print)
    # Print the support for this association rule
    print("Support: " + str(item[1]))
    # Print the confidence and lift for this association rule
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Your output should look something like this:

 

Now that we have our association rules, we need to use them to build “recommendation rules”. What we want is, in essence, a lookup table in which we can look up a certain product and find three associated products that a buyer might also be interested in. However, not every association rule gives us three items that are frequently bought with the base item; sometimes fewer are returned. To make sure that we can return three recommendations for every product, we will fill the gaps with the overall most frequently bought products. To do so, we first have to rank all the products based on how frequently they appear in purchases in our dataset.

# Get all the products listed in dataset
# First merge all the columns of the data frame to a data series object
# (pd.concat is used because Series.append was removed in pandas 2.0)
merged = pd.concat([store_data[i] for i in range(n_of_products)])

# Then rank all the unique products
ranking = merged.value_counts(ascending=False)

# Extract the products in order without their respective count
ranked_products = list(ranking.index)
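If you prefer to rank directly from the list of lists instead of the DataFrame, `collections.Counter` from the standard library gives the same kind of ranking. This is an alternative sketch, not the approach used in this article; `records_demo` is a made-up stand-in for `records`.

```python
from collections import Counter

# Made-up stand-in for the records list of lists
records_demo = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["milk", "bread", "chips"],
]

# Count how often each product appears across all transactions
counts = Counter(product for transaction in records_demo for product in transaction)

# most_common() sorts from most to least frequent
ranked_demo = [product for product, _ in counts.most_common()]

print(ranked_demo[:2])
```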

Now that we have a ranking of the products, and the association rules found by Apriori, we can set up our recommendation rules. We can modify the printing loop we made earlier for this purpose.

 

lookup_table = {}

for item in association_results:
    # First index of the inner list contains base item and add item
    pair = item[0]
    items = [x for x in pair]
    to_print = "Rule: " + " -> ".join(str(x) for x in items)

    # If we do not have 3 recommendations for our base product we will
    # suggest top ranked products in addition
    items_to_append = items
    i = 0
    while len(items_to_append) < 4:
        if ranked_products[i] not in items_to_append:
            items_to_append.append(ranked_products[i])
        i += 1

    # Add the items to db, with base product separately from the products
    # that are to be recommended
    lookup_table[items_to_append[0]] = items_to_append[1:]

    print(to_print)
    # Print the support for this association rule
    print("Support: " + str(item[1]))
    # Print the confidence and lift for this association rule
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

In the code we check for every association rule if it is already of length 4 (1 base item plus 3 recommendations) or not. If not, we append items from our ranked product list. Finally, we add the recommendation rules to a dictionary `lookup_table`.
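The padding logic can also be isolated in a small function, which makes it easier to test on its own. The sketch below works on plain lists rather than on Apyori's result objects; the function name `pad_recommendations` and the sample data are hypothetical.

```python
def pad_recommendations(rule_items, ranked_products, n_recommendations=3):
    """Return (base_item, recommendations), filling gaps with popular products."""
    base, recommendations = rule_items[0], list(rule_items[1:])
    for product in ranked_products:
        if len(recommendations) >= n_recommendations:
            break
        # Never recommend the base item itself or repeat a recommendation
        if product != base and product not in recommendations:
            recommendations.append(product)
    return base, recommendations[:n_recommendations]


# Made-up ranking and rules, for illustration only
ranked_demo = ["mineral water", "eggs", "spaghetti", "french fries"]
rules_demo = [["pasta", "escalope"], ["eggs", "ground beef", "herb & pepper"]]

lookup_demo = {}
for rule in rules_demo:
    base, recommendations = pad_recommendations(rule, ranked_demo)
    lookup_demo[base] = recommendations

print(lookup_demo)
```

Unlike the loop above, this variant also trims rules that contain more than three add-on items, so every entry has exactly three recommendations.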

Unfortunately, our lookup_table does not contain recommendations for all products, since an association rule was not found for every product. When we have no recommendation for a product, the top 3 most frequently bought items are suggested instead. Therefore we need an additional entry in our table:

lookup_table['default_recommendation'] = ranked_products[:3]

 

Deploying the recommender model to UbiOps

We now have a lookup table that returns three recommendations for every item in the store, but the recommendations still need to be put behind our webshop. To do so, we first need to deploy and serve our model. In other words: we need to bring it live. In this article we use UbiOps for that.

We start by importing the necessary libraries:

import shutil 
import os 
import ubiops 
import pickle

 

Establish a connection with UbiOps

To establish a connection with UbiOps you need a UbiOps account (a free one will do) and an API token with project-editor rights, which can be generated via the WebApp. To set up the connection we use the following lines of code:

API_TOKEN = '<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>' # Make sure this is in the format "Token token-code"
PROJECT_NAME = '<INSERT PROJECT NAME IN YOUR ACCOUNT>'
DEPLOYMENT_NAME = 'recommender-model'
DEPLOYMENT_VERSION = 'v1'

client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}))
api = ubiops.CoreApi(client)

 

Create the recommender model

Our recommender model is relatively straightforward: it takes a “clicked_product” as input and returns a list of 3 products as output. The clicked product is the product a person clicks on in our webshop. To achieve this functionality in a UbiOps deployment we need the following deployment.py:

 

As you can see, when the deployment is initialized it loads our lookup_table, and when a request is made, it looks for recommendations.
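A minimal sketch of what such a deployment.py could look like is shown below. It assumes the usual UbiOps layout of a `Deployment` class with an `__init__` and a `request` method, and a pickle file named `lookup_table.pickle` in the package root. The file name is an assumption, so adjust it to match your own package.

```python
import os
import pickle


class Deployment:

    def __init__(self, base_directory, context):
        # Runs once when the deployment starts: load the pickled lookup table.
        # The file name is an assumption; use the name you pickled it under.
        table_path = os.path.join(base_directory, "lookup_table.pickle")
        with open(table_path, "rb") as f:
            self.lookup_table = pickle.load(f)

    def request(self, data):
        # Runs on every request: look up the clicked product and fall back
        # to the default recommendation when no rule exists for it.
        recommendation = self.lookup_table.get(
            data["clicked_product"],
            self.lookup_table["default_recommendation"],
        )
        return {"recommendation": recommendation}
```

The keys `clicked_product` and `recommendation` match the input and output fields of the deployment, and the `default_recommendation` entry covers products without an association rule.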

We then put this deployment.py in a deployment package together with our lookup_table and a requirements.txt describing our dependencies. I pickled the lookup_table to easily add it to a deployment package. You can find the complete deployment package here.
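Pickling the table into the package folder could be done roughly as follows. The folder name matches the one zipped in the upload step later on, while the file name `lookup_table.pickle` and the small demo table are assumptions for the sake of a runnable example.

```python
import os
import pickle

# Made-up stand-in for the real lookup_table built earlier
lookup_table_demo = {
    "pasta": ["escalope", "mineral water", "eggs"],
    "default_recommendation": ["mineral water", "eggs", "spaghetti"],
}

package_dir = "recommender_deployment_package"
os.makedirs(package_dir, exist_ok=True)

# Write the table into the deployment package folder
table_path = os.path.join(package_dir, "lookup_table.pickle")
with open(table_path, "wb") as f:
    pickle.dump(lookup_table_demo, f)

# Read it back to check that the round trip preserves the table
with open(table_path, "rb") as f:
    restored = pickle.load(f)

print(restored == lookup_table_demo)
```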

 

Exposing the model via an API endpoint

To generate an API endpoint for the model we can deploy it to UbiOps. You can do so via your notebook using:

 

# Set up deployment template

deployment_template = ubiops.DeploymentCreate(

    name=DEPLOYMENT_NAME,
    description='Recommends other products to look at based on clicked product',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='clicked_product',
            data_type='string',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='recommendation',
            data_type='array_string'
        )
    ],
    labels={'demo': 'recommender-system'}
)
api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.VersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.8',
)
api.versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package

shutil.make_archive('recommender_deployment_package', 'zip', '.', 'recommender_deployment_package')

# Upload the zipped deployment package

file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='recommender_deployment_package.zip'
)

Make sure that you have a copy of the deployment package in your working directory to make this code work.

Once UbiOps has finished building the deployment, its API endpoint will be available at

https://api.ubiops.com/v2.1/projects/<your-project-name>/deployments/recommender-model/versions/v1/request
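Before wiring the endpoint into a WebApp, it can be useful to call it once from Python. The sketch below only uses the standard library and builds the request from the endpoint above and the `clicked_product` input field. The helper names, the placeholder token and the project name are assumptions, and the network call itself is left commented out.

```python
import json
import urllib.request

API_URL = "https://api.ubiops.com/v2.1"


def build_request(project_name, api_token, clicked_product,
                  deployment_name="recommender-model", version="v1"):
    """Build (but do not send) a POST request to the request endpoint."""
    url = (f"{API_URL}/projects/{project_name}/deployments/"
           f"{deployment_name}/versions/{version}/request")
    body = json.dumps({"clicked_product": clicked_product}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": api_token}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def get_recommendations(request):
    """Send the request and return the 'recommendation' output field."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]["recommendation"]


# Placeholder project name and token; replace them with your own values
request = build_request("my-project", "Token <your-token>", "pasta")
print(request.full_url)
# recommendations = get_recommendations(request)  # needs a valid token
```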

 

Putting the recommender model behind a WebApp

Now that your API endpoint is ready to use, you can integrate your recommender model into your own WebApp. To help you with that, we created an example WebApp ourselves. You can find it in this GitHub repository, and the live application is running here. The recommender model is called on the product details page to recommend other products to the user. We used ReactJS, but the piece of code that calls the UbiOps API can be used with any JavaScript library or framework.

Here are the steps to follow:

 

1. First, you have to define all the parameters you will need to call the API endpoint:

const API_TOKEN = process.env.REACT_APP_API_TOKEN;
const API_URL = process.env.REACT_APP_API_URL;
const PROJECT_NAME = process.env.REACT_APP_PROJECT_NAME;
const DEPLOYMENT_NAME = process.env.REACT_APP_DEPLOYMENT_NAME;
const DEPLOYMENT_VERSION = process.env.REACT_APP_DEPLOYMENT_VERSION;

The project name, deployment name and version name are as defined in the previous section. The API token can be created from our WebApp. In our case, the API token should only be allowed to create requests, in order to prevent misuse. To do that, create a custom role in the “Roles” tab of the “Permissions” section and call it “deployment-request”. Select “deployments.versions.requests.create” as its only permission.

Once your role has been created, you can assign it to an API token. Create a new token from the same “Permissions” section, and under allowed domains set the domain of your WebApp (e.g. subdomain.example.com or localhost:3000). Next, assign the “deployment-request” role to the token. Copy the token and set it as REACT_APP_API_TOKEN. For more information about API tokens and roles, check out our documentation on service users.

 

2. Then define an asynchronous function to fetch your data from the UbiOps API.

First, pass it the endpoint you want to call, which is the request endpoint in our case (see below). Then, pass the JSON data sent along with the POST request as the second argument. We’ll call this function “postRequest”:

async function postRequest(url = "", data = {}) {
  const response = await fetch(API_URL + url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: API_TOKEN,
    },
    body: JSON.stringify(data),
  });
  return response.json();
}

 

3. And finally, call your API endpoint:

postRequest(
  `/projects/${PROJECT_NAME}/deployments/${DEPLOYMENT_NAME}/versions/${DEPLOYMENT_VERSION}/request?timeout=3600`,
  { clicked_product: product }
).then((response) => setRecommendations(response.result.recommendation));

The recommender model we’re working with here has only one input field, which is a string called “clicked_product”. The response of the API will contain the output fields of your model, as defined in the previous section, under the “result” field. In this case, the only output field is a list of products called “recommendation”.

And you’re all set! Now you can display recommendations for each product of your webshop using the response.

 

Conclusion

With all the steps described above, you should now be able to put a basic recommender system behind your own WebApp! We have walked you through association rule mining, using it to create a recommender model, deploying that model and integrating it into your WebApp. You can check out our live dummy WebApp and play around with it yourself. The response time of our model is <120 ms (including internet latency)!

If you enjoyed this walkthrough then create a free UbiOps account to reproduce the result yourself! And don’t hesitate to reach out to us if you have questions.