Developing a recommender model
There are different ways of developing a recommender model, but in this article we will use association rule mining. Association rule mining is a technique to identify underlying relations between different items. Take for example a supermarket where customers can buy a variety of items. Usually, there is a pattern in what customers buy. Mothers with babies buy baby products such as milk and diapers, students might often buy beer and chips, and so on. In short, transactions follow a pattern. The process of identifying these associations between products that are often bought together is called association rule mining.
Apriori algorithm
Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm. The Apriori algorithm is based on three key concepts:
- Support
- Confidence
- Lift
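To make the three concepts concrete, here is a minimal sketch that computes them by hand for the rule "diapers -> milk" on five made-up transactions. It assumes the standard definitions: support is the fraction of transactions containing an itemset, confidence is support(A and B) / support(A), and lift is confidence / support(B). The toy data and the helper names are purely illustrative.

```python
# Toy transactions, invented for illustration only
transactions = [
    {"milk", "diapers"},
    {"milk", "diapers", "beer"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(base, add, transactions):
    """How often `add` is bought in transactions that contain `base`."""
    both = support(set(base) | set(add), transactions)
    return both / support(base, transactions)

def lift(base, add, transactions):
    """Confidence relative to how often `add` is bought anyway."""
    return confidence(base, add, transactions) / support(add, transactions)

rule_support = support({"diapers", "milk"}, transactions)
rule_confidence = confidence({"diapers"}, {"milk"}, transactions)
rule_lift = lift({"diapers"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests that the two items are bought together more often than you would expect if they were independent; a lift below 1 suggests the opposite.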
Implementing Apriori using Python
The theory is all well and good, but let us take a look at how the Apriori algorithm can be implemented in Python for an actual use case. In this section we will use the Apriori algorithm to find rules that describe associations between different products, given 7500 transactions over the course of a week at a retail store. Using these association rules, we will set up a dictionary that contains, for each item in the dataset, three recommended items to also look at. The dataset can be found here and the source code here. There is a Python library called Apyori which we can use to implement Apriori easily, without having to calculate the support, confidence and lift ourselves. You can install Apyori using `pip install apyori`. Please make sure you install Apyori before proceeding.
Requirements
First, import the necessary libraries:
import numpy as np
import pandas as pd
from apyori import apriori
import pickle
Loading and preprocessing data
Now let’s load the dataset into a pandas DataFrame and use the pandas `head()` function to see what we’re working with. I also pass `header=None` since our dataset does not have a header row.
store_data = pd.read_csv('store_data.csv', header=None)
store_data.head()
Do not be afraid of all the NaN values we see here; they are a result of how the dataset is constructed. Our dataset contains transaction history, with every row representing one transaction. Transactions are not consistent in size: some people buy 4 items, others 20, and therefore our rows differ in length. The DataFrame corrects for that by taking the length of the biggest transaction as the row size and filling the empty spots with NaNs. The Apriori library we are using requires our data to be in the form of a list of lists, where the whole dataset is a big list and each transaction in the dataset is a sublist. This looks something like `big_list = [[transaction1_list], [transaction2_list],..]`. To transform our pandas DataFrame into a list of lists we use the following code snippet:
df_shape = store_data.shape
n_of_transactions = df_shape[0]
n_of_products = df_shape[1]
# Convert our DataFrame into a list of lists for the Apriori algorithm
records = []
for i in range(n_of_transactions):
    records.append([])
    for j in range(n_of_products):
        # Skip the NaN cells that only pad shorter transactions
        if str(store_data.values[i, j]) != 'nan':
            records[i].append(str(store_data.values[i, j]))
Applying the Apriori algorithm
Our data is now in the correct format to be fed to the `apriori` function of the Apyori library. The function takes the following parameters:
- records: our data in list of lists format
- min_support: minimum support value required for every association rule
- min_confidence: minimum confidence value required for every association rule
- min_lift: minimum lift value required for every association rule
- max_length: maximum number of items you want per rule
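As a rough illustration of how these thresholds act as filters, the sketch below applies them to a few hand-written candidate rules. The rule values are made up, and `passes_thresholds` and `min_transaction_count` are hypothetical helpers rather than part of Apyori; they only mimic the idea that a rule is kept when it meets every minimum. The threshold values are the ones used in this article.

```python
import math

def min_transaction_count(min_support, n_transactions):
    """Smallest number of transactions an itemset must appear in."""
    return math.ceil(min_support * n_transactions)

def passes_thresholds(rule, min_support, min_confidence, min_lift, max_length):
    """Hypothetical filter: keep a rule only if it meets every threshold."""
    return (
        rule["support"] >= min_support
        and rule["confidence"] >= min_confidence
        and rule["lift"] >= min_lift
        and len(rule["items"]) <= max_length
    )

# Made-up candidate rules, for illustration only
candidates = [
    {"items": ("pasta", "shrimp"), "support": 0.005, "confidence": 0.32, "lift": 4.5},
    {"items": ("milk", "eggs"), "support": 0.030, "confidence": 0.25, "lift": 1.4},
    {"items": ("tea", "honey"), "support": 0.002, "confidence": 0.40, "lift": 3.0},
]

kept = [
    rule["items"]
    for rule in candidates
    if passes_thresholds(rule, min_support=0.0045, min_confidence=0.2,
                         min_lift=2, max_length=5)
]
print(kept)

# A relative support threshold corresponds to an absolute transaction count
print(min_transaction_count(0.0045, 7500))
```

Only the first candidate survives: the second fails on lift and the third on support. With 7500 transactions, a minimum support of 0.0045 means an itemset has to appear in at least 34 transactions.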
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=2, max_length=5)
association_results = list(association_rules)
Apriori finds 142 association rules that fall within our desired ranges. We can take a look at what rules were found by Apriori.
for item in association_results:
    # The first element of each result contains the base item and the add item
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + " -> ".join(items))
    # Print the support for this association rule
    print("Support: " + str(item[1]))
    # Print the confidence and lift for this association rule
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
This prints every rule together with its support, confidence and lift. Now that we have our association rules, we need to use them to build up “recommendation rules”. What we want is, in essence, a lookup table in which we can look up a certain product and find three associated products that a buyer might also be interested in. However, not every association rule gives us three items that are frequently bought with the base item; sometimes fewer are returned. To make sure that we can return three recommendations for every product, we will fill the gaps with the overall most frequently bought products. To do so, we first have to rank all the products based on how frequently they appear in purchases in our dataset.
# Get all the products listed in the dataset
# First merge all the columns of the DataFrame into a single Series
# (Series.append was removed in pandas 2.0, so we use pd.concat)
merged = pd.concat([store_data[i] for i in range(n_of_products)])
# Then rank all the unique products; NaN values are ignored by value_counts
ranking = merged.value_counts(ascending=False)
# Extract the products in order, without their respective counts
ranked_products = list(ranking.index)
Now that we have a ranking of the products and the association rules found by Apriori, we can set up our recommendation rules. We can modify the printing loop we made earlier for this purpose.
lookup_table = {}
for item in association_results:
    # The first element of each result contains the base item and the add item.
    # It is an unordered set, so which item comes first is not guaranteed.
    pair = item[0]
    items = [x for x in pair]
    to_print = "Rule: " + " -> ".join(items)
    # If we do not have 3 recommendations for our base product we will
    # suggest top ranked products in addition
    items_to_append = items
    i = 0
    while len(items_to_append) < 4:
        if ranked_products[i] not in items_to_append:
            items_to_append.append(ranked_products[i])
        i += 1
    # Add the items to the lookup table, with the base product separate
    # from the products that are to be recommended
    lookup_table[items_to_append[0]] = items_to_append[1:]
    print(to_print)
    # Print the support for this association rule
    print("Support: " + str(item[1]))
    # Print the confidence and lift for this association rule
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")
In the code we check for every association rule whether it already has length 4 (1 base item plus 3 recommendations). If not, we append items from our ranked product list. Finally, we add the recommendation rules to the dictionary `lookup_table`. Unfortunately, our lookup_table does not contain recommendations for all products, since an association rule was not found for every product. When we don’t have a recommendation, the top 3 most frequently bought items should be suggested instead. Therefore we need an additional entry in our table:
lookup_table['default_recommendation'] = ranked_products[:3]
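With the default entry in place, looking up recommendations becomes a single dictionary access with a fallback. A minimal sketch, using a small hand-made table in place of the real `lookup_table` (the product names are invented, and `recommend` is a hypothetical helper that is not part of the original code):

```python
# Stand-in for the real lookup_table; contents are invented for illustration
example_table = {
    "pasta": ["shrimp", "mineral water", "eggs"],
    "default_recommendation": ["mineral water", "eggs", "spaghetti"],
}

def recommend(clicked_product, table):
    """Return the recommendations for a product, or the default ones."""
    return table.get(clicked_product, table["default_recommendation"])

print(recommend("pasta", example_table))
print(recommend("unknown product", example_table))
```

The second call falls back to the default entry because the product has no rule of its own.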
Deploying the recommender model to UbiOps
We now have a lookup table that returns three recommendations for every item in the store, but the recommendations still need to be put behind our webshop. To do so, we first need to deploy and serve our model. In other words: we need to bring it live. In this article we use UbiOps for that. We start by importing the necessary libraries:
import shutil
import os
import ubiops
import pickle
Establish a connection with UbiOps
To establish a connection with UbiOps you need a UbiOps account (a free one will do) and an API token with project-editor rights, which can be generated via the WebApp. To set up the connection we use the following lines of code:
API_TOKEN = '<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>'  # Make sure this is in the format "Token token-code"
PROJECT_NAME = '<INSERT PROJECT NAME IN YOUR ACCOUNT>'
DEPLOYMENT_NAME = 'recommender-model'
DEPLOYMENT_VERSION = 'v1'
client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}))
api = ubiops.CoreApi(client)
Create the recommender model
Our recommender model is relatively straightforward: it takes a “clicked_product” as input and returns a list of 3 products as output. The clicked product is the product a person clicks on in our webshop. To achieve this functionality in a deployment in UbiOps we need a deployment.py file. When the deployment is initialized it loads our lookup_table, and when a request is made, it looks up the recommendations for the clicked product. We then put this deployment.py in a deployment package together with our lookup_table and a requirements.txt describing our dependencies. I pickled the lookup_table to easily add it to the deployment package. You can find the complete deployment package here.
Exposing the model via an API endpoint
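The deployment package that gets uploaded to UbiOps is built around deployment.py. As a rough sketch of what such a file could look like, the block below assumes the UbiOps convention of a `Deployment` class with an `__init__(self, base_directory, context)` method and a `request(self, data)` method. The file name `lookup_table.pickle` is an assumption made for illustration; the actual package linked above may differ.

```python
import os
import pickle


class Deployment:
    def __init__(self, base_directory, context):
        # Load the pickled lookup table once, when the deployment starts.
        # 'lookup_table.pickle' is an assumed file name.
        table_path = os.path.join(base_directory, "lookup_table.pickle")
        with open(table_path, "rb") as f:
            self.lookup_table = pickle.load(f)

    def request(self, data):
        # 'clicked_product' and 'recommendation' are the input and output
        # field names used for the deployment in this article.
        clicked_product = data["clicked_product"]
        recommendation = self.lookup_table.get(
            clicked_product, self.lookup_table["default_recommendation"]
        )
        return {"recommendation": recommendation}
```

Products without a rule of their own fall back to the `default_recommendation` entry, so every request gets an answer.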
To generate an API endpoint for the model we can deploy it to UbiOps. You can do so from your notebook using:
# Set up deployment template
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description='Recommends other products to look at based on clicked product',
    input_type='structured',
    output_type='structured',
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name='clicked_product',
            data_type='string',
        )
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name='recommendation',
            data_type='array_string'
        )
    ],
    labels={'demo': 'recommender-system'}
)
api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)
# Create the version
version_template = ubiops.VersionCreate(
    version=DEPLOYMENT_VERSION,
    language='python3.8',
)
api.versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)
# Zip the deployment package
shutil.make_archive('recommender_deployment_package', 'zip', '.', 'recommender_deployment_package')
# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file='recommender_deployment_package.zip'
)
Make sure that you have a copy of the deployment package in your working directory for this code to work. Once UbiOps has finished building the deployment, its API endpoint will be available at https://api.ubiops.com/v2.1/projects/<your-project-name>/deployments/recommender-model/versions/v1/request
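To check the endpoint from Python before wiring it into a web page, a request could be sketched with the standard library alone. The function below only builds the request; sending it is left commented out because it needs a real token and project. The helper name `build_request` and the placeholder values are assumptions for illustration; the URL pattern and the `Authorization` header follow what is described in this article.

```python
import json
import urllib.request

API_URL = "https://api.ubiops.com/v2.1"

def build_request(project_name, api_token, clicked_product,
                  deployment_name="recommender-model", version="v1"):
    """Build (but do not send) a POST request to the deployment endpoint."""
    url = (f"{API_URL}/projects/{project_name}/deployments/"
           f"{deployment_name}/versions/{version}/request")
    body = json.dumps({"clicked_product": clicked_product}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": api_token}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values; replace them with your own project name and token
req = build_request("my-project", "Token <your-token>", "pasta")
print(req.full_url)

# Sending the request requires valid credentials:
# with urllib.request.urlopen(req) as response:
#     result = json.load(response)
```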
Putting the recommender model behind a WebApp
Now that your API endpoint is ready to use, you can integrate your recommender model into your own WebApp. To help you with that, we created a WebApp ourselves as an example. You can find it in this GitHub repository, and the live application is running here. The recommendation model is called to recommend other products to the user on the product details page. We used ReactJS, but the piece of code that calls the UbiOps API can be used with any JavaScript library or framework. Here are the steps to follow:
1. First, you have to define all the parameters you will need to call the API endpoint:
const API_TOKEN = process.env.REACT_APP_API_TOKEN;
const API_URL = process.env.REACT_APP_API_URL;
const PROJECT_NAME = process.env.REACT_APP_PROJECT_NAME;
const DEPLOYMENT_NAME = process.env.REACT_APP_DEPLOYMENT_NAME;
const DEPLOYMENT_VERSION = process.env.REACT_APP_DEPLOYMENT_VERSION;
The project name, deployment name and version name are as defined in the previous section. As for the API token, you can create it from our WebApp. In our case, the API token should only be allowed to create requests, in order to prevent any misuse. To do that, you can create a custom role in the “Roles” tab of the “Permissions” section and call it “deployment-request”. Select “deployments.versions.requests.create” as the only permission.
Once your role has been created, you can assign it to an API token. Create a new token from the same “Permissions” section, and under allowed domains set the domain of your webapp (e.g. subdomain.example.com or localhost:3000). Next, assign the “deployment-request” role to the token. Copy the token and set it as the REACT_APP_API_TOKEN. For more information about API tokens and roles, check out our documentation on service users.
2. Then define an asynchronous function to fetch your data from the UbiOps API.
The function takes the endpoint you want to call as its first argument, which is the request endpoint in our case (see below), and the JSON data to send along with the POST request as its second argument. We’ll call this function “postRequest”:
async function postRequest(url = "", data = {}) {
  const response = await fetch(API_URL + url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: API_TOKEN,
    },
    body: JSON.stringify(data),
  });
  return response.json();
}
3. And finally, call your API endpoint:
postRequest(
  `/projects/${PROJECT_NAME}/deployments/${DEPLOYMENT_NAME}/versions/${DEPLOYMENT_VERSION}/request?timeout=3600`,
  { clicked_product: product }
).then((response) => setRecommendations(response.result.recommendation));
The recommender model we’re working with here has only one input field, which is a string called “clicked_product”. The response of the API will contain the output fields of your model, as defined in the previous section, under the “result” field. In this case, the only output field is a list of products called “recommendation”. And you’re all set! Now you can display recommendations for each product of your webshop using the response.