Running machine learning behind Tableau: Twitter sentiment analysis on #MLOps

22 July 2021 | Blog, Functionality

While most companies are able to transform and visualise data in a user-friendly way, the demand to visualise the outcome of data science, such as machine learning models, is increasing. However, Tableau doesn’t run Python, nor does Power BI. Usually, the BI analyst does not have all the required Azure or AWS certifications to set up a scalable and robust infrastructure there. Or they simply don’t want to spend time on that. More often, they like to play around with Python and improve those skills.
This article is written for that person. 

Are you that person? Continue reading. 

 TL;DR: See the end result of the Twitter sentiment analysis running on UbiOps in Tableau here.


The ML Ops space and its sentiment

As a use case, we monitored the sentiment of “#MLOps” on Twitter over the past couple of weeks. As UbiOps is part of the MLOps stack, and because it’s an upcoming field of work that many people discuss, we thought it would be nice to use that hashtag. We connected to the Twitter API to get the daily tweets using #MLOps, ran a pretrained sentiment analysis model in UbiOps and pushed the results to a database. In this case, it was Google Sheets – yes, not a proper database, but for the purpose of this article it works :). 

In production, this should be a SQL database or similar. Tableau then reads from the Google sheet to visualise the daily sentiment of #MLOps. 

This article aims to guide those who wish to run Python or R scripts behind their dashboards in a scalable way, and to assist them by showing an example of how this can be done without knowing everything about Docker, Kubernetes and APIs (Application Programming Interfaces). 

Twitter API, UbiOps deployment package and connection to Google Sheets

In this part we’ll explain how we set up the connections and how we deployed the model with UbiOps. We won’t go through every single line of code, only the most important pieces. You can find the full deployment package here.

Requirements

  1. A Google Sheets file. 
  2. Access to the Twitter API. You can apply for access here: https://developer.twitter.com/en/apply-for-access.
    Once on the developer portal, you can generate the consumer key and access token (and their secrets) – don’t forget to save them, as you won’t be able to retrieve them again.  
  3. A Tableau account. 
  4. Last but not least, a UbiOps account. Create a free UbiOps account here.  

Before we get started, I have to be honest: this process felt a little bit like the gif below. Why? I’m not a pro data scientist, but I want to understand the ins and outs of what happens and then try it myself. I highlight what happens (in the code and in the background) and provide you with a ready-to-go notebook that you can use yourself. Sidenote: probably every data scientist feels like the gif below. 

Now that you know that, let’s continue. 

Creating the deployment package and establishing connections

  1. Create a service user (a service account) in Google.
  2. Create credentials for the service user (called keys here).
  3. Share the Google sheet with the service user account just like you would with a normal user: this gives it permission to edit your sheet. 
  4. Create a structured deployment in UbiOps with input: {'day': 'string', 'hashtag': 'string'}. You can leave the output fields empty.
    To do so, go to the UbiOps webapp, click “deployments” on the left-hand side, then “create new deployment”, and fill in the info as given above. Then click next at the bottom of the page (skip the ‘labels’). 

Example of how to specify the deployment in the UbiOps webapp.

5. Now give the deployment a version name (e.g. v1) and add the deployment package (the zip folder provided earlier). It’s written in Python 3.8. 

Labels, request retention and monitoring emails can be skipped (unless you wish to play around with them). Do set the ‘maximum idle time’ under ‘advanced parameters’ to 1500 seconds. This will keep your deployment running for 25 minutes after the last request. When you get to the ‘environment variables’, you should do the following: 

Create the following 6 environment variables:

Google keys:
GOOGLE_CREDENTIALS: paste the contents of the service user credentials/key JSON.
SPREADSHEET_ID: ID of your spreadsheet, which you can find in the URL of the sheet.

Twitter keys (that you saved when gaining access to the Twitter API):
CONSUMER_KEY:
CONSUMER_SECRET:
ACCESS_TOKEN:
ACCESS_TOKEN_SECRET:

Note that you should use exactly the names written above for the environment variables, as the code in the deployment package uses these environment variables to authenticate and connect with Google/Twitter.

6. Now click ‘create’ on the bottom right and everything will be set in motion. If you’re curious to see what’s happening, just click on ‘logs’ on the left side and refresh a few times. First, the contents of the ‘requirements.txt’ in the .zip folder you uploaded are installed in a Docker container on a Kubernetes cluster on the Google Cloud Platform.

Then, the packages/libraries imported in the deployment.py file are loaded in the same container, and the ‘init’ function is run. This only happens when a cold start of the container is required (which is the case now, because we never started it). Once you make your first request, the ‘request’ function is triggered. 
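Because the init function reads the six environment variables at start-up, a misspelled or forgotten name only surfaces as an error in the logs. Below is a minimal sketch of a check you could run first. The helper names `missing_variables` and `check_environment` are my own and are not part of the deployment package; only the variable names come from the list above.

```python
import os

# Names taken from the list of environment variables in this article.
REQUIRED_VARIABLES = [
    "GOOGLE_CREDENTIALS",
    "SPREADSHEET_ID",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
]


def missing_variables(environment=None):
    """Return the required variable names that are absent or empty."""
    if environment is None:
        environment = os.environ
    return [name for name in REQUIRED_VARIABLES if not environment.get(name)]


def check_environment(environment=None):
    """Raise one readable error that lists every missing variable (hypothetical helper)."""
    missing = missing_variables(environment)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` as the first line of the init function would report all missing names at once, instead of failing on the first one.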

While you’re waiting: Further explanation of the deployment.py code

Start by opening the deployment package (.zip folder) and then the ‘deployment.py’ file.
You will find an ‘init’ function and a ‘request’ function. The init function is the first function that UbiOps runs. It can for example be used for loading modules that have to be kept in memory or to set up connections (like we do). You can load your external deployment files (such as pickles or .h5 files) here. 

The request function handles deployment requests and is called separately for each request we make.
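To make the division of labour between the two functions concrete, here is a stripped-down sketch. It assumes the class in deployment.py is called `Deployment`; the counter and the returned dictionary are only there for illustration (the real deployment in this article returns nothing).

```python
class Deployment:
    """Skeleton with the same two entry points as the article's deployment.py."""

    def __init__(self, base_directory, context):
        # Runs once per cold start: do the expensive set-up here
        # (connections, loading model files, and so on).
        self.base_directory = base_directory
        self.requests_handled = 0

    def request(self, data):
        # Runs for every request and reuses whatever __init__ prepared.
        self.requests_handled += 1
        return {
            "hashtag": data.get("hashtag", "MlOps"),
            "count": self.requests_handled,
        }


# Local imitation of what the platform does; the arguments are placeholders.
deployment = Deployment(base_directory=".", context={})
first = deployment.request({"hashtag": "MLOps"})
second = deployment.request({})
print(first, second)
```

The second call shows that state set up in the init function survives between requests, which is why connections are created there and not in the request function.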

Below I’ll highlight some of the most important pieces of the deployment.py file in the .zip folder. We’ll first look at the init function. 

To make the code as robust as possible, so that it can basically be reused by anyone without making any changes, we standardised as much as possible and allowed only a few knobs to be turned: the authentication for Twitter and Google Sheets, and the day and hashtag used when making a request. We used environment variables to allow you to use your own Twitter developer account and Google sheet without editing the code. In the init function below, we read the environment variables that we provided values for in the UbiOps webapp earlier in this article. We also set the spreadsheet_id and create the twitter_api connection by calling the setup_twitter function, which is provided later in the file. 

def __init__(self, base_directory, context):
    # Load in the twitter secrets and tokens from the environment variables
    self.consumer_key = os.environ['CONSUMER_KEY']
    self.consumer_secret = os.environ['CONSUMER_SECRET']
    self.access_token = os.environ['ACCESS_TOKEN']
    self.access_token_secret = os.environ['ACCESS_TOKEN_SECRET']
    # Set up the connection to twitter
    self.twitter_api = self.setup_twitter()
    # Set up the connection to Google, using the environment variable for the GOOGLE_CREDENTIALS
    # This method assumes you have an environment variable loaded with the content of the service account
    # credentials json
    self.google_sheet = pygsheets.authorize(service_account_env_var='GOOGLE_CREDENTIALS')
    # Set the spreadsheet_id from the environment variables
    self.spreadsheet_id = os.environ['SPREADSHEET_ID']
    # Set the day of today
    self.today = datetime.today()

 

For the request function, we again did our best to make it as robust as possible, as with the init function. Remember, the request function is called every time a request is made to the model.

As you can see below, you can set the variable ‘hashtag’ to anything you like, but if you don’t provide it, it defaults to ‘MlOps’. The same technique is used for the variable ‘day’, which defaults to ‘yesterday’ (note that these are the input fields you specified in the UbiOps webapp). Besides that, we defined several functions to retrieve the tweets, get the sentiment of the tweets (which is the actual machine learning model at work), and write the results to the Google sheet (accessed with the service user credentials).

       

def request(self, data):
    """
    Make the request by first collecting the tweets and sentiments of a day and a certain hashtag
    and then inserting them in a Google sheet
    """
    hashtag = data.get('hashtag', 'MlOps')  # If no hashtag is given, use MlOps
    day = data.get('day', 'yesterday')  # If no day is given, use 'yesterday'
    # Parse the user inputted day and retrieve the end date of the query ('until')
    day, until = self.parse_date(day=day)
    # Retrieve tweets from 'day' to 'until'
    texts = self.retrieve_tweets(hashtag=hashtag, day=day, until=until)
    # Determine the sentiment over the recovered tweets
    results = self.get_sentiment(texts=texts, day=day)
    # Append the values to the specified Google Sheet
    sheet = self.google_sheet.open_by_key(key=self.spreadsheet_id)
    # Open first worksheet of spreadsheet
    wk1 = sheet[0]
    # Values will be appended after the last non-filled row of the table without overwriting
    wk1.append_table(values=results, overwrite=False)

    return None
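The parse_date helper is not shown in this article, so here is a rough sketch of what such a function could look like. Everything in it is an assumption on my part: the accepted inputs (‘yesterday’, ‘today’ or a YYYY-MM-DD string), the optional `today` argument and the ISO date strings it returns may all differ from the real deployment package.

```python
from datetime import date, datetime, timedelta


def parse_date(day="yesterday", today=None):
    """Return (day, until) as ISO date strings, where until is the day after day."""
    if today is None:
        today = date.today()
    if day == "yesterday":
        start = today - timedelta(days=1)
    elif day == "today":
        start = today
    else:
        # Assumed input format for explicit dates: YYYY-MM-DD
        start = datetime.strptime(day, "%Y-%m-%d").date()
    # The query window covers exactly one day: from 'start' up to the next day
    until = start + timedelta(days=1)
    return start.isoformat(), until.isoformat()


print(parse_date("yesterday", today=date(2021, 7, 22)))
```

Passing `today` explicitly makes the function easy to try out locally; in the deployment it would simply default to the current date.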

 

The other file in the .zip folder is the requirements.txt. It should list all libraries your code needs to run. UbiOps installs these libraries once, when the deployment is built (after we upload the code), so that they are already in place when you make requests later and those requests are faster. In our case, that is: 

textblob==0.15.3
tqdm==4.55.1
tweepy==3.10.0
pygsheets==2.0.5
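Of these, textblob is presumably the library behind the sentiment step. TextBlob gives each text a polarity score between -1 (negative) and 1 (positive). The sketch below shows one way a day of such scores could be reduced to a single spreadsheet row. The row layout, the function names and the 0.05 threshold are my own assumptions, not what get_sentiment in the deployment package actually does.

```python
from statistics import mean


def label_polarity(polarity, threshold=0.05):
    """Turn a polarity score in [-1, 1] into a label (threshold is an assumption)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def summarise_day(day, polarities):
    """Build one row: [day, tweet count, mean polarity, positive, neutral, negative]."""
    labels = [label_polarity(p) for p in polarities]
    average = round(mean(polarities), 3) if polarities else 0.0
    return [
        day,
        len(polarities),
        average,
        labels.count("positive"),
        labels.count("neutral"),
        labels.count("negative"),
    ]


# In a real run each score would come from TextBlob(text).sentiment.polarity;
# here the scores are made-up numbers so the sketch runs without extra libraries.
row = summarise_day("2021-07-21", [0.5, 0.0, -0.25, 0.75])
print(row)
```

One row per day keeps the sheet small and gives Tableau a simple table to plot over time.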

Back to the UbiOps webapp: building has finished? Or failed? Use UbiOps logs!

Having uploaded the deployment package, we should wait for the deployment to be built. Right now, all the dependencies as noted in our requirements.txt are installed in a container, the ‘init’ function is executed and all that is deployed on a Kubernetes pod in the background. Of course, I made a mistake and the first deployment failed. 

To find out what happened, I went to the logs (top bar) and noticed there was an error somewhere in the deployment.py file in my deployment package. For me, a couple of things went wrong here. After having checked the logs, I took out some silly code mistakes and uploaded a new version that worked and is now successfully deployed.
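Mistakes like mine are easier to trace when each step announces itself in the logs. Here is a small sketch of a wrapper that does this with Python’s standard logging module. It assumes that whatever the deployment writes to standard output ends up in the UbiOps logs, and `run_step` is a hypothetical helper, not part of the deployment package.

```python
import logging
import sys

# Send log lines to standard output (assumed to be captured by the platform logs).
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("deployment")


def run_step(name, function, *args, **kwargs):
    """Run one pipeline step and log a full traceback before re-raising on failure."""
    logger.info("Starting step: %s", name)
    try:
        result = function(*args, **kwargs)
    except Exception:
        # logger.exception adds the traceback to the log message
        logger.exception("Step failed: %s", name)
        raise
    logger.info("Finished step: %s", name)
    return result
```

In the request function this could wrap the calls one by one, for example `run_step("retrieve_tweets", self.retrieve_tweets, hashtag=hashtag, day=day, until=until)`, so the logs show which step broke.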

Finishing touches

In UbiOps we’ll add a schedule so that the model is triggered every day at 08:00 CET with the data inputs “yesterday” (as ‘day’) and “MLOps” (as ‘hashtag’). To do so, go back to the webapp, go to “Request schedules” on the left-hand side and click ‘create’. 
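Schedules of this kind are usually written as a cron expression plus the request data. The sketch below builds both. It assumes the scheduler reads cron expressions in UTC, so 08:00 CET (UTC+1) has to be shifted by one hour; please check this in the schedule form. The dictionary keys are purely illustrative and not UbiOps field names.

```python
import json
from datetime import datetime, timedelta, timezone


def daily_cron_in_utc(hour, minute, utc_offset_hours):
    """Translate a daily local time with a fixed UTC offset into a UTC cron expression."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # The date itself is irrelevant, only the time of day is used
    local_time = datetime(2021, 1, 1, hour, minute, tzinfo=local_zone)
    utc_time = local_time.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"


schedule = {
    "cron": daily_cron_in_utc(8, 0, utc_offset_hours=1),  # CET is UTC+1
    "request_data": {"day": "yesterday", "hashtag": "MLOps"},
}
print(json.dumps(schedule, indent=2))
```

A fixed offset ignores daylight saving time: during summer time (UTC+2) the same cron expression would fire at 09:00 local time.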

Integrate with Tableau dashboard

Tableau needs a database to read from. Since the data is in Google Sheets, we use the Google sheet as the database to read from. Follow this guide to connect it with your Tableau dashboard. This should be the easy part of this guide; most of the difficulty lies in connecting with the ‘database’. 

To collect some data and visualise how the sentiment changes over time, we’ve been running this model since the end of March 2021. The final result of the daily sentiment of MLOps can be seen below or via this webpage.

Conclusion

If you are a great BI analyst and have been experimenting with Python to run more complex models behind your dashboard, then this guide is for you. Personally, I’m not an advanced data scientist (as those who are more advanced may spot), which is all the more reason this guide should enable you to run Python scripts easily. I’ve had help from my colleagues, and hope that this guide helps you. 

In case you have any questions or suggestions, please feel free to contact me ([email protected]) or join our Slack channel.