
Load your libraries from Git

You may want to import public or private Git libraries or files (for example, custom scripts) into your deployment package. Why is this better than regular dependencies? It gives you access to development versions, specific forks, or alternative repositories. It lets you apply custom modifications or patches to the source code. It also allows you to test pre-releases or experimental branches. You will also see how to create a service account for your Git service provider.

This guide will cover three steps:

  1. Importing a Git repo inside your environment
  2. Importing a Git repo inside your deployment or training code
  3. Creating a Git service account / access token

Importing libraries from Git

Adding a Git repository to your environment package

You can clone a Git repository into your environment via either HTTPS or SSH. HTTPS encrypts the data transfer. For private repositories, authentication is typically done with your username and a password or a personal access token. HTTPS is commonly used for public repositories, or when you don't have SSH access to the repository server. When cloning over SSH, the data transfer is encrypted as well, but authentication is done using SSH keys. You generate an SSH key pair, add the public key to your Git provider (e.g. GitHub, GitLab), and configure your local machine to use the corresponding private key.

Setting up the requirements.txt

The requirements.txt in your deployment package needs to list the dependency in the following format:

git+<git_url>@<commit_hash_or_branch_name>

Let's use NumPy as a simple example repository. Replace <git_url> with the Git URL copied from your provider, and replace <commit_hash_or_branch_name> with the commit hash or branch name you want to use. For SSH, the URL needs an ssh:// prefix. You also need to replace the colon after the host name in the copied URL with a slash, for example git+ssh://git@github.com/numpy/numpy.git@main. In this example, we clone via HTTPS and use the main branch, which gives:

git+https://github.com/numpy/numpy.git@main

Then you can use import numpy as np in your deployment code and start using the library.
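Writing these lines by hand makes it easy to mistype the SSH form. The format above can be expressed as a small helper. This is a minimal sketch, and the function name git_requirement is hypothetical. It builds the git+<git_url>@<ref> line and rewrites an SSH URL copied as git@host:owner/repo.git into the ssh:// form described above. HTTPS and ssh:// URLs are passed through unchanged.

```python
import re

# Matches scp-like SSH URLs such as "git@github.com:numpy/numpy.git"
_SCP_LIKE = re.compile(r"^(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def git_requirement(git_url: str, ref: str) -> str:
    """Build a requirements.txt line of the form git+<git_url>@<ref>."""
    match = _SCP_LIKE.match(git_url)
    if match:
        # Add the ssh:// prefix and turn the ":" after the host into "/"
        git_url = "ssh://{user}@{host}/{path}".format(**match.groupdict())
    return f"git+{git_url}@{ref}"


print(git_requirement("https://github.com/numpy/numpy.git", "main"))
# → git+https://github.com/numpy/numpy.git@main
print(git_requirement("git@github.com:numpy/numpy.git", "main"))
# → git+ssh://git@github.com/numpy/numpy.git@main
```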

Setting up the ubiops.yaml

Installing a Git dependency from requirements.txt requires the apt package git, because pip uses git to clone the repository. Therefore, you need to add a ubiops.yaml file to the environment package with at least the following content:

apt:
  packages:
    - git
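If you prefer to assemble the environment package with a script, the two files above can be written and zipped together. This is a standard-library sketch. It assumes you upload the environment files as a zip archive with requirements.txt and ubiops.yaml at its root. The function name and the folder name environment_package are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

# Same content as the ubiops.yaml shown above
UBIOPS_YAML = """apt:
  packages:
    - git
"""


def build_environment_package(requirements, out_dir, name="environment_package"):
    """Write requirements.txt and ubiops.yaml into a folder and zip it.

    Returns the path of the created .zip archive (placed next to the folder).
    """
    pkg_dir = os.path.join(out_dir, name)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, "requirements.txt"), "w") as f:
        f.write("\n".join(requirements) + "\n")
    with open(os.path.join(pkg_dir, "ubiops.yaml"), "w") as f:
        f.write(UBIOPS_YAML)
    # root_dir=pkg_dir puts both files at the root of the archive
    return shutil.make_archive(os.path.join(out_dir, name), "zip", root_dir=pkg_dir)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    path = build_environment_package(
        ["git+https://github.com/numpy/numpy.git@main"], out_dir
    )
    with zipfile.ZipFile(path) as zf:
        print(path, zf.namelist())
```

The archive is written next to the package folder rather than inside it, so it does not end up zipping itself.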

Adding a Git repository to your deployment or training code

You can also clone your repository directly from your deployment or training code. The repository is then cloned when your code runs rather than when the environment is built, so the latest version of your Git repository is always used inside your deployment or training code. This can speed up your development cycle, especially when training models.

Setting up the training code

Here is an example of how to include Git dependencies in your train.py:

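import os
import shutil

from git import GitCommandError, Repo

# imports for train_func()

# Remove a previously cloned copy so the latest version is always used
if os.path.exists("git_repo"):
    shutil.rmtree("git_repo")

try:
    Repo.clone_from("<git_url>", "git_repo")
    print("Cloning successful!")
except GitCommandError as e:
    print("Cloning failed:", str(e))
    raise  # stop here; the import below would fail anyway

from git_repo.training_script import train_func


def train(training_data, parameters, context={}):
    # Delegate the actual training to the function from the cloned repository
    train_func()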

Replace <git_url> as described above. To clone a specific branch, pass branch=<branch_name> to Repo.clone_from. The if statement removes any previously cloned copy of the repository from the instance, so the latest version of the repository is always used in your code. If the repository is private, cloning requires an access token (sometimes called a Git service account). Creating one is covered in the section "Creating a Git service account / access token".
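The line from git_repo.training_script import train_func only works if the directory that contains git_repo is on Python's import path, which usually means the current working directory. As an alternative, you can load the module by its file path with importlib. This is a minimal standard-library sketch, and the helper name load_function is hypothetical. If training_script.py itself imports other modules from the same repository, you would still need to add the repository folder to sys.path.

```python
import importlib.util
import os
import tempfile


def load_function(repo_dir, relative_path, func_name):
    """Import `func_name` from a .py file inside a cloned repository.

    The module is loaded by file path, so this does not depend on the
    current working directory or on sys.path.
    """
    file_path = os.path.join(repo_dir, relative_path)
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, func_name)


if __name__ == "__main__":
    # Demo: a throwaway folder stands in for the cloned repository
    with tempfile.TemporaryDirectory() as repo_dir:
        with open(os.path.join(repo_dir, "training_script.py"), "w") as f:
            f.write("def train_func():\n    return 'training started'\n")
        train_func = load_function(repo_dir, "training_script.py", "train_func")
        print(train_func())  # → training started
```

In train.py, the import line would then become train_func = load_function("git_repo", "training_script.py", "train_func").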

Setting up the requirements.txt

This approach requires a UbiOps environment with the following dependency in its requirements.txt:

gitpython

Keep in mind that the dependencies of the cloned repository are not installed automatically. Add them to requirements.txt as well, and import them in train.py where needed.
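To catch a forgotten dependency early, you could compare the cloned repository's own requirements against what is installed, right after cloning. This is a deliberately simple sketch using only the standard library, and the helper name missing_requirements is hypothetical. It only understands lines that start with a distribution name, such as pandas or pandas==2.1.0. It skips comments, pip options and URLs. It does not resolve version constraints.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return the requirement names from `lines` that are not installed.

    Blank lines, comments, pip options (-r, -e, ...) and URL-based
    requirements are skipped; version specifiers are not checked.
    """
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if the distribution is missing
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, if the repository happens to ship a requirements.txt, you could open git_repo/requirements.txt, pass the file object to missing_requirements, and raise an error listing the result when it is not empty.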

Setting up the ubiops.yaml

As before, you need to add a ubiops.yaml file to your environment, because GitPython calls the git executable:

apt:
  packages:
    - git

Creating a Git service account / access token

For private repositories, you need to authenticate using a Git access token. We first explain how to create one, and then how to use it to authenticate.

GitHub access token

  1. Log into your GitHub account.
  2. Click on your profile picture in the top-right corner of the page and select "Settings" from the dropdown menu.
  3. In the left sidebar, click on "Developer settings" (at the bottom of the list).
  4. In the Developer settings menu, click on "Personal access tokens".
  5. Select "Tokens (classic)".
  6. Click the "Generate new token" button.
  7. If you are prompted for your password, enter it.
  8. Provide a descriptive note for the token to help you remember its purpose and an expiration period.
  9. Select the desired scopes or permissions for the token. Scopes define the access privileges of the token. Choose the scopes based on the tasks you want to perform with the token. For example, cloning a private repository requires the "repo" scope. Note that this scope grants full read and write access to your private repositories, not read-only access.
  10. Click the "Generate token" button at the bottom of the page.
  11. GitHub will generate an access token for you. Make sure to copy and save the token in a secure place, as it will be displayed only once. Once you navigate away from the page, you won't be able to access the token again. Treat it like a password.

This is what the clone_from() call looks like in train.py:

Repo.clone_from("https://<access-token>@github.com/<username>/<repository>.git", "git_repo")

Replace <access-token> with the access token you generated on GitHub, and make sure to include the entire URL, including the https:// prefix. For example, if your username is johnsmith, the repository is myrepo, and the access token is abc123, the clone URL looks like this:

https://abc123@github.com/johnsmith/myrepo.git
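Hard-coding the token in train.py means anyone who can read the code can also read the token. This is a minimal sketch that builds the same URL from an environment variable instead. It assumes the token is provided to your code through an environment variable, such as a secret variable configured for your deployment. The variable name GIT_ACCESS_TOKEN and the function name github_clone_url are hypothetical.

```python
import os
from urllib.parse import quote


def github_clone_url(username, repository, token_env="GIT_ACCESS_TOKEN"):
    """Build https://<token>@github.com/<username>/<repository>.git.

    The token is read from the environment variable `token_env`
    (hypothetical default name) instead of being written in the code.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"Environment variable {token_env} is not set")
    # Percent-encode the token so unexpected characters cannot break the URL
    return f"https://{quote(token, safe='')}@github.com/{username}/{repository}.git"
```

In train.py, the clone call would then read Repo.clone_from(github_clone_url("johnsmith", "myrepo"), "git_repo").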

You can use the generated access token in your Git operations, API requests, or any other GitHub-related interactions that require authentication. Make sure to keep your access token secure and do not share it publicly or expose it in your code repositories.
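A token embedded in a clone URL can also leak through logs. For example, the print("Cloning failed:", str(e)) line in train.py may echo the git command, including the URL. This is a small standard-library sketch that masks credentials in URLs before printing. The function name redact_credentials is hypothetical, and the pattern only covers the scheme://credentials@host form used in this guide.

```python
import re

# "<scheme>://<credentials>@" where credentials contain no "/", "@" or whitespace
_CREDENTIALS = re.compile(r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*://)[^/@\s]+@")


def redact_credentials(text):
    """Replace credentials embedded in URLs with '***' before logging."""
    return _CREDENTIALS.sub(r"\g<scheme>***@", text)


print(redact_credentials("https://abc123@github.com/johnsmith/myrepo.git"))
# → https://***@github.com/johnsmith/myrepo.git
```

In train.py, the error message could then be printed as print("Cloning failed:", redact_credentials(str(e))).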

GitLab access token

  1. Log into your GitLab account.
  2. Click on your profile picture in the top-right corner of the page and select "Settings" from the dropdown menu.
  3. In the left sidebar, click on "Access Tokens".
  4. Enter a name for the token in the "Token name" field to help you remember its purpose, and set an expiration date.
  5. Select the desired scopes or permissions for the token. Scopes define the access privileges of the token. For cloning a repository, the "read_repository" scope is sufficient.
  6. Click the "Create personal access token" button at the bottom of the page.
  7. GitLab will generate an access token for you. Make sure to copy and save the token in a secure place, as it will be displayed only once. Once you navigate away from the page, you won't be able to access the token again. Treat it like a password.

This is what the clone_from() call looks like in train.py:

Repo.clone_from("https://oauth2:<access-token>@gitlab.com/<username>/<repository>.git", "git_repo")

Note that if you are using a self-hosted or on-premises GitHub or GitLab instance, the interface may be different. For example, on some GitLab instances the "Access Tokens" menu is under "Edit Profile". The host part of the clone URL changes as well, from github.com or gitlab.com to your own host name.
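The GitHub and GitLab URL formats differ only in the credentials part and the host, so both can be expressed with one helper. This is a minimal sketch. The function name token_clone_url and the host git.example.com are hypothetical. It assumes project_path is given without a trailing .git, for example johnsmith/myrepo or, on GitLab, group/subgroup/project.

```python
from urllib.parse import quote


def token_clone_url(host, project_path, token, provider="github"):
    """Build an HTTPS clone URL that embeds an access token.

    provider="github": https://<token>@<host>/<project_path>.git
    provider="gitlab": https://oauth2:<token>@<host>/<project_path>.git
    `host` can be github.com, gitlab.com or a self-hosted host name.
    """
    token = quote(token, safe="")  # keep unexpected characters from breaking the URL
    if provider == "github":
        credentials = token
    elif provider == "gitlab":
        credentials = f"oauth2:{token}"
    else:
        raise ValueError(f"Unknown provider: {provider!r}")
    return f"https://{credentials}@{host}/{project_path.strip('/')}.git"


print(token_clone_url("gitlab.com", "johnsmith/myrepo", "abc123", provider="gitlab"))
# → https://oauth2:abc123@gitlab.com/johnsmith/myrepo.git
```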

You can use the generated access token in your Git operations, API requests, or any other GitLab-related interactions that require authentication. Make sure to keep your access token secure and do not share it publicly or expose it in your code repositories.

Conclusion

You are now able to import your own libraries from Git into your deployment, either by adding them to requirements.txt or by cloning your repository from your training code. You also know how to generate access tokens on GitHub and GitLab for Git operations that require authentication. We hope this article was helpful!