Combining R and Python in the same pipeline: the prediction of house prices

3 August 2021 | Blog, Functionality


While hundreds of programming languages exist, Python and R remain the most popular in the world of data science. R is a great language for visualizations and graphs, and it offers extensive functionality for data analysis. Python is a readable general-purpose language that is quick to write and well suited to mathematical computation. It is also widely regarded as the best language for machine learning, thanks to packages like scikit-learn.

When working in teams, you may find that some team members prefer Python and others prefer R, yet everyone wants to collaborate on developing and deploying the same data pipeline.

There are packages like rpy2, which makes it possible to use R packages within Python, and rPython, which lets you use Python packages within R. But these packages do not really solve the problem described above, because they only let you call the other language's packages from your own code. Someone on the team still needs combined knowledge of R and Python. Ideally, you keep programming in R while your colleague continues in Python.

In UbiOps you can build a pipeline out of modular scripts called deployments, where one deployment is written in R and the other deployment(s) in Python. This is very useful in situations like the one described above. Another benefit is that UbiOps lets you immediately make requests to that pipeline via a single web service endpoint. This makes it much easier to integrate the pipeline into web applications or dashboards than with libraries like rpy2 or rPython, because those do not create an endpoint.

UbiOps also saves you the hassle of setting up servers, deploying your application, and configuring networking, user management, scalability and uptime. If you want to know more about the functionalities of UbiOps, you can check out the documentation page.

This article will show you how to create deployments and how to connect these deployments in order to create a pipeline on UbiOps.

Use case

Because house prices have risen recently, I wanted to build a model that predicts house prices using this publicly available dataset. A friend of mine already did the data exploration and data preparation in R, but I want to build the prediction model in Python. I could use libraries like rPython or rpy2, but it is easier for me to use UbiOps since I am not familiar with writing R code.

Notebook walkthrough and instructions

Pipeline overview

Before we deep dive into how to deploy R and Python scripts in one pipeline, let’s look at the overall architecture. It’s built up from two separate ‘deployments’. Deployments are objects within UbiOps that serve a user’s code. From the uploaded code, the platform will build a container, running as a microservice inside UbiOps, that can receive requests to transform input data into output data. Examples of typical deployments are algorithms, data aggregation scripts and trained machine learning models.

The Python deployment is based on the XGBoost recipe from the UbiOps cookbook. Please check out that notebook if you want to see in more detail how the Python deployment is made.

| Deployment | Function | Language |
|---|---|---|
| r-eda-deployment | Make graphs to gain insight into the dataset and prepare the data for the prediction | R |
| python-pred-deployment | Make a prediction of the house prices | Python |

Table 1: Pipeline overview

The visual representation of this pipeline in the UbiOps UI looks like this:

Fig 1: the end result: R+Python pipeline in the UbiOps webapp.

We specified the following input and output data fields:

| Deployment | Input | Output |
|---|---|---|
| r-eda-deployment | raw_data: Blob (file) | clean_data: Blob (file) |
| python-pred-deployment | clean_data: Blob (file) | prediction: Blob (file) |

Table 2: Pipeline input and output overview
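
To make the field contract in Table 2 concrete, here is a minimal sketch of what the code inside python-pred-deployment could look like. It assumes the usual UbiOps layout of a `Deployment` class with an `__init__` and a `request` method, and it assumes that a blob (file) field arrives as a local file path and is returned the same way. The mean-based "model" and the `price` column are hypothetical placeholders, not the XGBoost model from the cookbook.

```python
import csv
import statistics


class Deployment:
    """Sketch of a deployment.py for python-pred-deployment (placeholder model)."""

    def __init__(self, base_directory, context):
        # Called once when the deployment instance starts.
        # A real deployment would load a trained model from base_directory here.
        self.base_directory = base_directory

    def request(self, data):
        # Assumption: the blob field 'clean_data' holds the path to a CSV file.
        with open(data["clean_data"], newline="") as f:
            rows = list(csv.DictReader(f))

        # Placeholder "model": predict the mean of a hypothetical 'price' column.
        mean_price = statistics.mean(float(row["price"]) for row in rows)

        # Write one prediction per input row to a new CSV file.
        out_path = "prediction.csv"
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["prediction"])
            for _ in rows:
                writer.writerow([mean_price])

        # The key must match the output field name of the deployment.
        return {"prediction": out_path}
```

The point of the sketch is the shape of the contract: the keys read from `data` and the keys of the returned dictionary mirror the input and output fields listed in the table.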

Below I’ll explain (1) how to create the R deployment, (2) how to add both (R and Python) deployments as objects to a pipeline and (3) how to connect both objects. If you want to see the full Python notebook and R script, please check out the following link.

The first step in building a Python-R pipeline is to create the deployments. The R deployment is created from RStudio and the Python deployment from a Jupyter notebook.

Build the R-deployment.


# Describe the R deployment: one blob (file) input and one blob (file) output
deployment <- list(
  name = DEPLOYMENT_NAME,
  description = "r-eda-deployment.",
  input_type = "structured",
  output_type = "structured",
  input_fields = list(
    list(name = "raw_data", data_type = "blob")
  ),
  output_fields = list(
    list(name = "clean_data", data_type = "blob")
  ),
  labels = list(demo = "r-eda-deployment")
)

# Create the deployment in UbiOps and show the response
result <- deployments_create(data = deployment)
result

Add the deployments as objects to the pythonr-pipeline.



R-deployment object:

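# Pipeline object that refers to the R deployment and its version
object <- list(
  name = DEPLOYMENT_NAME,
  reference_name = DEPLOYMENT_NAME,
  version = DEPLOYMENT_VERSION
)

# Add the object to the pipeline version and show the response
result1 <- pipeline_version_objects_create(
  pipeline.name = PIPELINE_NAME,
  version = PIPELINE_VERSION,
  data = object
)
result1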

Add the Python deployment as an object to the pipeline.

Now that both the R and Python deployments are done and the R object is made, it is time to add the Python deployment as an object to the pipeline.

# The object name is what the attachments refer to later on
object_template = ubiops.PipelineVersionObjectCreate(
    name=DEPLOYMENT_NAME,
    reference_name="python-pred-deployment",
    version="v1"
)

result = api.pipeline_version_objects_create(
    project_name=PROJECT_NAME,
    pipeline_name=PIPELINE_NAME,
    version=PIPELINE_VERSION,
    data=object_template
)
print(result)

Connect the objects.

Start -> r-eda-deployment

attachment_template = ubiops.AttachmentsCreate(
    destination_name="r-eda-deployment",
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name="pipeline_start",
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name="input",
                    destination_field_name="raw_data"
                )
            ]
        )
    ]
)

api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME,
    pipeline_name=PIPELINE_NAME,
    version=PIPELINE_VERSION,
    data=attachment_template
)

r-eda-deployment -> python-pred-deployment

attachment_template = ubiops.AttachmentsCreate(
    destination_name="python-pred-deployment",
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name="r-eda-deployment",
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name="clean_data",
                    destination_field_name="clean_data"
                )
            ]
        )
    ]
)

api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME,
    pipeline_name=PIPELINE_NAME,
    version=PIPELINE_VERSION,
    data=attachment_template
)

python-pred-deployment -> pipeline end

attachment_template = ubiops.AttachmentsCreate(
    destination_name="pipeline_end",
    sources=[
        ubiops.AttachmentSourcesCreate(
            source_name="python-pred-deployment",
            mapping=[
                ubiops.AttachmentFieldsCreate(
                    source_field_name="prediction",
                    destination_field_name="prediction"
                )
            ]
        )
    ]
)

api.pipeline_version_object_attachments_create(
    project_name=PROJECT_NAME,
    pipeline_name="pythonr-pipeline",
    version=PIPELINE_VERSION,
    data=attachment_template
)
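
The three connections follow the same pattern: a source object, a destination object and a field mapping between them. As a rough illustration of that pattern, the sketch below builds the same three attachment specifications as plain dictionaries. The helper `linear_attachments` is hypothetical and not part of the UbiOps client; the dictionary keys simply mirror the keyword arguments used in the calls above, so the sketch runs without the client installed.

```python
def linear_attachments(objects, field_maps):
    """Build attachment specs for a pipeline whose objects form a single chain.

    objects:    object names in order, including the pipeline start and end.
    field_maps: one dict per connection, mapping source field -> destination field.
    """
    if len(field_maps) != len(objects) - 1:
        raise ValueError("need exactly one field map per connection")

    attachments = []
    # Pair every object with its successor in the chain.
    for source, destination, fields in zip(objects, objects[1:], field_maps):
        attachments.append({
            "destination_name": destination,
            "sources": [{
                "source_name": source,
                "mapping": [
                    {"source_field_name": src, "destination_field_name": dst}
                    for src, dst in fields.items()
                ],
            }],
        })
    return attachments


# The chain from this article: start -> R -> Python -> end
specs = linear_attachments(
    ["pipeline_start", "r-eda-deployment", "python-pred-deployment", "pipeline_end"],
    [{"input": "raw_data"}, {"clean_data": "clean_data"}, {"prediction": "prediction"}],
)
for spec in specs:
    print(spec["sources"][0]["source_name"], "->", spec["destination_name"])
```

Writing the chain down as data like this makes it easy to spot a mismatched field name before any call is sent to the platform.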

After everything is finished building, a request can be made to the pipeline.
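
A request from Python could look roughly like the sketch below. It takes the client object as a parameter so the flow can be read (and tried with a stand-in object) without a UbiOps account. The method names `blobs_create` and `pipeline_version_requests_create` are assumptions based on the UbiOps Python client of this period, so check them against your client version. The pipeline input field name `input` is taken from the first attachment above; the helper `request_pipeline` itself is hypothetical.

```python
def request_pipeline(api, project_name, pipeline_name, version, csv_path):
    """Upload a raw data file and send it through the pipeline.

    `api` is expected to behave like the UbiOps client object used in this
    article (assumption: it offers the two methods called below).
    """
    # Blob (file) fields are passed by reference, so upload the file first.
    blob = api.blobs_create(project_name=project_name, file=csv_path)

    # 'input' is the pipeline input field mapped to raw_data in the first attachment.
    data = {"input": blob.id}

    return api.pipeline_version_requests_create(
        project_name=project_name,
        pipeline_name=pipeline_name,
        version=version,
        data=data,
    )
```

With the names used in this article, a call would pass `PROJECT_NAME`, `PIPELINE_NAME`, `PIPELINE_VERSION` and the path to the raw dataset file.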


Fig 2: Request result

Looks like it is all working perfectly!

 

Wrap up

Now that you have seen how easy it can be to combine R and Python code with UbiOps, we hope this helps you in your day-to-day projects. For any questions or suggestions, please join the UbiOps community Slack channel or contact our customer support.

Download the Jupyter notebook and R script here