How do Azure, Databricks, and UbiOps fit together? 

In this article, we will analyze some of the drawbacks of Databricks for machine learning – specifically at the deployment stage of a model. We'll then explain why pairing UbiOps with Databricks makes sense, and close with some of the UbiOps platform's core strengths.


What is Databricks?

Databricks is a data management platform built on top of Apache Spark, a big data engine designed for large-scale data processing. It adds an accessibility layer around Spark, allowing you to process data quickly without the deep technical knowledge that using Spark directly requires.

It uses the data lakehouse architecture to store data. What is a data lakehouse? It's a combination of data lakes — large repositories of raw, unstructured data — and data warehouses — large repositories of structured, tabular data. Essentially, it keeps the structured data management benefits of a data warehouse while retaining a data lake's ability to hold unstructured data. It's a catch-all data management architecture that works well for large mixes of structured, semi-structured, and unstructured data.

What is Azure Databricks?

Azure Databricks is Databricks integrated with Microsoft Azure. Databricks relies on a cloud provider — Azure, AWS, or Google Cloud — for storage. Databricks deployed on AWS lacks access to the Azure ecosystem, with its integrations, security, and monitoring options. If you're already familiar with Azure and want the benefits of a lakehouse architecture, Azure Databricks is a great fit.


What is UbiOps?

UbiOps is a powerful AI model serving and orchestration service with unmatched simplicity, speed and scale. UbiOps minimizes the DevOps time and costs needed to run, train and manage AI workloads, and distributes them at scale on any compute infrastructure. It's built for training, deploying, running and managing production-grade AI in an agile way. It features unique functionality for workflow orchestration (Pipelines), automatic adaptive scaling in hybrid or multi-cloud environments, as well as key MLOps features. You can learn more about UbiOps features on our Product page. We also have dedicated guides on how to deploy Mistral-7b, BERT, LLaMa 2 and Falcon using UbiOps.

How UbiOps can complement Azure Databricks

Azure Databricks is excellent when it comes to data management and preparation, but it’s limited when it comes to deploying and serving production-grade workloads. Integrating with UbiOps allows you to overcome those limitations and easily deploy models trained on Azure Databricks. 

As a data engineering platform first and foremost, Azure Databricks is well suited to preparing training data, and it's genuinely useful for training models with libraries like TensorFlow and PyTorch. UbiOps, on the other hand, is especially effective at model deployment. Our platform allows you to take AI models to production and call them via REST API, without having to worry about scalability or needing extensive DevOps knowledge and MLOps engineers. Furthermore, UbiOps can easily be integrated with Databricks; for example, you can call a UbiOps deployment straight from a Databricks notebook, as sketched below.
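To make that concrete, here is a minimal sketch of calling a UbiOps deployment over its REST API from a Databricks notebook. The project name, deployment name, token, and input field are hypothetical placeholders, and the endpoint path reflects the UbiOps v2.1 REST API as we understand it; check the UbiOps API documentation for your exact deployment's input fields.

```python
# Sketch: calling a UbiOps deployment over REST from a Databricks
# notebook. Project, deployment, token, and input field names are
# hypothetical placeholders.
import requests

API_TOKEN = "Token <your-ubiops-api-token>"   # generated in the UbiOps UI
PROJECT = "my-project"                        # hypothetical project name
DEPLOYMENT = "churn-model"                    # hypothetical deployment name

url = f"https://api.ubiops.com/v2.1/projects/{PROJECT}/deployments/{DEPLOYMENT}/requests"

# The JSON body must match the input fields defined on the deployment.
response = requests.post(
    url,
    headers={"Authorization": API_TOKEN},
    json={"features": [0.3, 1.2, 5.0]},       # hypothetical input field
)
response.raise_for_status()
print(response.json())  # contains the deployment's output fields
```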

It's important to mention that Databricks has its own model deployment functionality: Model Serving. Below, we discuss why UbiOps might be a better option than Databricks' Model Serving when it comes to ease of use, hardware, and scaling capabilities.


Some drawbacks of using Databricks for MLOps

Databricks has limited GPU usage 

Azure Databricks places some limits on GPU usage. Some of its GPU instance types are in beta, meaning they lack a stable interface, have low maturity, and aren't meant for production use.

In addition, the autoscaling features are more limited when serving models on GPUs. With Azure Databricks Model Serving, a deployed model endpoint running on GPU hardware cannot automatically scale down to zero, and autoscaling also takes longer on GPUs.

Databricks has limited support for computer vision

While Databricks does support computer vision (CV), it has some limitations. As mentioned, Databricks is built on top of Apache Spark, which is itself built for structured data. Many CV libraries have limited support for Spark DataFrames, so large CV workloads require a workaround.

With Azure Databricks Model Serving, container image creation for GPU endpoints takes longer than for CPU hardware instances. While this is the case with UbiOps as well, our platform is optimized for CV tasks. Payload size is another constraint: Model Serving has a 16 MB payload limit per endpoint request, which could hamper some CV use cases, whereas UbiOps can handle up to 64 MiB per request.
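As a quick illustration of what that payload limit means in practice, here is a small sketch (standard library only) that base64-encodes an image and checks it against a 64 MiB request budget before sending it to an endpoint. The file name is a hypothetical placeholder.

```python
# Sketch: verify that a base64-encoded image fits within a 64 MiB
# request budget before sending it to an inference endpoint.
import base64

LIMIT_BYTES = 64 * 1024 * 1024  # 64 MiB

with open("scan.png", "rb") as f:        # hypothetical input image
    payload = base64.b64encode(f.read())

# Base64 inflates data by roughly 4/3, so check the *encoded* size.
if len(payload) > LIMIT_BYTES:
    raise ValueError(f"Payload is {len(payload)} bytes, over the 64 MiB limit")
```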

Databricks clusters can have slow startup times if not used efficiently

Databricks is best suited for batch inference and big data jobs; in other words, it's made for large-scale data usage. With Azure Databricks Model Serving, endpoints with GPUs do not support scale-to-zero autoscaling, so they may need a cold start if you don't want them running all the time, which introduces latency. Databricks clusters have often been reported to take several minutes to start up. This isn't great when you expect low traffic, as you pay for that startup time, meaning you're paying while the cluster is unusable.

Databricks is sometimes not the most cost-effective solution

Databricks is best suited to very large companies, and it becomes more cost-effective the larger the organization. Many companies simply "don't have data sizes at scale to really justify using Databricks." As mentioned above, GPU-configured endpoints do not support scale-to-zero, so for low-frequency use cases you might be paying for an endpoint even when it isn't being used.

It’s important to mention that, when taking a look specifically at Databricks Model Serving pricing, some models, deemed “foundational” (Llama 2 for instance), use a pay-per-token pricing plan. In these cases, some of the issues we mention above do not apply. However, for most other models, the problems above do apply and will incur additional costs to fix — unless you integrate with UbiOps, that is.


The benefits of UbiOps for AI deployment

UbiOps works well for low-latency use cases

With UbiOps, you can start up your deployments in seconds: UbiOps GPU clusters take around 30 seconds to scale from zero to one instance, considerably faster than the several minutes often reported for Databricks clusters. In addition, UbiOps' usage limits are geared towards companies with smaller data needs. If we compare the UbiOps platform with standard Azure Databricks Model Serving, Model Serving's base limits are much higher; however, keep in mind that with UbiOps many of these limits can easily be changed to match your use case.

UbiOps has seamless integration with GPU clusters

UbiOps offers fully integrated GPU deployments, with GPU-accelerated hardware such as NVIDIA A100 and T4 GPUs, which are specifically designed to handle machine learning inference. While Azure Databricks offers similar hardware, the automatic management UbiOps provides, including autoscaling, potentially makes it an easier solution for running models on GPU clusters.

UbiOps has autoscaling

UbiOps offers automatic scaling to optimize your compute usage, with both scheduled scaling and dynamic scaling. Autoscaling solves a monetary issue, since you don't spend money on an instance you aren't using, and a time issue, since you no longer have to handle the arduous task of manual scaling yourself. Furthermore, UbiOps has scale-to-zero functionality and is well suited for low-frequency use cases; a configuration sketch follows below.
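As a sketch of what scale-to-zero looks like in practice, the snippet below creates a deployment version with the UbiOps Python client (pip install ubiops) and sets the minimum instance count to zero. The project, deployment, environment, and instance type names are hypothetical placeholders, and exact field names may differ between client versions; check the UbiOps client reference for your installation.

```python
# Sketch: configuring a scale-to-zero deployment version with the
# UbiOps Python client. All names below are hypothetical placeholders.
import ubiops

configuration = ubiops.Configuration()
configuration.api_key["Authorization"] = "Token <your-ubiops-api-token>"
core_api = ubiops.CoreApi(ubiops.ApiClient(configuration))

version = ubiops.DeploymentVersionCreate(
    version="v1",
    environment="python3-11",        # hypothetical environment name
    instance_type="16384mb_t4",      # hypothetical GPU instance type
    minimum_instances=0,             # scale to zero when idle
    maximum_instances=3,             # cap for dynamic scaling
    maximum_idle_time=300,           # seconds of idle time before scaling down
)
core_api.deployment_versions_create(
    project_name="my-project",       # hypothetical project name
    deployment_name="churn-model",   # hypothetical deployment name
    data=version,
)
```

With minimum_instances set to 0, you only pay while requests are actually being served, which is exactly the low-frequency scenario where GPU-backed Databricks Model Serving endpoints keep running.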

UbiOps is optimized for computer vision

UbiOps is well suited for CV. We also perform well with image generation models such as Stable Diffusion. UbiOps has partnered with several companies specifically for CV workloads, particularly in the crop science and healthcare fields. UbiOps is a proven and capable solution for CV.

UbiOps has advanced deployment health monitoring

UbiOps has extensive monitoring capabilities, so you can easily keep track of your deployments: event auditing, logging, performance metrics, notifications, and more are all readily available in our UI. When it comes to model deployment, the UbiOps platform is more well-rounded and easier to use than most solutions out there — but you don't have to take our word for it.

Conclusion

To summarize, Azure Databricks is great for model training and development, but there are some drawbacks when it comes to model deployment. While Databricks’ Model Serving is very useful, its ease of use and hardware capabilities have room for improvement. 

UbiOps, on the other hand, is designed and built as a deployment platform for data scientists, requiring no engineering experience. And UbiOps can be easily integrated with Azure Databricks using UbiOps' REST API. Consider integrating UbiOps into your Azure Databricks stack now!
