Has your organization been deploying a variety of Machine Learning (ML) or cloud-native applications? Then you have most likely been using Kubernetes or similar orchestration tools and services such as AWS Fargate or Amazon Elastic Container Service (Amazon ECS). The advantages of Kubernetes for container management, deployment automation, and scaling are well established, especially since it holds 92% of the market for container orchestration tools. However, the common challenge organizations run into with Kubernetes is the effort it takes to maintain and manage, along with the compute overhead it brings.
As more products and services become ML-heavy, managing the application deployment lifecycle requires more effort than the typical DevOps team is equipped for. Machine learning products and services need much more frequent updates, changes, and monitoring to stay reliable and relevant. These challenges make efficient management of the underlying infrastructure crucial.
Complexity increases even further if your business model involves running cloud applications on your customer’s infrastructure. In this situation, you become dependent on the customer providing you with a managed Kubernetes cluster or allowing you access to set one up. This entails costs on both sides and consequently creates an issue of when, where, and how you can deploy your AI application.
Problem
Let’s put it in concrete terms. Is your organization prepared to run AI/ML applications efficiently and compete with your industry peers? AI/ML applications have a lifecycle of their own (read more on MLOps here), and managing them requires dedicated AI infrastructure and expertise. Computer vision and Natural Language Processing (NLP) applications in particular tend to consume a lot of computing resources, especially GPUs, and the cost of running them can become very high. Organizations therefore need to plan and optimize their AI infrastructure stack.
Specifically, the problem is twofold. First, organizations need additional skilled people and time, such as Kubernetes or DevOps experts, to install and manage Kubernetes-based clusters and node pools, and a single such expert can easily cost EUR 100-250k annually (1 FTE). Second, organizations tied to a specific cloud vendor pay even more for additional resources (e.g. GPUs) when they run AI applications in production, even though a cheaper alternative might exist, whether that is building on-prem or using a different cloud provider. How can we make deployment management easy while lowering the total cost of ownership (TCO) of running AI applications?
Solution
Enter the UbiOps Compute Platform. This feature enables UbiOps customers to deploy on Virtual Machines inside their own tenant, without needing to install Kubernetes, but with all the benefits of scalability and reliability. It essentially makes it possible to deploy on any infrastructure: from VMs at hyperscalers, to various Infrastructure as a Service (IaaS) providers, and even your local server, without much engineering work. If you deploy on any of the hyperscalers (Google, AWS, Microsoft), UbiOps lets you choose from 100+ regions across these providers instead of the handful of available managed Kubernetes node pools. It also makes it easier to comply with customers’ data processing and data locality requirements.
How does it work?
UbiOps is built to run and scale models in production. The platform consists of two primary components: MLOps capabilities and a control plane layer. The MLOps capabilities ensure models can be managed efficiently in production environments. The control plane layer makes it possible to connect and manage various types of infrastructure.
To deploy in your local computing environment with UbiOps, you can either create a Virtual Machine and provide us with (SSH) access to it, or, if you are using a virtualization layer such as the compute engines inside the hyperscaler clouds, give us access to create the VM in your cloud environment. After the Virtual Machine is created with the required OS image, we install the deployment agent, connect it to the core UbiOps platform, and then run the deployment code that you have created on the platform, as sketched below. Aside from creating the VM, which is fairly simple, this setup doesn’t require any work on your side. Furthermore, the compute platform can scale deployments within your tenant depending on your quota and available resources.
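To give an idea of the user-facing side of this workflow, below is a minimal sketch using the UbiOps Python client to register a deployment and a version that the agent-connected infrastructure can then serve. The project name, deployment name, field names, instance type identifier, and API token are placeholders, and exact parameter names (for example instance_type and environment) may differ per client version; check the UbiOps documentation for your setup.

```python
# Minimal sketch: registering a deployment and a version with the UbiOps Python client.
# All names, the instance type identifier, and the API token are placeholders.
import ubiops

configuration = ubiops.Configuration(
    host="https://api.ubiops.com/v2.1",
    api_key={"Authorization": "Token <YOUR_API_TOKEN>"},
)

with ubiops.ApiClient(configuration) as api_client:
    core_api = ubiops.CoreApi(api_client)

    # Create the deployment (the logical model endpoint)
    core_api.deployments_create(
        project_name="my-project",
        data=ubiops.DeploymentCreate(
            name="image-classifier",
            input_type="structured",
            output_type="structured",
            input_fields=[ubiops.DeploymentInputFieldCreate(name="image_url", data_type="string")],
            output_fields=[ubiops.DeploymentOutputFieldCreate(name="label", data_type="string")],
        ),
    )

    # Create a version that runs on an instance type available in your tenant
    core_api.deployment_versions_create(
        project_name="my-project",
        deployment_name="image-classifier",
        data=ubiops.DeploymentVersionCreate(
            version="v1",
            environment="python3-11",     # base environment name depends on availability
            instance_type="16384mb_t4",   # hypothetical GPU instance type name
            minimum_instances=0,
            maximum_instances=2,
        ),
    )
```

Once the version exists, the deployment package (your model code) is uploaded to it and the agent on your VM runs the workload.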
How does the UbiOps compute platform benefit you?
Facilitate a hybrid-cloud strategy and save weeks of work!
Lately, many organizations are looking to implement a hybrid cloud strategy in which AI workloads are divided between local and cloud infrastructure. While cost, data privacy, and other organizational concerns drive this need, the transition is not easy. Moving workloads to a hybrid infrastructure means your team is now responsible for managing local and cloud infrastructure at the same time. Specifically, the DevOps or platform team needs to set up and manage an orchestration layer (typically Kubernetes) on both the local and the cloud infrastructure to enable container-based deployments. Unfortunately, Kubernetes carries substantial management and compute overhead, especially for smaller VMs. The team also has to manage the workflows unique to each environment separately.
UbiOps helps in two ways
First, it provides a unified control plane for managing cloud and local resources from a single place. This makes lifting and shifting workloads between on-prem and cloud environments a matter of a few clicks (or a few lines of code, as sketched below). Second, deploying directly on VMs without installing Kubernetes means less engineering overhead, better resource utilization, and faster time to value for data science teams.
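As a rough illustration of the “few lines of code” claim, the sketch below moves an existing deployment version to a different instance type, for example from a cloud node type to one backed by a VM in your own environment. The project, deployment, version, and instance type names are hypothetical and depend on how your tenant is configured.

```python
# Sketch: shifting a deployment version to a different instance type.
# All identifiers below are placeholders for whatever exists in your tenant.
import ubiops

configuration = ubiops.Configuration(
    host="https://api.ubiops.com/v2.1",
    api_key={"Authorization": "Token <YOUR_API_TOKEN>"},
)

with ubiops.ApiClient(configuration) as api_client:
    core_api = ubiops.CoreApi(api_client)

    # Point the version at an instance type backed by your own (on-prem or cloud) VMs
    core_api.deployment_versions_update(
        project_name="my-project",
        deployment_name="image-classifier",
        version="v1",
        data=ubiops.DeploymentVersionUpdate(
            instance_type="onprem-a100-80gb",  # hypothetical name for a VM-backed node type
        ),
    )
```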
We found that in our last few installations, customers with in-depth Kubernetes expertise needed 3-5 days to install, test, and manage reliable deployments; others took weeks just to prepare the infrastructure. Deployment on a VM, on the other hand, required about 30 minutes of work and led to better performance metrics, such as a significant latency reduction. That is a drop of roughly 98% in the amount of infrastructure work required to deploy AI.
More deployment choices
AI workloads differ in the infrastructure they need to compute, store, and run models. Being able to run directly on VMs means more node type options for your team to use. The UbiOps VM deployment agent opens up more choice in deployment machine types (e.g. CPU and GPU types, memory sizes, etc.).
Shorter cold start times
As a team, you want requests to your models to be handled as quickly as possible with minimal latency, not only because this allows you to act faster, but also because it reduces the cost of running your models. Models require ever more advanced hardware these days, which makes running them all the more expensive. Optimizing the cold start is therefore an important performance and cost factor. If you are hearing about the cold start problem for the first time, allow me to explain. Cold start refers to applications taking longer to start when they haven’t been used recently. In the context of AI/ML applications, it is the time it takes for your model to start up before it can process a request, which typically includes spinning up the necessary hardware and loading the model.
Teams typically tackle cold starts by letting a model instance run continuously, which is part of the scaling settings. Ideally, though, you would run models only when an active user request comes in. However, when a request is made via an application, the API endpoint first checks in the background whether a deployment instance is running. If not, the platform has to create an instance, install dependencies (such as CUDA for a GPU-based model), download the model, and make it live, and only then can it process the user request. This can take from a few seconds to a few minutes, depending on the type of instance. Eventually, a user might get tired of waiting and quit the application, resulting in an opportunity loss for the business. You therefore need to ensure performance without excessive cost; the sketch below shows how that trade-off is typically expressed in scaling settings.
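As an illustration, the scaling behaviour described above usually comes down to a handful of version-level settings. The sketch below uses the UbiOps Python client either to keep one warm instance for a latency-critical model (avoiding cold starts at the price of an always-on instance) or to scale to zero with a longer idle window; the exact parameter names and values are assumptions to be checked against your client version.

```python
# Sketch: trading cold-start latency against cost via scaling settings.
# Parameter names and values are illustrative; check them against your UbiOps client version.
import ubiops

configuration = ubiops.Configuration(
    host="https://api.ubiops.com/v2.1",
    api_key={"Authorization": "Token <YOUR_API_TOKEN>"},
)

with ubiops.ApiClient(configuration) as api_client:
    core_api = ubiops.CoreApi(api_client)

    # Option A: keep one instance warm so requests never hit a cold start
    core_api.deployment_versions_update(
        project_name="my-project",
        deployment_name="image-classifier",
        version="v1",
        data=ubiops.DeploymentVersionUpdate(
            minimum_instances=1,      # always-on instance: no cold starts, continuous cost
            maximum_instances=3,
        ),
    )

    # Option B: scale to zero when idle, but keep an idle instance alive a bit longer
    core_api.deployment_versions_update(
        project_name="my-project",
        deployment_name="image-classifier",
        version="v1",
        data=ubiops.DeploymentVersionUpdate(
            minimum_instances=0,      # scale to zero: cheapest, but cold starts can occur
            maximum_instances=3,
            maximum_idle_time=1800,   # seconds to keep an idle instance before shutdown
        ),
    )
```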
Because the UbiOps compute platform does not rely on Kubernetes, start-up times are much faster than with Kubernetes-based deployments. You can scale from zero to one much faster and without having to keep a model instance running continuously. We saw around a 20% improvement in our benchmarks.
Conclusion
To conclude, the UbiOps compute platform simplifies running machine learning applications and reduces infrastructure overhead for your engineering team. It directly saves time and cost for running AI/ML applications and makes your product or service more competitive by running with ease on a variety of digital infrastructures.
If you are building an AI product or service that is rapidly evolving and needs to scale, UbiOps can be the perfect choice for running your inference workloads in a cost and time-efficient way.
Book a demo to learn more about UbiOps and discuss your projects!