What is model serving?

Model deployment, or model serving, is the stage in which a trained model is brought to production and made readily usable. A model-serving platform lets you deploy and monitor your models with minimal hassle. Below is the MLOps dev cycle and how UbiOps can be used within that cycle.

How UbiOps fits into the MLOps dev cycle

At the model-deployment stage, challenges such as monitoring, debugging, hardware usage and scaling become very relevant. This is where UbiOps comes in. 

What is UbiOps? 

UbiOps is a platform that lets you outsource the arduous task of model deployment and management. It is designed to address issues such as scaling problems, high cloud costs, GPU unavailability, debugging code, managing complex pipelines, building and maintaining extensive APIs, managing many model versions, and security. UbiOps is not only an AI deployment platform but also an AI management tool, meaning you can use UbiOps with on-premise setups.

In this article, we will discuss three issues that arise at the model-deployment stage, followed by the benefits and drawbacks of using a model-serving platform.

3 issues encountered during model deployment

As mentioned, the task of model deployment is not an easy one. We will now detail some of the problems associated with this stage in the MLOps cycle. 

Scaling and setup costs

Scaling is the process of increasing or decreasing hardware usage based on the amount of traffic your deployed models receive. Large models, especially state-of-the-art LLMs such as Mixtral, which has over 40 billion parameters, are very hard to run without serious hardware infrastructure. An article in Scientific American details how AI giants such as Microsoft and Google are now moving towards smaller models that can run on most consumer-grade hardware. Gemma is a good example of this.

What is Gemma? Gemma is a model series released by Google in February 2024. The smaller Gemma model, Gemma-2b, is a small language model (SLM) with around 2 billion parameters, intended to run on smartphones and laptops. Large models are extremely hardware intensive, and Gemma was released to address this issue. We have a guide on how to deploy Gemma-2b for free using UbiOps.
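To get a feel for why parameter count drives hardware requirements, here is a minimal back-of-the-envelope sketch in Python. It estimates only the memory needed to hold a model's weights, assuming a fixed number of bytes per parameter for each numeric precision. The function name and the precision table are illustrative, and real deployments need additional memory for activations, caches, and framework overhead, so treat the numbers as a lower bound rather than a sizing guide.

```python
# Rough estimate of the memory needed just to store a model's weights.
# Assumption: every parameter is stored at the same precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts rounded to the figures mentioned in the text.
    models = [("40B-parameter model", 40e9), ("2B-parameter model", 2e9)]
    for name, params in models:
        for precision in ("fp16", "int4"):
            size = weight_memory_gb(params, precision)
            print(f"{name} at {precision}: ~{size:.0f} GB of weights")
```

Under these assumptions, a 40-billion-parameter model at 16-bit precision needs about 80 GB for its weights alone, while a 2-billion-parameter model needs about 4 GB, which is within reach of a laptop.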

This brings us to scaling: the more you expect your model to be called, the more hardware costs you will incur. An article on Akkio gives a good overview of the costs associated with hardware scaling as well as hiring engineers to handle the process. Top-of-the-line hardware such as Nvidia’s A100 GPU costs around $10,000 per unit. Most engineers specialized in data science and MLOps command salaries of over $100,000 a year. Setting up and scaling your own hardware infrastructure is incredibly expensive.

Renting hardware is much more affordable. AWS has on-demand prices ranging from $0.041/hr for cheaper instances to $32.77/hr for incredibly powerful instances. Very cost-effective instance types that allow you to deploy LLMs cost around $0.76/hr on demand. However, while hardware costs are lower, you will still incur developer costs. This is why a tool like UbiOps is useful: we take care of AI management tasks for you.
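The buy-versus-rent trade-off can be made concrete with a simple break-even calculation. The sketch below uses the two figures quoted above (about $10,000 for a GPU and $0.76/hr for a rented instance) purely as illustrative inputs. It assumes the two options are interchangeable, which they are not in practice: the rented instance is not necessarily an A100, and the calculation ignores power, hosting, and staffing costs. The function and constant names are hypothetical.

```python
def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of rental after which renting has cost as much as buying outright."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


# Illustrative inputs taken from the figures quoted in the text.
PURCHASE_PRICE_USD = 10_000       # approximate price of one high-end GPU
RENTAL_RATE_USD_PER_HOUR = 0.76   # on-demand rate for a cost-effective instance

if __name__ == "__main__":
    hours = break_even_hours(PURCHASE_PRICE_USD, RENTAL_RATE_USD_PER_HOUR)
    days = hours / 24
    print(f"Break-even after ~{hours:,.0f} hours (~{days:,.0f} days of continuous use)")
```

With these inputs the break-even point is roughly 13,000 hours, or about a year and a half of nonstop use. A workload that runs only a few hours a day would take far longer to justify a purchase.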

Hardware availability

For various political and economic reasons, enterprise-level GPUs have been in seriously short supply. From 2020 to 2023, a global silicon shortage caused by trade disputes, together with logistical issues due to COVID, hampered chip manufacturers’ ability to satisfy high market demand. In 2023, supply slowly returned to normal.

However, with the AI boom, the demand for GPUs has increased tremendously. Nvidia, which holds approximately 80% of the GPU market, is thriving during this demand boom. Its revenue increased from $5.9 billion in the third quarter of FY2023 to $18.1 billion in the third quarter of FY2024.

The GPU shortage is something you will need to deal with when deploying your AI workload on-premise. It’s a fickle market: political decisions and decisions made by Nvidia can affect costs dramatically. Nvidia, for various reasons, tries to limit the hold that cloud giants such as AWS and Google Cloud have on cloud GPUs. It partners with, and gives GPU access to, smaller companies, and UbiOps is one of them. So, you can beat the GPU shortage by working with companies that are partnered with Nvidia.

Furthermore, innovative multi-core hardware setups such as high-performance computing (HPC) clusters require a lot of expertise to set up. Running LLM tasks on HPC is a field ripe for exploration, and it could drastically improve speed and efficiency. A paper published on February 3, 2024 details the potential of combining LLMs and HPC, stating that “HPC is critical in mitigating latency for real-time LLM applications”. However, it also details the difficulties of integrating LLM tasks with HPC. It is easier to outsource these technical problems to an AI-serving platform.


Security

Another challenge when it comes to model deployment is security: you need to protect your company’s data as well as the data of your customers. Model deployment providers specialize in providing computation, so they are constantly on the lookout for potential security threats. If you want to deploy your AI on-premise, you will need to take on this responsibility yourself.

We have seen major security vulnerabilities pop up out of nowhere, such as the Log4Shell exploit, which was reported in November 2021 and publicly disclosed that December. Log4Shell allowed for arbitrary code execution. For a company offering model deployment services, this could be disastrous. Many of the hackers using the Log4Shell exploit would use victims’ devices for their computational power. In essence, you will need to take these security concerns into account, which costs both time and money.

Benefits and drawbacks of a model serving platform

Remember, UbiOps can also be used in-house, and it includes tools that make life easier and help you analyze performance.

Benefit: Scalability

Model-serving platforms offer extensive scalability, allowing you to scale up or down based on demand. This can be done on very short timescales. UbiOps offers scale-to-zero and auto-scaling functionality.

What is scale to zero?

Scale to zero is an auto-scaling feature which enables compute providers to scale down your hardware usage completely when there is no traffic. For small-scale use cases, this feature is vital and can help you save on hardware costs substantially. 
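The idea can be illustrated with a small Python sketch of an auto-scaling decision. This is a generic, simplified rule and not UbiOps' actual scaling algorithm: the function name, its parameters, and the default limits are all assumptions made for illustration. The rule picks enough instances to cover the pending requests and clamps the result between a minimum and a maximum. Setting the minimum to zero is what makes scale to zero possible.

```python
import math


def desired_instances(
    pending_requests: int,
    requests_per_instance: int,
    min_instances: int = 0,   # 0 enables scale to zero
    max_instances: int = 5,   # hypothetical upper limit
) -> int:
    """Pick an instance count for the current load, clamped to [min, max]."""
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    # Enough instances to cover all pending requests, rounded up.
    needed = math.ceil(pending_requests / requests_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (0, 25, 1000):
        print(f"{load} pending requests -> {desired_instances(load, 10)} instance(s)")
```

With no traffic the function returns zero instances, so no hardware is in use. Passing `min_instances=1` instead would keep one instance running at all times, which costs more when the deployment is idle.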

Benefit: Cost

As mentioned in the first section, model-serving platforms allow you to bypass the very expensive process of hardware setup and maintenance. You don’t need an extensive hardware team and can therefore forgo those expensive labor costs.

Benefit: Development time

Using a model-serving platform for your AI workloads will also reduce your development time drastically. Since model-serving platforms such as UbiOps offer an API, they are accessible from anywhere, so you can deploy your models within minutes and have them available for production. You can focus your efforts purely on model development instead of infrastructure maintenance.

Benefit: Analytical and monitoring tools

Many model-serving platforms provide extensive monitoring and logging capabilities. These are a useful way to measure and evaluate a model’s performance and efficiency. UbiOps has extensive monitoring capabilities, including logging, event auditing, and performance monitoring.

Drawback: Privacy

While model-serving platforms have various quality-of-life features, one drawback is that you need to keep data privacy and protection in mind. For example, if you want to deploy models such as GPT-3 or GPT-4, you will need to use OpenAI’s model-deployment services. OpenAI has faced concerns about how it handles user data, as reported in a Reuters article from January 30th, 2024. The article details how the Italian data protection authority, known as the Garante, claimed that ChatGPT breached EU data protection rules.

You can mitigate this by checking whether companies adhere to security standards and which certifications they hold. UbiOps is ISO 27001 and NEN 7510 certified. However, if you host an open-source model yourself, you can be sure that the data stays on-premise.

Drawback: Customization

Another potential drawback is that model-serving platforms offer less hardware and software flexibility, typically letting you choose from a limited number of hardware clusters. If you decide to deploy models on-premise, you can customize every aspect and integration of your deployment.


To summarize, model and AI deployment comes with significant challenges. The main ones are scaling, GPU availability and security. We also listed several benefits and drawbacks of using a model-serving platform. Overall, both options come with benefits and drawbacks, and there is no right or wrong option. Whether using a model-serving platform is right for you depends on your MLOps expertise, hardware knowledge and funding.
