No GPUs Available, What Now?

How to tackle the shortage of GPUs

Graphics Processing Units (GPUs) have become an integral part of today's MLOps ecosystem, and with more companies running AI models in production than ever before, the demand for GPUs has risen tremendously. In KPMG's 2021 study, "Thriving in an AI World", business leaders from many industries reported that AI is at least moderately functional in their organizations, including: industrial manufacturing (93%), financial services (84%), tech (83%), retail (81%) and life sciences (77%).

However, the GPU supply has not kept up with this increase in demand in recent years. Physical GPUs are almost constantly sold out, and the big cloud providers don't seem able to keep up with the demand either, leading to long waiting times for GPUs. But why is the demand for GPUs so high? And what can you do about it if you need one?

Why are GPUs so important?

To understand why GPUs are so important for operationalizing Machine Learning models, it helps to look at what type of applications GPUs are made for. Owens et al. listed the following characteristics of GPU applications in their paper "GPU Computing":

  • Computational requirements are large
  • Parallelism is high
  • Throughput is more important than latency

In other words, GPUs are extremely good at executing tens of thousands of parallel threads to rapidly solve large problems with substantial inherent parallelism. Each individual computation might be a bit slower than it would be on a CPU, but because a GPU handles so many of them in parallel, the overall problem is solved much faster.
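To make this concrete, here is a minimal PyTorch sketch of such an embarrassingly parallel operation (assuming PyTorch is installed; the code falls back to the CPU if no CUDA GPU is present):

```python
import torch

# Every element of this operation can be computed independently of the others,
# which is exactly the kind of inherent parallelism GPUs are built for.
x = torch.rand(100_000_000)        # 100M elements, created on the CPU

if torch.cuda.is_available():
    x = x.to("cuda")               # copy the data onto the GPU

y = torch.sin(x) * 2.0             # on a GPU this runs as thousands of parallel threads

if x.is_cuda:
    torch.cuda.synchronize()       # GPU kernels are asynchronous; wait for the result
```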

Since most machine learning algorithms are essentially a sequence of linear algebra operations, they are very suitable for parallelization. A typical neural network consists of long chains of interconnected layers (see figure below). While a full network requires a massive amount of compute, it can be broken up into layers of smaller, sequentially dependent chunks of work, and within each layer the work is highly parallel. Because of this, neural networks can really leverage the power of GPUs to reach much higher speeds than possible with a CPU.

Image by Author, inspired by source: Choquette et al. (2021), "NVIDIA A100 Tensor Core GPU: Performance and Innovation". The graphic outlines how a deep learning network maps onto a GPU.
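As a rough illustration of that mapping, the PyTorch sketch below (with arbitrary layer sizes chosen just for the example) moves a small chain of layers and a batch of inputs onto the GPU, where every layer's matrix multiplication is executed across many threads at once:

```python
import torch
import torch.nn as nn

# A small feed-forward network: a chain of interconnected layers,
# each of which boils down to a large matrix multiplication.
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)           # map every layer's weights onto the GPU

batch = torch.rand(4096, 1024, device=device)  # a whole batch is processed at once
logits = model(batch)              # each layer runs as a massively parallel GPU kernel
```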

Shi et al. performed a thorough benchmark of CPUs and GPUs for deep learning. Even though their paper is from 2017, their conclusion still holds: GPUs outperform CPUs at deep learning. You can see their results table for yourself in the figure below; the quickest time in each category is marked in bold.

Image by Shi et al. (2017), "Benchmarking State-of-the-Art Deep Learning Software Tools". In bold you can find the quickest time within each category (Desktop CPU, Server CPU, Single GPU). The GPU outperforms the CPUs.
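You can get a feel for this performance gap yourself with a quick timing experiment. The sketch below is not Shi et al.'s benchmark methodology, just a rough matrix-multiplication comparison, assuming PyTorch is installed and a CUDA device may be available:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096, repeats: int = 10) -> float:
    """Average seconds per n x n matrix multiplication on the given device."""
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    a @ b                                  # warm-up run (allocations, kernel caches)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()           # GPU kernels are asynchronous; wait for them
    return (time.perf_counter() - start) / repeats

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```

On a machine with a reasonably modern GPU, the GPU timing typically comes out dramatically lower, mirroring the pattern in Shi et al.'s table.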

The difficulty of obtaining a GPU when you need it

With GPUs being so well suited for ML workloads, and more and more companies using AI, demand has risen tremendously. This increase in demand makes it quite difficult to obtain a GPU nowadays. When you want to run your workloads on a GPU, you have roughly two options:

  1. Buy a physical GPU
  2. Use a cloud GPU (for instance at AWS or Google, or via specific Machine Learning (ML) platforms like UbiOps)

Physical GPUs are very expensive and often sold out. This is caused by so-called scalpers, who use bots to buy up all available GPUs the moment they are offered on a website. Because of this, you can only buy GPUs from these scalpers, unless you're faster than the bots. Scalpers resell the products at prices inflated by more than 200% compared to the original cost, making them ridiculously expensive.

With astronomical prices for physical GPUs, cloud GPUs become a more interesting option. You normally only pay for the time you actually use the GPU, so if your workloads don't run too long, it's often a lot cheaper than buying a GPU yourself. Cloud GPUs are highly scalable, they minimize costs, and they free up local resources: you can simply continue using your laptop while your model runs on a GPU in one of the cloud provider's datacenters.

However, the clouds have been having difficulty keeping up with the increased demand for GPUs. With demand growing much faster than supply, users end up waiting for a GPU to become available. Gigaom AI analyst Anand Joshi looked into this issue and noted that a lot of users are experiencing longer wait times than a couple of years ago. If you have a Machine Learning (ML) model in production that needs to respond within seconds, it becomes a big problem when you cannot even get an available GPU within a minute. In that case you might need to buy dedicated GPUs from the cloud that are available 24/7, but therefore also a lot pricier.

How to tackle the shortage of GPUs?

Let's say you need a GPU for your model, but you cannot get your hands on a physical one, and the wait times in the cloud queues are too long. What are your options?

Well, as I mentioned before, you could buy dedicated GPUs from any of the cloud providers (where you get a GPU on a per-month basis as opposed to a per-hour basis), but this might be too costly for your organization. You could also poll for available GPUs in different clouds and pick the one that provides a GPU the fastest, as sketched below. But then you have to deal with multiple clouds and make your infrastructure cloud-agnostic, which can be quite a hassle.
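To give an idea of that polling approach, here is a hypothetical sketch. The try_acquire_gpu_* functions are placeholders you would have to implement against each provider's actual API; they are not real SDK calls:

```python
import time
from typing import Callable, List, Optional

def try_acquire_gpu_provider_a() -> Optional[str]:
    """Placeholder: request a GPU instance from provider A.

    Should return an instance identifier on success, or None if no GPU is free.
    """
    return None  # hypothetical; wire this up to the provider's real SDK

def try_acquire_gpu_provider_b() -> Optional[str]:
    """Placeholder: same idea for provider B."""
    return None

def poll_for_gpu(providers: List[Callable[[], Optional[str]]],
                 interval_s: float = 30.0,
                 timeout_s: float = 600.0) -> Optional[str]:
    """Cycle through the providers until one hands out a GPU or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        for acquire in providers:
            instance = acquire()
            if instance is not None:
                return instance        # first provider with a free GPU wins
        time.sleep(interval_s)         # back off before the next polling round
    return None

# With the placeholders above this will simply time out; with real
# implementations it returns the first GPU instance that becomes available.
gpu = poll_for_gpu([try_acquire_gpu_provider_a, try_acquire_gpu_provider_b],
                   interval_s=5.0, timeout_s=30.0)
```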

Another option is to go for a cloud environment that focuses on GPUs, like Escher Cloud (a European cloud backed by NVIDIA) or Paperspace Core. This might be overkill for your organization though, especially if you also need regular CPUs or other MLOps features. However, if you focus on things like image or video processing, it might be a good fit!

The last option I want to highlight is going for a Machine Learning Operations (MLOps) platform with GPU support, like UbiOps. They offer additional MLOps features, and through service level agreements (SLAs) you can have some say in how fast a GPU should be available. UbiOps is an MLOps platform that runs in the cloud and abstracts away most of the background IT infrastructure, so you don't need to worry about it. It also offers scalable GPU support and can run on any cloud. UbiOps has partnerships with Escher Cloud and other European clouds, which is handy if you are an EU-based company working with sensitive data. I work at UbiOps, of course, so I'm a bit biased; feel free to just test it out and see for yourself!

Conclusion

With the current demand and companies planning on incorporating more and more AI into their business, I think it’s safe to say that GPUs will remain scarce for a while. There are still ways to get your hands on a GPU though.

Whether you buy one of your own, use one in the cloud, or use one through an MLOps platform, there are options out there! Which option fits best depends on your use case and on how long you're willing to wait for a GPU to become available.