AI models are often trained on massive datasets. The training procedure allows them to learn complex patterns and gives them predictive power on new data points. Examples include computer vision models, such as those used for object detection in images, and large language models, which power chat agents. Training machine learning models can be time-intensive, and more compute power can reduce training time from days or weeks to hours or days. In search of that compute power, Team Epoch (TU Delft) and Bytesnet started discussing a collaboration.
Team Epoch is a TU Delft Dream Team that competes in international artificial intelligence competitions that advance one or more of the UN Sustainable Development Goals.
By innovating in the field of AI, the students aim to have a positive impact on the world. They believe AI can make a real difference and want it to be an accessible, understandable resource.
Bytesnet is a Dutch data center offering green HPC infrastructure-as-a-service. Excess heat generated during computation is fed back into the heat network.
Specifically, Epoch was looking for a setup that would let them use four NVIDIA A100 GPUs in a single training job. The team has local compute machines, but some competitions call for heavier hardware: training jobs can last for days, and larger machines can reduce this time drastically.
Successful deployment of the configuration
Bytesnet did not yet offer this setup, so they moved quickly to build and deploy it. The compute resources were then connected to the UbiOps platform, which made the machine available on a per-job basis. Epoch tested the machines and confirmed that they met their heavy-compute needs. Epoch: “Bytesnet’s GPUs decreased the training time for our models from days to hours. Through UbiOps’ platform, we can rapidly iterate on our models and parameters. Especially the new training functionality on the platform worked great for this.”
Dedicated training functionality
At the same time, UbiOps was preparing to release its dedicated training functionality, which makes it easier to manage different training experiments and to iterate faster on training code. Epoch provided valuable input on what they needed from such functionality, and this feedback was incorporated into the first release. The new training functionality lets users quickly bring their own code to UbiOps, create experiments in which training runs are monitored, and compare the resulting model performance. To learn more about the training functionality, read this blog post.
After the trial period, Team Epoch, Bytesnet and UbiOps agreed to continue the collaboration!
To learn more about Team Epoch, visit their website at www.teamepoch.net or follow Epoch on LinkedIn for updates on their competitions and accomplishments.
UbiOps: Easily manage, train and run your AI/ML jobs in one place
UbiOps is an easy-to-use deployment and serving layer for your data science code. It turns your Python and R models and scripts into web services that you can call from anywhere, at any time, so you can embed them in your own applications, website or data infrastructure without having to worry about security, reliability or scalability. UbiOps takes care of that for you.
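To give a rough idea of what turning a Python model into a web service involves, here is a minimal sketch that assumes the deployment-package convention described in the UbiOps documentation: a `deployment.py` file containing a `Deployment` class whose `__init__` runs once when an instance starts and whose `request` method handles each incoming call. The input field `values`, the output field `predictions` and the "multiply by two" model are hypothetical placeholders, not part of any real project, and the exact signatures may differ between UbiOps versions.

```python
# deployment.py: a minimal sketch of a UbiOps-style deployment entry point.
# Assumption: the platform creates Deployment once per instance and calls
# request() once per incoming request. The model logic below is hypothetical.


class Deployment:
    def __init__(self, base_directory, context):
        # Runs once at start-up; a real deployment would load model weights
        # from files under base_directory here.
        self.factor = 2.0  # hypothetical stand-in for a loaded model

    def request(self, data):
        # data is a dict whose keys match the deployment's input fields.
        values = data["values"]  # hypothetical input field name
        predictions = [v * self.factor for v in values]
        # The returned dict's keys should match the deployment's output fields.
        return {"predictions": predictions}


if __name__ == "__main__":
    # Local smoke test: call the class directly, without the platform.
    deployment = Deployment(base_directory=".", context={})
    print(deployment.request({"values": [1.0, 2.5]}))
```

Keeping expensive set-up such as loading weights in `__init__` means it is paid once per running instance rather than on every request, while `request` stays a plain function of its input that is easy to test locally.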