Increasingly, artificial intelligence (AI) lies at the heart of online tools and applications. For example, the global chatbot market is expected to reach $1 billion by 2024, because chatbots can save companies time and money. According to Juniper Research, consumers and businesses will save around 2.5 billion hours of work in 2023 because of chatbots alone. And that barely scratches the surface of the vast field of AI. Machine learning (ML) assistance can be found in many places on the internet, such as:
- Customer service automation
- Image and video recognition
- Content creation and curation
- Search engines
- Personalization of content
One of the big reasons for the increased usage of AI on the web is the availability of open source foundation models. These are base ML models whose code is freely available and that can be used to quickly develop a specialized ML model for your specific task. The availability of these models means that millions of people across the world can incorporate state-of-the-art AI into their products. Most likely, foundation models have already played a role at your job or in your life. ChatGPT, for example, is built on top of the foundation models GPT-3.5 and GPT-4, although those two are proprietary rather than open source.
What are foundation models?
Foundation models are large-scale machine learning models that are pre-trained on vast amounts of data. They are designed to provide a starting point for developers to fine-tune and customize models for specific tasks. These models are trained using unsupervised learning (often called self-supervised learning, because the training signal comes from the data itself) on massive datasets, such as Wikipedia, and can be fine-tuned for a wide range of applications. Often, foundation models are open source, allowing anybody to tinker with them.
Foundation models can be seen as the next step in the progression of computer learning. We went from rule-based data processing, to machine learning, to deep learning, and now to foundation models. With each step, the models have become more general.
So what is the main selling point of the foundation model? Foundation models are trained on gigantic datasets to gain a deep understanding of, for example, the English language. This understanding can then be used to perform specific tasks, like sentiment analysis, question answering, summarization, or text prediction, to name a few. Having this understanding means that the majority of the model training is already done for you. All you need to do is create a labeled dataset for your specific task and train the foundation model on that dataset so it “knows” what you need it to do. The first three blocks in the diagram below are nothing for you to worry about!
Figure: Foundation models are trained on massive datasets to create general understanding. This understanding can then be used to create specialized models built on top of the foundation model.
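To make the fine-tuning step concrete, here is a toy sketch in plain Python. It is not how a real foundation model is fine-tuned: the `frozen_encoder` function is a hypothetical stand-in for a pre-trained model (it just counts words from two hand-picked lists), and only a small logistic-regression "head" is trained on a handful of labeled examples. Strictly speaking this is head-only fine-tuning, the simplest variant, and all names and data below are made up for illustration.

```python
import math

# Hypothetical stand-in for a pre-trained foundation model: it maps text to a
# fixed feature vector. A real model would return learned embeddings; this toy
# version only counts words from two hand-picked lists.
POSITIVE_HINTS = {"great", "love", "excellent", "good"}
NEGATIVE_HINTS = {"bad", "terrible", "hate", "poor"}


def frozen_encoder(text):
    words = text.lower().split()
    pos = sum(w in POSITIVE_HINTS for w in words)
    neg = sum(w in NEGATIVE_HINTS for w in words)
    return [float(pos), float(neg)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=200, lr=0.5):
    """Train a logistic-regression head on top of the frozen encoder."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = frozen_encoder(text)  # the encoder itself is never updated
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = pred - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(text, weights, bias):
    """Return the estimated probability that the text is positive."""
    features = frozen_encoder(text)
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# A tiny labeled dataset for the specific task (1 = positive, 0 = negative).
labeled_data = [
    ("I love this product", 1),
    ("excellent service and good value", 1),
    ("terrible experience", 0),
    ("I hate the poor quality", 0),
]
weights, bias = train_head(labeled_data)
```

Calling `predict("great phone love it", weights, bias)` then returns a probability above 0.5. The point of the sketch is the division of labor: the expensive part (the encoder) is reused as-is, and only a few parameters are learned from your own labeled data.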
Benefits of foundation models
Convenience is not the only advantage of open source foundation models. There are many reasons to want to use them! Here, we highlight three of the most important ones:
- Accelerated development
- Improved performance
- Access to collective intelligence
Training a machine learning model from scratch requires collecting and preprocessing large amounts of data, which often takes an expert many hours to do. This is expensive and slow! It is much faster to only have to create a small dataset for fine-tuning.
Foundation models are usually trained on terabytes of data. Even if you were to take the from-scratch approach for your AI application, it is rarely feasible to amass that much data just to train your own ML model. This means that fine-tuned foundation models usually have a leg up on the alternatives: they simply had much more data available to them. Because the model has already learned relevant features and representations during pre-training, fine-tuning only has to adapt and specialize it for the specific task, which often leads to better accuracy and faster convergence.
Developers of open source software actively participate in their communities, contribute improvements, and share their own fine-tuned models. This creates collaborative environments where progress is rapid and knowledge is shared. In a leaked memo, a researcher at Google argues that open source will outperform big tech companies in the development of new AI technologies: “The innovations that powered open source’s recent successes directly solve problems we’re still struggling with.”
Foundation models also make it possible for enterprises to set up systems that interpret large amounts of data like customer reviews and survey responses. In addition, enterprises can build chatbots that help clients with simple questions. Both can save them time and money.
Types of foundation models
Broadly speaking, foundation models can be categorized into two groups:
- Natural language processing (NLP) models
- Computer vision (CV) models
Within these groups there are many different foundation models, each with its own characteristics. Understanding the differences between those models helps you decide which one suits your application best.
On the NLP side of things, popular models are:
- BERT (and derivations of BERT like RoBERTa and DistilBERT)
- GPT-3 and GPT-4
- BLOOM
BERT (Bidirectional Encoder Representations from Transformers) is a masked language model. It was trained by removing words from a text and having the model predict each removed word from the context on either side of it. This makes BERT good at context-related tasks like sentiment analysis. GPT, on the other hand, is an autoregressive model: it only uses the preceding context when making predictions. However, GPT was trained on much more data than BERT, over 10 times more, so it has access to more information. Finally, BLOOM is a model that was developed with ethics and responsible use in mind.
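The difference between the two training setups is easiest to see in the training examples themselves. The sketch below is a simplified illustration, not the actual preprocessing used by BERT or GPT: it works on whitespace-split words rather than real tokens and masks a single hand-picked position, whereas real masked language models mask a random fraction of tokens. The function names are made up for this example.

```python
MASK = "[MASK]"


def masked_lm_example(tokens, position):
    """BERT-style: hide one token; the model sees context on both sides."""
    visible = list(tokens)
    target = visible[position]
    visible[position] = MASK
    return visible, target


def autoregressive_examples(tokens):
    """GPT-style: predict each token from the tokens that came before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


sentence = "the movie was surprisingly good".split()

context, target = masked_lm_example(sentence, position=3)
print(context, "->", target)
# ['the', 'movie', 'was', '[MASK]', 'good'] -> surprisingly

for context, target in autoregressive_examples(sentence):
    print(context, "->", target)
```

In the masked example the model can use "good" (to the right of the gap) as a clue, while each autoregressive example only ever contains words to the left of the target.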
In CV, we have models like Stable Diffusion, DALL-E, and Florence. Stable Diffusion and DALL-E are text-to-image models that can create images based on instructions you give them in text. Florence, on the other hand, can be used for many different visual tasks like object classification, image captioning, and video retrieval.
Figure: An image generated by Stable Diffusion of “A beautiful flower field on a grassy hill”
Adaptations or combinations of these models, as well as completely different ones, can be found in places like Hugging Face, an AI community dedicated to making AI available to the general public.
Running a foundation model in UbiOps
Foundation models are really big and take a lot of computing power to run, let alone fine-tune. It is possible to do both on UbiOps, where you can benefit from our powerful GPUs. By uploading your model onto our platform you create a microservice with its own API endpoint.
This means you can incorporate foundation models, and any other model you would like, into your operations without having to invest in the digital infrastructure to run them.
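As a rough idea of what such a microservice looks like in code, here is a minimal sketch. It assumes the `deployment.py` layout with a `Deployment` class described in the UbiOps documentation; check the documentation for the exact, current interface. The input field `text`, the output field `sentiment`, and the one-line "model" are hypothetical placeholders, where a real deployment would load an actual foundation model.

```python
class Deployment:
    """Sketch of a deployment entry point (layout assumed from the UbiOps docs)."""

    def __init__(self, base_directory, context):
        # Runs once when an instance starts. Load the (large) model here so it
        # is not reloaded on every request. A trivial stand-in is used instead
        # of a real foundation model.
        self.model = lambda text: "positive" if "good" in text.lower() else "negative"

    def request(self, data):
        # Runs for every call to the API endpoint. `data` holds the input
        # fields of the request; "text" and "sentiment" are hypothetical names.
        return {"sentiment": self.model(data["text"])}
```

Loading the model in `__init__` rather than in `request` matters for foundation models in particular, since loading several gigabytes of weights on every call would dominate the response time.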
Are you interested in how that works? Check out our documentation or book a demo with us!
The future of foundation models
As the field of AI continues to evolve, foundation models are likely to become even more powerful and versatile. Firstly, the increasing availability of large datasets and advances in computing power mean it will become possible to train even larger and more complex models. This will allow developers to create AI applications that are more sophisticated and effective than the current state of the art. Secondly, the open source community will continue to play an important role in the development of foundation models, making sure that they remain accessible and relevant to developers around the world. A researcher at Google has even claimed that open source AI will outcompete both Google and OpenAI. Finally, there will likely be increased focus on developing more energy-efficient models.