Keeping up with GPU Demand: A New Wave of Cloud Providers
Hey everyone, just wanted to re-post one of the latest deep dives I did with the Unusual Ventures team!
The release of ChatGPT just over a year ago has led to widespread market interest in Foundation Models (FMs) and Large Language Models (LLMs). While we are still in the early days, enterprises are rapidly exploring adoption of these models. Famously, growth-stage tech startups such as Notion, Figma, and Zoom have already integrated LLMs deeply into their core product offerings. At the same time, larger enterprises including Morgan Stanley, PwC, and Walmart are rolling out both internal and customer-facing solutions. All of these deployments have kicked off a race for the one thing powering it all: Graphics Processing Units (GPUs). As we head into 2024, we wanted to reflect on the state of GPU capacity and how startups are looking to fill the gap.
Using GPUs in model buildout
The advent of foundation models has led to a mad dash to acquire as many GPUs as possible. Most of the current capacity is going towards the initial model creation step, known as training.
Model training is the process of feeding a machine learning model a large dataset so it can learn patterns from that data. To help LLMs and FMs approximate "reasoning", they are generally trained on publicly available datasets like Common Crawl, often supplemented with private datasets curated by the model provider. During training, the model's parameters (its internal weights) are repeatedly adjusted so that each one encapsulates how strongly a learned pattern should influence the model's output.
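To make the "adjusting weights" idea concrete, here is a toy, PyTorch-style sketch of a single training step. The model, dimensions, and data are made up for illustration; real LLM training runs this loop over trillions of tokens across thousands of GPUs.

```python
# Toy sketch of one training step: the model's parameters (weights) are
# nudged so its next-token predictions better match the training data.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),   # token ID -> vector
    nn.Linear(embed_dim, vocab_size),      # vector -> scores for the next token
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (32, 128))   # a fake batch of token IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]    # predict each next token

logits = model(inputs)                              # (batch, seq, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
optimizer.zero_grad()
loss.backward()        # compute how each weight should change
optimizer.step()       # adjust the weights accordingly
```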
This process is highly compute-intensive and requires GPUs, which are designed to run computations in a massively parallel fashion. For example, it is rumored that OpenAI required 25k Nvidia A100 GPUs to train its 1.76T-parameter GPT-4 model for over 100 days straight. Meta took roughly 1.7M GPU-hours to train its 70B-parameter Llama 2 model (equating to roughly 10K GPUs running for about a week straight). And just recently, Meta publicly announced it is bringing on compute equivalent to 600k Nvidia H100s, in part to train its upcoming Llama 3 model.
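As a quick back-of-the-envelope check, reported GPU-hours can be converted into wall-clock time for an assumed cluster size. The cluster sizes below are illustrative assumptions, not reported figures:

```python
# Back-of-the-envelope: convert reported GPU-hours into wall-clock time
# for an assumed number of GPUs running continuously in parallel.
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    return gpu_hours / num_gpus / 24

# Llama 2 70B: ~1.7M GPU-hours (Meta's reported figure)
print(wall_clock_days(1_700_000, 10_000))  # ~7 days on an assumed 10K-GPU cluster
print(wall_clock_days(1_700_000, 2_000))   # ~35 days on an assumed 2K-GPU cluster
```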
A training job requires an enormous amount of compute capacity. Thousands, and increasingly tens of thousands, of GPUs need to be interconnected with a high-throughput network fabric so they can work together on a single training job. This networked buildout has made it difficult for the large cloud providers to offer foundation model training as a service to their customers, since it requires an almost complete redesign of their existing data centers.
To fill this market gap, we have seen new companies founded, including CoreWeave, Foundry, Lambda Labs, and Together AI.
The role of GPUs in model inference
After the training stage, a large foundation model still requires ongoing compute capacity to serve responses, a workload known as inference.
Training requires a large cluster of interconnected GPUs running over an extended period of time. Model inference, by contrast, requires far less compute capacity, with bursty workloads that rise and fall depending on when a model is being prompted. This means that the existing data centers owned by the large hyperscalers are more than capable of supporting their customers for inference.
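For a rough sense of scale, a common approximation is that training costs on the order of 6 × parameters × training tokens FLOPs, while a single forward pass costs roughly 2 × parameters FLOPs per generated token. The sketch below is back-of-the-envelope only, using Meta's reported ~2T training tokens for Llama 2 and an assumed 500-token response:

```python
# Rough rule-of-thumb comparison of one-off training compute vs. a single
# inference request. All figures are approximations for illustration.
params = 70e9            # a Llama-2-70B-scale model
train_tokens = 2e12      # ~2T training tokens (Meta's reported figure for Llama 2)
gen_tokens = 500         # one assumed chat response

training_flops = 6 * params * train_tokens     # ~8.4e23 FLOPs: a cluster-scale, one-off job
one_request_flops = 2 * params * gen_tokens    # ~7e13 FLOPs: served from a handful of GPUs
print(f"{training_flops:.1e} FLOPs to train vs {one_request_flops:.1e} FLOPs per request")
```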
Though we do expect cloud behemoths to compete in this category, new startups have emerged to provide model inference. The previous cloud wave was not a winner-take-all market. Therefore, we do not expect all inference compute to be run only by the hyperscalers. Instead, we have seen new entrants looking to differentiate via developer experience, product design, and lower costs.
This category includes players like Anyscale, Baseten, Banana.dev, Fermyon, Fly.io, Modal, and Runpod.
The rise of "models-as-a-service" companies
Some companies are happy to be handed the underlying infrastructure and to manage model deployment themselves. Others want a solution with a much higher level of abstraction. This has led to the rise of "models-as-a-service" companies, which focus on providing a single-click solution for deploying a wide assortment of the most popular foundation models.
This category includes Hugging Face, which is primarily known as the go-to solution for model sharing and now also offers hosted inference for the 60k+ models on its hub. Other providers include Anyscale, Replicate, Fireworks, and Lepton AI. These providers offer a smaller catalog, focusing on the most popular open-source models while looking to compete on performance, cost, and warm-start times.
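As a rough illustration of what this higher level of abstraction looks like in practice, here is a minimal sketch of calling a hosted open-source model over HTTP rather than standing up GPUs yourself. The URL follows Hugging Face's serverless Inference API conventions; the model ID and token are placeholders chosen for illustration:

```python
# Minimal sketch: prompt a hosted open-source model via an HTTP endpoint.
# The endpoint pattern follows Hugging Face's serverless Inference API;
# the model ID and bearer token below are illustrative placeholders.
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2"
headers = {"Authorization": "Bearer hf_your_token_here"}

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Summarize why GPU capacity is scarce in one sentence."},
)
print(response.json())
```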
Questions about the future of GPU cloud startups
We are strong believers in the future use cases of LLMs being built out today. By extension, we are investing heavily in the infrastructure powering this technology shift. Within the GPU cloud world, some questions that we are still thinking about include:
1. Where’s the money?
As companies race to adopt LLMs, how much incremental revenue will these businesses be able to drive through these product additions? Training an in-house LLM or consistently running inference against a third-party model is currently very expensive. Companies may not be willing to pay GPU providers for long if they do not see an uplift in revenue or a reduction in costs.
2. Margin expansion?
Since GPU clouds can be considered resellers of Nvidia GPUs today, how will the margins of these companies look over time? We expect these providers to continue building out their software offerings to improve their margin structure and create differentiated solutions for their end customers.
3. What about the cloud providers?
Most GPU cloud customers have so far defaulted to the large cloud providers, including Azure OpenAI Service and Amazon Bedrock. Given this, GPU cloud startups will be forced to innovate: they cannot beat the hyperscalers on GTM expertise or on letting customers draw down pre-purchased cloud credits. Microsoft Azure is also currently the only platform with access to OpenAI's top models, so open-source models will need to keep improving for these startups' offerings to remain competitive.