The Evolution of Infrastructure Markets — Foundation Models
AI infrastructure in the era of foundation models
Like many others, I’ve spent far too much time on ChatGPT, the latest toy in the world of AI. Following GitHub Copilot and image generation tools like DALL·E 2 and Midjourney, it is the newest showcase of what large foundation models are capable of. New applications are still emerging, making it the right time to think about how AI infrastructure will evolve as well.
In my last post, I discussed the evolution of cloud computing following the launch of AWS. To quickly summarize, the arc of that technology revolution resulted in a few waves of companies:
Abstractions — AWS, Google App Engine, Heroku, Cloudflare Workers, and others are all different abstractions over cloud primitives like compute and storage. Each provided a slightly different level of abstraction and set of tradeoffs. Most of what followed were new ways to work with these abstractions or new products enabled by them.
Cloud Workflows — A wave of cloud workflow companies emerged in response to these new abstractions. Companies like Hashicorp and Datadog helped developers manage cloud deployments and monitor their cloud applications.
Cloud-native Everything — Simultaneously, companies like Snowflake, MongoDB, Netskope, and others ushered in adjacent infrastructure categories built for this new cloud-native world.
Consolidation — Today, we are in a consolidation phase of this cloud computing market. AWS, GCP, and Azure are rapidly expanding their product portfolio and leaning into M&A as their compute and storage businesses reach maturity.
I found these roughly sequential categories a helpful starting place to think about how AI infrastructure might evolve today around foundation models[1].
I’ve written in the past about LLMs, but the broader theme of this post is around Foundation Models. While LLMs focus on language, Foundation Models can be trained on images, video, or anything else.
The cloud computing market started with AWS providing storage and compute primitives. With AI, the analogous catalyst is the emergence of foundation models. The one that likely gets the most airtime is OpenAI’s GPT-3 family of models, but many of these models are produced by large tech powerhouses like Google, Meta, and Nvidia. Their models are often open-sourced and made publicly accessible. The model architecture is the “core innovation” here, but these models still require significant resources to operationalize.
There are a number of these “abstraction” companies in AI. The most well-known is OpenAI, which provides its various models via an easy-to-use API. OpenAI builds the proprietary models, but doesn’t expose the specific model weights (ie secret sauce) to its customers. In essence, these companies are saying don’t worry about the specifics of the model, just provide us a set of inputs and we will send you outputs. These outputs can be an answer to a question, a summary, a newly created image, or an embedding.
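To make that “inputs in, outputs out” contract concrete, here is a minimal sketch of what such an abstraction looks like from the caller’s side. All names here are hypothetical, and `FakeBackend` merely stands in for a hosted model API like OpenAI’s; the point is that the caller never touches weights or servers.

```python
# A minimal sketch of the "inputs in, outputs out" contract that
# abstraction-layer companies expose. The HTTP details and the model
# internals are hidden; the caller sees only text (or an embedding).

from dataclasses import dataclass


@dataclass
class Completion:
    text: str


class FakeBackend:
    """Hypothetical stand-in for a hosted foundation model."""

    def complete(self, prompt: str) -> str:
        # A real backend would run inference on a large model here.
        return f"[completion for: {prompt!r}]"

    def embed(self, text: str) -> list[float]:
        # A real backend returns a dense vector; we fake a tiny one.
        return [float(len(text)), float(text.count(" "))]


class ModelClient:
    """The abstraction: callers never see weights or servers."""

    def __init__(self, backend) -> None:
        self.backend = backend

    def generate(self, prompt: str) -> Completion:
        return Completion(text=self.backend.complete(prompt))

    def embedding(self, text: str) -> list[float]:
        return self.backend.embed(text)


client = ModelClient(FakeBackend())
print(client.generate("Summarize this post").text)
print(client.embedding("hello world"))
```

Swapping `FakeBackend` for a real provider changes nothing for the caller, which is exactly the point of the abstraction.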
To extend the analogy from earlier, OpenAI is in many ways trying to be the AWS of AI infrastructure[2]. While AWS built their own physical servers, their product abstracted away the servers to provide simple compute and storage primitives for customers. Similarly, OpenAI has built their own models, but their product is an API that abstracts away the heavy lifting on the model side.
Because there are many, many potential use cases of these foundation models, we see companies build separate products for different use cases. For example, Cohere has one product for classification and a separate one for text generation. We also see startups carve out chunks of the language model problem space to focus on. For example, AssemblyAI is building models around audio intelligence and Adept AI is building a model for web-based task automation. Like most infrastructure categories, there is a tug-of-war between the horizontal approach (one stop shop) and vertical approach (best of breed).
Ultimately, the question here will be how much the abstraction layer leaks into the broader AI development toolkit. AWS and other cloud providers took the route of competing to be the one-stop shop for all things cloud, but many new companies were still able to cement their position in the infrastructure landscape.
Foundation Models are great for AI adoption because businesses do not have to build state-of-the-art models from scratch. Instead, efforts are concentrated on a small number of models that can be used for a large number of use cases. This centralization of effort means that the average company can just focus on the last mile of AI development.
However, working with large foundation models will require a change in how we build AI products, just as working with cloud primitives changed how we build and deliver software. Today, it’s still hard to leverage foundation models across an organization and only companies like Meta have built internal tools to address these challenges:
A centralized service used across Meta comes with many challenges. Some of the challenges, such as client management, quotas, and cost attribution, considered solved problems for large-scale systems like databases, had to be adapted for the AI domain… MultiRay’s primary aim is to democratize access to large foundational models at Meta. It does so by centralizing the execution on accelerators like GPUs and using a cache to save on cost of recomputation as much as possible. Currently, MultiRay powers over 125 use cases across Meta, and it supports up to 20 million queries per second (QPS) while serving 800 billion queries per day.
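The caching idea in the quote above can be sketched in a few lines. This is a toy illustration of the principle, not MultiRay’s actual design; `expensive_embed` is a hypothetical stand-in for a GPU-backed model.

```python
# Toy version of centralized caching: memoize embedding results so
# repeated requests for the same text skip the expensive model call.

from functools import lru_cache

CALLS = {"count": 0}


def expensive_embed(text: str) -> tuple[float, ...]:
    # Stand-in for running a large model on an accelerator.
    CALLS["count"] += 1
    return (float(len(text)), float(sum(map(ord, text)) % 997))


@lru_cache(maxsize=100_000)
def cached_embed(text: str) -> tuple[float, ...]:
    return expensive_embed(text)


# Many product surfaces ask for the same popular strings; only the
# first request per string pays the compute cost.
for _ in range(3):
    cached_embed("breaking news headline")
print(CALLS["count"])  # the model ran once, not three times
```

At Meta’s scale the cache lives in a shared service rather than in-process memory, but the economics are the same: hot inputs amortize one forward pass across many callers.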
For this new foundation model-centric approach to AI, new tools are starting to reimagine what the AI development workflow should now look like. As with most categories of infrastructure, the right tools will depend on the size and sophistication of the team using them. A few areas where we’re seeing new and existing players lean into:
Model Training — Foundation models need to be trained on large, diverse datasets so that they can develop a broad enough set of capabilities. Unsurprisingly, this is compute-intensive and thus very expensive. It cost OpenAI millions of dollars to train GPT-3, which is in part why they raised $1B from Microsoft. In fact, most of the model-layer companies have some compute partnership. Companies like MosaicML and Strong Compute are rewriting this narrative by optimizing model training to a fraction of the cost. If costs come down as far as MosaicML suggests, we may see more companies building models themselves.
Adaptation & Finetuning — While a select number of companies may build their own models, most companies will want to leverage pretrained models. Adaptation is the process of taking a pretrained model and adjusting the model’s parameters to be better suited to a specific task or dataset. There are a number of solutions out there ranging from libraries like Tensorflow and Pytorch to startups like Hugging Face and Snorkel. ML teams will spend less time on things like model selection and architecture and more time on adapting a generalized model for specific tasks.
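One simple form of adaptation can be illustrated with a toy “linear probe”: keep the pretrained model frozen and train only a small task-specific head on its output embeddings. Everything below is synthetic and hypothetical — `frozen_embed` stands in for a real pretrained encoder, and in practice you would reach for a library like Hugging Face’s transformers rather than hand-rolled gradient descent.

```python
# Sketch of adaptation: the foundation model is frozen; only a tiny
# logistic-regression head is trained on its embeddings.

import math
import random

random.seed(0)


def frozen_embed(x: float) -> list[float]:
    # Hypothetical stand-in for a (frozen) pretrained encoder.
    return [x, x * x]


# Tiny synthetic task: label is 1 when x > 0.
data = [(x, 1 if x > 0 else 0)
        for x in [random.uniform(-1, 1) for _ in range(200)]]

# The adaptation step: learn a linear head on top of frozen features.
w, b = [0.0, 0.0], 0.0
lr = 0.5
for _ in range(200):
    for x, y in data:
        f = frozen_embed(x)
        z = w[0] * f[0] + w[1] * f[1] + b
        p = 1 / (1 + math.exp(-z))
        g = p - y  # gradient of the log loss w.r.t. z
        w[0] -= lr * g * f[0]
        w[1] -= lr * g * f[1]
        b -= lr * g

# Evaluate the adapted head on the (frozen) features.
correct = sum(
    ((1 / (1 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))) > 0.5) == (y == 1)
    for x, y in data
    for f in [frozen_embed(x)]
)
print(correct / len(data))
```

The expensive part (the encoder) is never touched; only a handful of head parameters are learned, which is what makes adaptation so much cheaper than training from scratch.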
Prompt Engineering — For large language models, the inputs provided to the model (“prompts”) have a significant impact on the outputs received. One technique is to show the model examples of how you’d like a task completed. Humanloop and Everyprompt are productizing this for GPT-3, while Dust is making proper prompt design more systematic. While there’s clear value in doing this for language models, it’s more cryptic for generative images/video, where describing an output is more complex. On the media side, search engines like Lexica or Openart are building a record of how prompts map to specific images.
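The examples-in-the-prompt technique (“few-shot” prompting) amounts to string templating: prepend worked examples so the model can infer the task format from the pattern. The template below is illustrative only, not any vendor’s required format.

```python
# A minimal few-shot prompt builder: worked examples first, then the
# query in the same format, ending where the model should continue.

def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The query follows the same pattern, with the answer left blank
    # so the model completes it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = few_shot_prompt(
    [("Loved every minute of it.", "Positive"),
     ("Total waste of money.", "Negative")],
    "The plot dragged but the acting was superb.",
)
print(prompt)
```

Tools in this space essentially manage, version, and test libraries of such templates rather than leaving them scattered through application code.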
Inference &amp; Deployment — Once AI models are trained, they need to be deployed and used by end customers. This requires another layer of infrastructure to make inference both low-latency and high-throughput to power rich end-user experiences. Tools like Nvidia’s NeMo Megatron (hell-of-a-name) or SliceX help accelerate inference, while companies like Banana make scaling up deployments on GPUs trivial.
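One common trick behind high-throughput serving is batching: group many concurrent requests into a single model call so the accelerator stays busy. Production servers do this with queues and timeouts; the sketch below is a hypothetical toy that simply groups a backlog into fixed-size batches.

```python
# Toy dynamic batching: many requests, few model calls.

def fake_model_batch(inputs: list[str]) -> list[str]:
    # Stand-in for one forward pass over a whole batch on a GPU.
    return [s.upper() for s in inputs]


def serve(requests: list[str], max_batch: int = 8) -> tuple[list[str], int]:
    outputs, model_calls = [], 0
    for i in range(0, len(requests), max_batch):
        batch = requests[i : i + max_batch]
        outputs.extend(fake_model_batch(batch))
        model_calls += 1
    return outputs, model_calls


outs, calls = serve([f"req {i}" for i in range(20)], max_batch=8)
print(calls)  # 20 requests served in 3 model calls
```

The tradeoff is latency versus throughput: larger batches keep the GPU saturated, but early requests wait for the batch to fill, which is why real servers cap the wait with a timeout.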
Foundation models are rapidly becoming the most popular form of AI models as they provide a balance between power and ease-of-use. However, these models require different tools and infrastructure to properly use. This includes services for model training, adaptation, prompt engineering, and inference. Many of these businesses are still in very early stages while others are new products at existing incumbents. One of the key questions for me will be which, if any, of these categories are most radically in need of new products as opposed to existing tools.
In addition to a new foundation model-centric workflow, there are new adjacent problems that companies will need to address. I’m referring to issues that fall outside of ML engineers’ current day-to-day priorities, but are critical to the overall business. One of the more obvious areas of opportunity here falls under AI safety, compliance, and security.
Safety — With generative AI in particular, there are questions around how much to constrain what these models can output. For now, this question is somewhat ideological, as the exact “rules” have not been defined. It’s easy to say that kids should not be exposed to models that output mature content, but other cases are grayer. In healthcare, for example, AI may make information retrieval frictionless, but without citations or traceability its answers are hard to verify or act on.
Compliance — As foundation models have been trained on a wide range of publicly available data, companies will also need to address copyright and privacy concerns. It’s possible that these models are trained on personal data (accidentally) which may violate consumer privacy standards. Alternatively, the actual output of these models may plagiarize existing work, which has landed Github Copilot with a piracy lawsuit.
Security — Lastly, there are some concerns around centralization of models as it relates to security. As described by the Center for Research on Foundation Models, “a foundation model may become a single point of failure and thus a prime target for attacks against applications derived from this model”.
An open question here is how much of this responsibility falls to the model providers versus the applications on top of them, or even end users. As with cloud computing, there will likely be some shared responsibility model in which both model providers and end customers are accountable. The application layer is probably under the most pressure to meet customer preferences and abide by regulations, but we’re already seeing OpenAI launch a content moderation tool so developers can put more controls in place.
Foundation Models are not just incremental improvements in AI task performance. Leveraging large, pretrained, generalized models is both democratizing AI and shifting how AI products are built. The ML tools of yesterday may not be best-suited for these new workflows and businesses will need to adapt from a technology and organizational perspective.
Despite sharing these potential areas of innovation within AI infrastructure, I want to re-emphasize that one of the main positive effects of foundation models is in simplifying AI development. Leveraging the generalized models that large institutions and organizations are creating should allow the average team to focus on a smaller subset of infrastructure problems. Rather than building multiple models for different use cases and datasets, companies will focus on using proprietary data to enhance models and deploy them in production.
[1] foundation model = large AI model trained on a vast quantity of unlabeled data at scale, resulting in a generalized model that can be adapted/fine-tuned to a wide range of downstream tasks
[2] OpenAI and others like Cohere, Adept, AI21, etc