What you should know about the AI landscape
The generative AI landscape is evolving rapidly. New AI models, new platforms, new techniques for building AI applications, and new ideas are emerging every week. Any “best of” list will need to be updated regularly!
Having said that, I think an opinionated survey is useful, precisely because there’s so much going on.
I’m a co-founder of Daily, which provides real-time voice, video, and AI infra and tooling for developers. I’ve been helping our customers build and scale real-time AI features like voice-based AI customer service agents for the last 18 months. I’ve seen the evolution of the generative AI landscape first-hand as we’ve experimented and rolled out features at Daily, and second-hand, through the eyes of our customers coming up to speed on this new tech.
If you’re trying to figure out how to build AI features into your application, you’re not alone. The amazing new capabilities of generative AI models like GPT-4 are opening up new possibilities in almost every category of software. But it can be hard to know where to start.
AI companies refer to their core technologies as “models.” Today’s generative AI models are trained on a huge volume of data and are capable of creating novel content in response to input from a user. This is the “generative” in the term generative AI.
OpenAI’s launch of ChatGPT in late 2022 kicked off the generative AI era. ChatGPT is a consumer application that makes it easy for anyone to use OpenAI’s models (for example, GPT-4o, OpenAI’s current flagship model).
This overview will cover some of today’s leading models, and platforms that provide access to models via developer APIs. We won’t talk about end-user applications like ChatGPT. If you’re reading this, we’re assuming that you are building those applications!
The reason it’s important to talk about both models and platforms is that most generative AI models are too large and computationally expensive to run directly on a laptop or phone. So building applications with generative AI, today, requires sending requests to big compute clusters running somewhere in the cloud. This will change over time. AI researchers are figuring out how to squeeze more capabilities into smaller models. And hardware manufacturers are ramping up the AI processing horsepower in consumer devices. But for the moment, AI and the cloud are almost synonymous.
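To make this concrete, here is what “sending a request to a compute cluster in the cloud” looks like in practice. This is a minimal sketch using OpenAI’s official Python SDK; the model name and prompt are just examples.

```python
# A minimal sketch of cloud-hosted inference: the model runs in
# OpenAI's data centers, and your application just makes an API call.
# Assumes OPENAI_API_KEY is set in your environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # OpenAI's flagship model at the time of writing
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
```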
A Cambrian explosion of models and use cases
There are several categories of generative AI models in widespread use today.
Large Language Models generate text output. OpenAI’s GPT-4o, Anthropic’s Claude series of models, and Meta’s Llama models are examples of LLMs.
Some LLMs also have vision capabilities. They can process images as input, in addition to text. Most AI researchers expect rapid progress in “multi-modal” LLM capabilities. OpenAI and Google have shown demos of their models processing audio input and generating audio output (in addition to text).
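As a sketch of what image input looks like in practice: OpenAI’s chat completions API accepts a user message that contains both a text part and an image_url part. The image URL below is a placeholder.

```python
# A minimal sketch of multi-modal (vision) input, using OpenAI's
# documented message format. The image URL is a placeholder; substitute
# any publicly reachable image.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```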
AI models that generate and modify images are starting to see widespread use in content creation and editing tools. Stable Diffusion, Midjourney, and Adobe’s Firefly are three of the leading image models. Video generation models are not far behind.
AI voice generation tools are now so good that it’s often very hard to tell the difference between synthetically generated voices and real human speech. See below for my top three voice model picks.
Labs and platforms
AI companies can be labs, platforms, or both.
Labs create the models. They assemble huge data sets, maintain massive training clusters, and invent or optimize techniques for creating capable, reliable models from those two resources.
Platforms make the models available via developer APIs.
LLM Labs
The most prominent AI labs are OpenAI, Anthropic, Meta FAIR, and Google DeepMind.
All of these except Meta FAIR are also platforms, meaning that they provide developer access to their models via APIs. All of them are also primarily focused on LLMs (though all except Anthropic are also releasing significant audio and video AI research).
For a long time, OpenAI was the clear leader among LLM labs. GPT-4, released in March 2023, was so much better than every other model available that it was hard for most people to justify using anything else.
That changed this summer. Anthropic released Claude 3.5 Sonnet, which many people feel is the best overall LLM available today. (Though other people continue to prefer the GPT-4 series of models.) Google’s Gemini models are also now quite good, though most people who spend a lot of time evaluating models don’t think the Gemini series is as capable as GPT-4 and Claude.
Meta’s latest series of models – Llama 3.1 – are also excellent and are distinguished by being “open weights” models. Unlike the models from OpenAI, Anthropic, and Google, Meta’s models are freely downloadable and can be used, studied, and modified with few restrictions.
In summary, after almost two years of dominance by OpenAI, there is now healthy competition for “the best model”: four labs have produced models that are roughly in the same class as OpenAI’s flagship, GPT-4.
Each of the labs also produces models in several sizes. Smaller models are less capable, but can be significantly faster and cheaper.
| Lab | Notable models | Notes | Available via APIs hosted by |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o mini | GPT-4o is still the gold standard LLM. Extremely capable, reasonably fast, relatively affordable. GPT-4o mini is a smaller, cheaper, somewhat less capable version. | OpenAI and Microsoft Azure |
| Anthropic | Claude 3.5 Sonnet | Claude 3.5 Sonnet is an excellent model which performs extremely well across a wide range of use cases. Most people in the AI space believe that, with Sonnet, Anthropic caught up to OpenAI in flagship model capabilities. Sonnet’s biggest drawback is that it is slower than GPT-4o. | Anthropic, Amazon Web Services, Google Cloud Platform |
| Meta FAIR | Llama 3.1 405B, 70B, 8B | Llama 3.1 405B is the only “open” model that rivals GPT-4. The two smaller models, 70B and 8B, are also very capable for their sizes. The smaller models are good choices for applications that do not need “the best” model capabilities. It’s possible to run the 8B model locally on your own computer (see the sketch after this table)! | A wide range of platforms; see the Platforms section below. |
| Google DeepMind | Gemini 1.5 Pro and Flash | Google’s two flagship models today are very good, though generally not considered quite as good as OpenAI’s and Anthropic’s models. They are distinguished, however, by supporting very large input contexts. This opens up new application possibilities. | Google Cloud Platform |
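Because the Llama weights are freely downloadable, the 8B model really can run on your own machine. Here’s a rough sketch using a recent version of Hugging Face transformers; it assumes you’ve accepted Meta’s license on Hugging Face and logged in, and the model id should be double-checked against the current model card. Expect it to be slow without a GPU.

```python
# A rough sketch of running Llama 3.1 8B locally with Hugging Face
# transformers. Assumes you've accepted Meta's license and authenticated
# with `huggingface-cli login`; double-check the model id against the
# current model card.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",  # uses a GPU if one is available
)

messages = [{"role": "user", "content": "What does 'open weights' mean?"}]
out = pipe(messages, max_new_tokens=100)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```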
If you want to do a deep dive into the LLM lab space, it’s also worth checking out the interesting work from Mistral, Alibaba Cloud (the Qwen family of models), and DeepSeek.
Voice Labs
Voice models are improving incredibly fast. Models are producing better quality output, and doing it faster. Speed is especially important in this space, because voice is so useful for conversational interactions such as customer support and games. In 2023 the fastest good voice models had a “time to first byte” of about a second – barely fast enough for an interactive conversation. Now the fastest models can deliver audio in 200ms.
The most prominent voice labs are ElevenLabs, Cartesia, and Deepgram.
Good voice technology is relatively expensive, but fierce competition is likely to drive cost down (and quality up) over time.
Here are subjective notes on the offerings from ElevenLabs, Cartesia, and Deepgram as of September 2024. All of these labs are also platforms – they make their models available to developers via APIs.
| Lab | Cost | Latency | Notes |
|---|---|---|---|
| ElevenLabs | Highest | Highest (~400ms) | Excellent voice quality and a wide range of features. Relatively high cost and latency. Offers voice cloning (custom voices). |
| Cartesia | Medium | Low (~200ms) | Currently the best choice for many applications, occupying a sweet spot of low latency, high quality, and relatively low cost. Offers voice cloning (custom voices). |
| Deepgram | Lowest | Lowest (~150ms) | Currently the best choice for cost-sensitive use cases. |
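Time to first byte is easy to measure yourself. Here’s a rough sketch that times the first audio chunk from ElevenLabs’ streaming text-to-speech endpoint; the URL shape and header come from their docs as of this writing, but treat the voice id as a placeholder and check the current documentation for the full request format.

```python
# A rough sketch of measuring time-to-first-byte for a streaming TTS
# API, using ElevenLabs' streaming endpoint as the example. The voice id
# is a placeholder.
import os
import time

import requests

VOICE_ID = "YOUR_VOICE_ID"  # placeholder: use a voice from your account
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"

start = time.monotonic()
resp = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Hello! How can I help you today?"},
    stream=True,
)
resp.raise_for_status()

for chunk in resp.iter_content(chunk_size=1024):
    print(f"time to first audio chunk: {(time.monotonic() - start) * 1000:.0f}ms")
    break  # only the first chunk matters for this measurement
```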
Image and Video
Image and video generation are the wild west! The big labs all have models that can produce images, but all of them restrict what those models are capable of in ways that can make the models quite hard to use. Concerns about copyright infringement, “deep fake” content production, and production of obscene and offensive content are all very real issues in this space.
A huge variety of open source image models and “fine tunes” of models are produced at a breakneck pace by active communities of image model enthusiasts. The best of these models are very, very good for specific purposes. But the learning curve is steep. Image models are quite difficult to wrangle effectively; most people producing professional-quality work use many models in combination and have sophisticated, ever-changing workflows.
On the video side, it’s hard to pick out a clear set of leaders in general. Runway and Pika have both generated a lot of buzz.
In the specific space of AI models for video avatars, the two companies with the highest profiles and most impressive product offerings are HeyGen and Tavus.
If you are interested in learning about image and video models, the Civitai community is a good place to start.
Platforms
A great deal of effort is required to serve generative AI models reliably, at scale and low cost. A number of companies focus primarily on providing access to models via APIs rather than on developing the models.
These platform providers play a critical role, especially in making the Llama 3.1 models available (Meta FAIR does not offer API access for the Llama models), and in the highly diverse space of serving image models.
You can also turn to many of the platforms below if you need to rent a few GPUs to train or fine-tune a model, or if you need a small cluster of GPUs to run your own custom inference engine.
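Many of these platforms expose OpenAI-compatible endpoints, so switching providers (or models) is often just a matter of changing a base URL and a model name. Here’s a sketch using together.ai, one of the platforms listed below; the base URL and model id reflect their documentation at the time of writing, so verify both against their current docs.

```python
# A rough sketch of calling Llama 3.1 through a hosting platform.
# together.ai serves an OpenAI-compatible API, so the OpenAI SDK works
# with a different base_url. Verify the model id against Together's
# current model catalog.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```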
Here are notes on some of the platform AI companies that I use regularly.
| Company | Known for |
|---|---|
| together.ai | Fast, high-quality Llama 3.1 in all three sizes. Full offering of compute clusters, custom infrastructure, and model fine-tuning. |
| Fireworks AI | Wide variety of models, including both LLMs and image generation models. |
| Fal | Very fast implementations of image models. |
| Groq | Custom hardware that enables extremely high performance for the small Llama 3.1 models. |
| Modal | Modal is my go-to when I want to write some Python code and have it magically run in the cloud (see the sketch after this table). I’ve done model fine-tuning, hosted fine-tuned models, run big batch jobs, and built voice-to-voice AI experiments on Modal. Their opinionated, Python-focused approach and attention to detail in their developer tooling distinguish Modal from everyone else in the serverless space. |
| Cerebrium | Cerebrium is a serverless GPU platform built and run by a small team of experienced ML engineers. They are on this list because I’ve worked closely with them and learned a huge amount from them. This is the team I turn to when I need to build something new that has to run fast on a cluster of A100s, with a bunch of unknown unknowns. Cerebrium is quietly powering some very large AI workloads. |
| Microsoft Azure | The only platform other than OpenAI that serves OpenAI’s models. If you have a large amount of Azure infrastructure deployed, or you are operating in a regulated environment that requires a cloud partner like Azure, you already know that you need to use Azure! If neither of those is the case, you will probably find Azure harder to work with than OpenAI directly or one of the smaller platforms. |
| AWS Bedrock | AWS serves models from Meta, Anthropic, and other labs under the Bedrock product name. If you have a large committed spend on AWS, then Bedrock is a good choice. But AWS is playing catch-up in this space and is neither the easiest nor the most performant option for most startups. |
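To give a flavor of the Modal workflow mentioned above: you decorate a Python function, and Modal runs it in the cloud. This is a minimal sketch based on Modal’s documented App API; the GPU type is just an example.

```python
# A minimal sketch of the Modal pattern: write a Python function,
# decorate it, and launch it in Modal's cloud with `modal run this_file.py`.
# The GPU type below is just an example.
import modal

app = modal.App("example-app")

@app.function(gpu="A10G")  # ask Modal to run this function on a GPU
def square(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # .remote() executes the function in Modal's cloud, not locally
    print(square.remote(7))
```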
Change is the only constant
If you’re an active participant in AI tech development, and you’ve gotten this far, you probably have strong opinions about favorite providers that I’ve left out or details that I’ve glossed over.
Let me know what I missed or got wrong! I plan to update this overview regularly.
Right now, the only thing that’s constant in AI is the pace of change.