LLM API of choice: Cost vs. Context Window & Quality

Jon
Hi all. I'm very curious how those building LLM-based apps are using the various LLM models/APIs given the differences in cost, context window, and quality. Personally, I'm always trying to find the right balance across these factors, building the best possible app at the lowest possible cost. I'll share how we do this in our app. Would love to know how others are tackling this!

--OUR AI USE CASE--

We process large amounts of text from various media sources and consolidate it into more consumable summaries using LLM chat/completion.

--OUR AI SOLUTION--

We currently use 2 models:
- GPT-4o: used for large context windows (128k tokens) where high-quality output is required, but executions are costly.
- Mixtral 8x22B: has a smaller context window (64k tokens) and, I think, lower quality than GPT-4o, but is much less expensive to run.

We created an llm_factory that picks a model based on the following factors (see the sketch after this list):
- User's package (premium = GPT, basic = Mixtral)
- Context window (even if the user has a 'basic' package, we route them to GPT if their input exceeds Mixtral's 64k tokens, since we want to provide the best quality and avoid context-window failures)
- Use case (certain lower-value use cases always use Mixtral)
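
For anyone curious what that routing could look like in practice, here's a minimal sketch of such a factory. The model names, token limits, use-case set, and the tiktoken-based token count are my assumptions for illustration, not the author's actual implementation:

```python
# Minimal sketch of a model-selection factory, assuming the routing rules above.
# Names, limits, and the LOW_VALUE_USE_CASES set are hypothetical.
import tiktoken

MIXTRAL_CONTEXT = 64_000   # Mixtral 8x22B context window (tokens)
GPT4O_CONTEXT = 128_000    # GPT-4o context window (tokens)

# Hypothetical lower-value use cases that always run on the cheaper model.
LOW_VALUE_USE_CASES = {"tag_extraction", "dedup_check"}

def count_tokens(text: str) -> int:
    """Approximate token count; exact tokenizers differ per model."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

def llm_factory(package: str, prompt: str, use_case: str) -> str:
    """Pick a model based on user package, prompt size, and use case."""
    # Lower-value use cases always use the cheaper model.
    if use_case in LOW_VALUE_USE_CASES:
        return "mixtral-8x22b"
    # Prompts too large for Mixtral go to GPT-4o regardless of package,
    # to avoid context-window failures.
    if count_tokens(prompt) > MIXTRAL_CONTEXT:
        return "gpt-4o"
    # Otherwise, route by package tier.
    return "gpt-4o" if package == "premium" else "mixtral-8x22b"

# Example: a basic-tier user with a small prompt gets Mixtral.
# model = llm_factory(package="basic", prompt=article_text, use_case="summary")
```

In a real setup you'd also want to reserve headroom for the completion tokens, not just the prompt, when comparing against the context limit.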