
AgentX - AI Workforce is a multi-agent system that scales your operations by organizing AI agents into collaborative, hierarchical teams. Automate complex tasks, streamline workflows, and unlock new levels of productivity with intelligent agent coordination. It now comes with an evaluation framework, so you can ship your AI agents with confidence.
This is the 3rd launch from AgentX - Multi-agent and eval framework.

AgentX
Launching today
Evaluate AI agents before they fail. Create test suites, run evaluations, and pinpoint issues before they reach production.
AgentX provides full observability and traceability for your AI agents. AI analysis not only identifies problems but also suggests fixes, like an AI doctor for your agents.
Simulate runs of your agents across multiple LLM providers to compare performance, cost, and latency, so you can make a better-informed decision about which LLM to use.
Run evals before you deploy, like CI/CD for AI agents.









AgentX - Multi-agent and eval framework
GNGM
I like the "CI/CD for AI agents" framing.
What does a failed deployment look like in AgentX? Can teams set quality thresholds that block releases?
AgentX - Multi-agent and eval framework
@polman_trudo Exactly. Teams can define evaluation criteria and quality thresholds. If a change causes performance regressions, the evaluation can fail before deployment, similar to how software teams use automated tests to prevent bad releases.
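To make the threshold idea concrete, here is a minimal sketch of a release gate in plain Python. It does not use the AgentX SDK; `EvalResult`, `gate`, and the threshold parameters are hypothetical names chosen for illustration, and the scores are assumed to be normalized to the range 0.0 to 1.0.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One evaluated test case (hypothetical shape, not an AgentX type)."""
    case_id: str
    score: float  # assumed to be normalized to 0.0..1.0


def gate(results, min_mean_score=0.8, max_failures=0, pass_score=0.5):
    """Return (ok, reasons). ok is False when any threshold is violated."""
    if not results:
        # An empty run proves nothing, so treat it as a failure.
        return False, ["no evaluation results"]

    reasons = []
    mean_score = sum(r.score for r in results) / len(results)
    failures = [r.case_id for r in results if r.score < pass_score]

    if mean_score < min_mean_score:
        reasons.append(
            f"mean score {mean_score:.2f} is below threshold {min_mean_score:.2f}"
        )
    if len(failures) > max_failures:
        reasons.append(f"{len(failures)} failing case(s): {', '.join(failures)}")

    return (not reasons), reasons


if __name__ == "__main__":
    # Placeholder results for illustration only.
    ok, reasons = gate([EvalResult("refund-flow", 0.9), EvalResult("edge-case", 0.4)])
    for reason in reasons:
        print("BLOCKED:", reason)
    # A CI job would turn this into its exit status to block the release.
    raise SystemExit(0 if ok else 1)
```

In a pipeline, a non-zero exit status from a step like this is what stops the deploy stage, in the same way a failing unit test would.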
HeyForm
Congrats on the launch, Robin!
Are you generating synthetic test cases, or relying on real production traces? One challenge we've seen is that synthetic evals often miss the edge cases users actually trigger.
AgentX - Multi-agent and eval framework
@itsluo We agree. Production data is usually the best source of truth. Our focus is helping teams build evals from real traces and failure cases, while also supporting synthetic generation when coverage gaps exist. The best results come from combining both approaches.
Triforce Todos
Running the same agent across multiple LLM providers to compare cost/latency is such an underrated feature.
How many providers do you support right now?
AgentX - Multi-agent and eval framework
@abod_rehman Thank you, Abdul. We currently support all major LLM vendors out of the box (Claude, GPT, Gemini, Llama, Grok). You can also add a custom LLM by providing your own base URL that points to any LLM not listed here.
How much setup is needed to wire this into an existing agent stack?
Is it a quick SDK drop-in or more of a real integration project?
AgentX - Multi-agent and eval framework
@amna9
It's designed to be a lightweight integration. If you already have an agent, you can connect it using our official AgentX Python SDK and start evaluating it without major changes to your existing stack. For most teams, it's more of an SDK drop-in than a large integration project.
The official AgentX Python SDK is available on GitHub.
Congrats on the launch! 🎉
For multi-agent setups specifically, does it trace failures at the individual agent level, or do you just see the whole chain break down as one blob?
AgentX - Multi-agent and eval framework
@boyuan_deng1 Thank you!
We do both! We inspect each individual agent's detailed process to determine whether there is an issue, and at the same time we look at the overall workflow and output.
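As a rough illustration of tracing at both levels, the sketch below records one step per agent and derives a workflow-level verdict from them. This is an assumed data model, not AgentX's actual trace format; `AgentStep` and `WorkflowTrace` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class AgentStep:
    """Outcome of a single agent in the chain (hypothetical shape)."""
    agent: str
    ok: bool
    error: str = ""


@dataclass
class WorkflowTrace:
    """All agent steps recorded for one workflow run."""
    steps: list = field(default_factory=list)

    @property
    def ok(self):
        # Workflow-level view: the run succeeds only if every step did.
        return all(step.ok for step in self.steps)

    def failing_agents(self):
        # Agent-level view: name exactly which agents broke, in order.
        return [step.agent for step in self.steps if not step.ok]


trace = WorkflowTrace(steps=[
    AgentStep("planner", ok=True),
    AgentStep("researcher", ok=False, error="tool call timed out"),
    AgentStep("writer", ok=True),
])
print(trace.ok, trace.failing_agents())
```

Keeping the per-agent records is what lets a failure be attributed to one agent instead of appearing as a single broken chain.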
We use a mix of OpenAI, Anthropic, and Gemini models across our agent stack. Can AgentX help us decide which model works best?
AgentX - Multi-agent and eval framework
@pawel_dmochewicz Yes, Paweł, thanks for this question! That's one of the core use cases. AgentX lets you simulate and evaluate your agents across multiple LLM providers, then compare performance, cost, and latency side by side. Instead of guessing which model to use, you can make data-driven decisions before deploying changes to production.
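A side-by-side comparison of this kind boils down to aggregating per-run records by provider. The sketch below shows that aggregation in plain Python; it is not the AgentX SDK, the record fields (`provider`, `score`, `cost_usd`, `latency_ms`) are assumed, and the numbers are placeholders rather than measurements of any real model.

```python
from collections import defaultdict
from statistics import mean


def summarize(runs):
    """Aggregate evaluation runs into one summary row per provider."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["provider"]].append(run)

    summary = {}
    for provider, rows in grouped.items():
        summary[provider] = {
            "mean_score": mean(r["score"] for r in rows),
            "total_cost_usd": sum(r["cost_usd"] for r in rows),
            "mean_latency_ms": mean(r["latency_ms"] for r in rows),
        }
    return summary


# Placeholder records for illustration only.
runs = [
    {"provider": "provider_a", "score": 1.0, "cost_usd": 0.25, "latency_ms": 100},
    {"provider": "provider_a", "score": 0.5, "cost_usd": 0.25, "latency_ms": 300},
    {"provider": "provider_b", "score": 0.75, "cost_usd": 0.5, "latency_ms": 150},
]

for provider, stats in sorted(summarize(runs).items()):
    print(provider, stats)
```

Running the same test suite against each provider keeps the rows comparable, since every provider is scored on identical inputs.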