MiMo-V2-Pro and MiMo-V2-Omni are Xiaomi’s new agent foundation models. Pro is built for long-chain coding, tool use, and OpenClaw-style workflows, while Omni adds vision and audio to push the same agentic stack into the real world.
MiMo-V2-Flash is a 309B-parameter Mixture-of-Experts (MoE) model from Xiaomi with 15B active parameters. It is an efficient, fast foundation language model that excels at reasoning, coding, and agentic scenarios, and it also serves as a general-purpose assistant for everyday tasks.
Xiaomi's MiMo-Audio is a breakthrough in open-source audio intelligence. Pre-trained on over 100 million hours of data, it is the first audio model to show emergent few-shot generalization and in-context learning.