LLM Reference

Using Xiaomi MiMo-V2-Flash on Novita AI

Implementation guide · MiMo V2 · Xiaomi

Serverless

Quick Start

  1. 1
    Create an account at Novita AI and generate an API key.
  2. 2
    Use the Novita AI SDK or REST API to call MiMo-V2-Flash.
  3. 3
    You'll be billed $0.10/1M input, $0.30/1M output tokens. See full pricing.

Code Examples

Code examples for this provider have not been sourced yet.

About Novita AI

Novita AI offers a GPU-based inference API for image, video, and language model generation with a broad catalog of open-source models.

Pricing on Novita AI

TypePrice (per 1M)
Input tokens$0.10
Output tokens$0.30

Capabilities

ReasoningFunction Calling

About Xiaomi MiMo-V2-Flash

MiMo-V2-Flash is Xiaomi's efficient open-source Mixture-of-Experts model, announced December 17, 2025 at Xiaomi's Human-Car-Home Ecosystem Partner Conference. It has 309B total parameters with 15B active, uses hybrid attention that interleaves Sliding Window Attention and Global Attention, and extends native 32K context to 256K. Multi-Token Prediction enables about 2.6x speculative decoding speedup. The model was distributed with weights on Hugging Face and ranked highly on SWE-Bench Verified and multilingual benchmarks at research time.

Model Specs

Released2025-12-17
Parameters309B
Context262K
Architecturemoe
Knowledge cutoff2024-12

Provider

Novita AI