Using Kimi K2.6 on NVIDIA NIM

Implementation guide · Kimi K2 · Moonshot AI

ServerlessOpen Source

Quick Start

1
Create an account at NVIDIA NIM and generate an API key.
2
Use the NVIDIA NIM SDK or REST API to call moonshotai/kimi-k2.6 — see the documentation for request format.
3
You'll be billed . See full pricing.

API Portal Documentation Pricing Model Card

Code Examples

See NVIDIA NIM documentation for integration details.

About NVIDIA NIM

NIM packages inference runtimes and model profiles into containers that expose standard API surfaces such as chat completions, completions, model listing, tokenization, health, and management endpoints. The hosted API path is useful for prototyping and catalog discovery, while the NGC/container path is the self-hosted route for teams that want GPU-hour infrastructure control, private-network deployment, Kubernetes scaling, or NVIDIA AI Enterprise support. Per-token pricing is not a universal provider-level claim in the current seed data; pricing should stay attached to sourced model-provider rows or NVIDIA's current catalog terms.

NVIDIA NIM is NVIDIA's deployment platform for GPU-accelerated inference microservices. Developers can try hosted NIM APIs through the NVIDIA API Catalog on build.nvidia.com, then move the same model families into self-hosted NIM containers on NVIDIA GPUs in a data center, private cloud, public cloud, or workstation. The catalog positions NIM around optimized open and NVIDIA models, including chat, coding, reasoning, retrieval, vision, speech, and safety use cases, with downloadable model cards and API endpoints where NVIDIA exposes them.

View all models on NVIDIA NIM →

Pricing on NVIDIA NIM

Type	Price (per 1M)
Image input	$1.00
Video input	$1.00

Capabilities

VisionMultimodalReasoningFunction CallingTool UseStructured OutputsPrompt Caching

About Kimi K2.6

Kimi K2.6 is Moonshot AI's multimodal agentic coding model, released April 20 2026 under a Modified MIT license. Built on a 1-trillion-parameter MoE architecture (32B active, 384 experts with 8 selected per token plus 1 shared expert, 61 layers), it features a 262K context window and up to 65,536 output tokens. Supports native image and video inputs (screenshots, PDFs, spreadsheets). Designed for long-horizon coding with agent swarms of up to 300 sub-agents and 4,000 coordinated steps; Moonshot AI cites 200–300 sequential tool calls without task drift. Key benchmarks: SWE-bench Verified 80.2%, SWE-bench Pro 58.6%, LiveCodeBench v6 89.6%, GPQA Diamond 90.5%, Terminal-Bench 2.0 66.7%. Chatbot Arena Elo 1454 (2026-04-28 snapshot).

Full model details →

Model Specs

Released2026-04-20

Parameters1T

Context262k

ArchitectureMixture of Experts

Knowledge cutoff2025-04

NVIDIA

Santa Clara, California, United States